Pinderkent

Pain and glory from the trenches of the IT world.

Programming languages should not try to guess the programmer's intentions.

Posted on Sunday, May 24, 2009 at 12:39 AM.

A common trait among some of the poorer-quality programming languages, namely PHP and JavaScript, is their use of weak typing. While some developers are convinced that it's acceptable, it's generally a bad idea to have a programming language essentially guess at what the programmer means.

Recently, I saw an article describing some problems in a PHP script caused by automatic type conversion. Frankly, these kinds of issues should not exist at all. Strong, static typing is clearly a better approach. Although it puts slightly more of a burden on the programmer, manually specifying type conversions leads to higher-quality software, especially when errors are caught at compile time rather than at run time.

JavaScript is another language that employs weak, dynamic typing. I recently saw another article that gives some good examples (under the "2. Plus operator overloading" section) of how this behavior can produce unexpected results, especially for novice developers. But even seasoned professionals make mistakes, and such conversions should at least be flagged with warnings, if not disallowed outright.
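To make the contrast concrete, here is a small Python sketch. Python is dynamically typed, but it refuses to guess: adding a string to a number raises a `TypeError`, so the programmer must state which conversion is meant. The hypothetical `js_style_plus` helper mimics the JavaScript `+` rule described in that article, restricted to integers and strings (if either operand is a string, both are stringified and concatenated). It is a simplification for illustration, not a full model of JavaScript's conversion rules.

```python
def js_style_plus(a, b):
    """Hypothetical helper: mimic JavaScript's `+` for ints and strs only.

    If either operand is a string, both are converted to strings and
    concatenated; otherwise they are added numerically.
    """
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b


def strict_plus(a, b):
    """Python's own behaviour: mixing str and int is an error, not a guess."""
    return a + b


# JavaScript-style guessing: the same operator yields different kinds of result.
print(js_style_plus(1, 2))    # 3
print(js_style_plus("1", 2))  # 12 (a string, not a number)

# Python refuses to guess and makes the programmer choose a conversion.
try:
    strict_plus("1", 2)
except TypeError as err:
    print("TypeError:", err)

print(int("1") + 2)  # 3: explicit numeric conversion
print("1" + str(2))  # 12: explicit string conversion
```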

Even though we often deal with fuzzy and incomplete specifications when developing software, we shouldn't bring such uncertainty and guesswork into our communication with the computer itself. We should specify exactly what we mean, even if it takes slightly more typing. And as languages like Haskell and OCaml show, strong, static typing combined with type inference can be implemented without overly burdening programmers. Having to specify type conversions manually also forces programmers to think about what they're doing, and in some cases the very need for a conversion is a sign that what they're doing is quite wrong.

For the sake of trying to achieve even a moderately reasonable level of quality in our software, especially when programming for a hostile environment like the Internet, we shouldn't resort to languages like JavaScript and PHP that let type-related errors occur so easily. It's even worse when they make automatic conversions that produce unexpected behavior. That's just plain unacceptable.

Permalink: http://pinderkent.phumblog.com/post/2009/05/programming_languages_should_not_try_to_guess_the_programmers_intentions

Keeping commented-out code is justifiable.

Posted on Sunday, May 24, 2009 at 12:10 AM.

Today I read an article that suggested the following:

Commented out code are not comments - Use version control, don't track code changes by commenting them out. Commented out code is schizophrenic code.

To some extent, this is true. It is poor practice to try to maintain an extensive source code history through commented-out code. As the author of that article correctly points out, there are numerous version control systems available, and most developers are familiar with at least one of them.

But we shouldn't go so far as to say that there's no place for commented-out code. Contrary to what the author of that article suggests, a few lines of commented-out code can say more to a developer than paragraphs of prose comments. One case where we want to retain such code is when it has a serious flaw that we (or, more realistically, a maintenance programmer) don't want to accidentally repeat in the future. By leaving the flawed code in place, commented out and with a brief note describing why it should not be used, we leave an effective reminder of the problem.
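A minimal Python sketch of that practice, using a hypothetical price-checking function (the function and its history are invented for illustration): the tempting but flawed line stays in place, commented out, beside a short note explaining why it must not come back.

```python
import math


def is_expected_total(prices, expected):
    """Check whether `prices` add up to `expected` (hypothetical example)."""
    total = sum(prices)
    # DO NOT USE: exact comparison looks simpler, but binary floating point
    # makes 0.1 + 0.2 == 0.3 evaluate to False, so valid totals get rejected.
    #
    #   return total == expected
    #
    return math.isclose(total, expected, rel_tol=1e-9)


print(is_expected_total([0.1, 0.2], 0.3))  # True
print(0.1 + 0.2 == 0.3)                    # False: why the old line is kept as a warning
```

The commented-out line documents the trap more directly than a paragraph about floating-point rounding would.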

Another legitimate case of code in a comment that I've seen involved a Perl script used to generate a C array containing certain values. Instead of putting the Perl script in a separate file, where its purpose might not be fully understood, the developers stored it in a comment within the C source file, just above the array code it had generated.
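The source doesn't say what that Perl script generated, so the following is only a hypothetical stand-in, written in Python rather than Perl: a small generator that renders a list of values as a C array definition. The table it builds (the number of set bits in each byte value) is an assumption chosen for illustration. In the arrangement described above, a script like this would live in a comment directly above the array it produced.

```python
def c_array(name, values, per_line=16):
    """Render `values` as a C `static const unsigned char` array definition."""
    lines = [f"static const unsigned char {name}[{len(values)}] = {{"]
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        # A trailing comma after the last element is valid in C initializers.
        lines.append("    " + ", ".join(str(v) for v in chunk) + ",")
    lines.append("};")
    return "\n".join(lines)


# Hypothetical table: number of set bits in each byte value 0..255.
popcount_table = [bin(i).count("1") for i in range(256)]

if __name__ == "__main__":
    print(c_array("popcount8", popcount_table))
```

Keeping the generator next to its output means a maintainer who needs to change the table can see exactly how it was produced.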

As with goto statements, we shouldn't conclude that code within comments should never be used just because it can be abused in some cases. It is a technique that has appropriate uses.

Permalink: http://pinderkent.phumblog.com/post/2009/05/keeping_commentedout_code_is_justifiable

Losing developer time to performance problems hidden by high-level languages.

Posted on Saturday, May 23, 2009 at 11:48 PM.

One of the main purposes of high-level programming languages is to save developer time by abstracting away the onerous and tedious aspects of the underlying hardware. In general, most high-level languages do a good job at this. Unfortunately, we also see these same high-level languages wasting significant amounts of developer time, often because of performance problems. What becomes problematic is that to properly diagnose and fix many of these performance problems, the developers involved need a deep understanding of the implementation of the high-level language in question.

A good example of this is a performance issue described recently with IronPython, an implementation of Python for Microsoft's .NET platform. In short, a very innocuous line of code was apparently responsible for the poor performance.

This incident highlights two main problems. The first is that high-level code can lead to some very unexpected interactions within the language's implementation. This can obviously mislead the developers dealing with the performance problem. What appears on the surface to be a simple and likely very fast operation ends up being the culprit, and a lot of developer time can be spent looking in the wrong places.
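The IronPython issue itself isn't reproduced here, since the source doesn't give its details. As a generic illustration of how an innocuous line's cost depends on what sits behind it, consider Python's `in` operator: on a list it performs a linear scan, while on a set it is an average constant-time hash lookup. The sizes and repeat counts below are arbitrary assumptions, and actual timings will vary by machine and interpreter.

```python
import timeit

# The same innocuous-looking expression, `needle in haystack`, on two containers.
haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needle = 99_999  # near the end, so the list search scans almost everything


def lookup_list():
    return needle in haystack_list  # linear scan: O(n) per lookup


def lookup_set():
    return needle in haystack_set  # hash lookup: O(1) on average


if __name__ == "__main__":
    for label, fn in (("list", lookup_list), ("set", lookup_set)):
        seconds = min(timeit.repeat(fn, number=100, repeat=3))
        print(f"{label}: {seconds:.4f} s for 100 lookups")
```

Nothing in the line `needle in haystack` hints at which behaviour you get; that knowledge lives in the container's implementation.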

The second concern is that tracking down the problem requires in-depth knowledge about the high-level language's implementation. To some extent, we use such high-level languages in the first place to avoid needing to acquire such lower-level knowledge. We want to focus on the application we're writing, not on dealing with issues pertaining to the platform we're building upon. Time spent learning about the high-level language's implementation is time not spent on developing the application at hand.

This particular situation seems to have had a "happy" ending: the victim of the poor performance got a rapid response from somebody with inside knowledge of IronPython's implementation. Unfortunately, this isn't always the case. Far too many times, I've seen developers spin their wheels trying to track down obscure performance problems of this type. Nor is the problem limited to programming languages like Python, Ruby, or Perl. We often see it happen with SQL, where a minor change to a query can result in a huge performance gain or loss.
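A minimal sketch of that SQL effect, using Python's built-in `sqlite3` module with an in-memory database (the `users` table and `idx_users_email` index are hypothetical). Both queries ask for the same rows, but wrapping the indexed column in `lower()` prevents SQLite from using that index, so `EXPLAIN QUERY PLAN` reports a SEARCH using the index for the first query and a full-table SCAN for the second. The exact wording of the plan text differs between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' strings from SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # 'detail' is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# Compares the indexed column directly, so SQLite can search the index.
fast = query_plan(conn, "SELECT id, name FROM users WHERE email = ?",
                  ("a@example.com",))

# Wrapping the column in a function hides it from that index.
slow = query_plan(conn, "SELECT id, name FROM users WHERE lower(email) = ?",
                  ("a@example.com",))

print(fast)
print(slow)
```

On a table with millions of rows, that one-word difference in the query can separate a few index probes from reading the whole table.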

As we start using high-level programming language implementations like IronPython, Scala, Clojure and JRuby, which are themselves often implemented in high-level programming languages like Java or C#, which in turn run on some sort of virtual machine, we'll run into these sorts of problems more and more frequently. Each additional layer of abstraction makes the situation more difficult. Soon we may need to look in two or three very different layers of software, assuming we even have source access, to track down performance issues. This could very well lead to a serious waste of developer time and effort.

Permalink: http://pinderkent.phumblog.com/post/2009/05/losing_developer_time_to_performance_problems_hidden_by_highlevel_languages

Please stop asking me to take your Web site improvement surveys.

Posted on Monday, May 18, 2009 at 2:27 PM.

One thing I've noticed becoming more and more common is the use of survey hovers on the Web sites of a variety of companies. A typical scenario involves me going to a company's site to read up on one or more of its products, only to encounter a hover popup asking me to take a survey, usually about the Web site itself. A good example of this is on Intel's Web site:

Survey hover screenshot.

These survey hovers are too intrusive, especially on commercial sites. When I'm focused on finding the best product to buy, I want to be seeing product specifications and prices. I don't want to be distracted with survey participation requests.

Now, I could always take the survey and hope there's some area where I can add my own comments and explain my annoyance with the survey popups. But I suspect my participation would be misconstrued as evidence that the survey hovers are in fact working and getting people to take the survey, ignoring the fact that my suggestion is to drop the hovers entirely. So I'll write about it here instead, and hope that some marketing folks see this posting.

I've got nothing against the surveys themselves, and I understand the need for customer feedback. I just really dislike the in-your-face approach of these hovers. From my perspective, they do more harm than whatever good they might bring. On a commercial Web site, these hovers distract me (and probably others, as well) from the company's products, which can make me less inclined to buy those products.

Permalink: http://pinderkent.phumblog.com/post/2009/05/please_stop_asking_me_to_take_your_web_site_improvement_surveys

An orgy of misinformation regarding programming language performance.

Posted on Sunday, May 17, 2009 at 7:42 PM.

Last week, a colleague forwarded me the link to one of the most blatantly incorrect computing articles he had ever seen. It's entitled "C# vs C/C++ Performance", and after reading it, I must agree with my colleague. It isn't often that we get to see an article so full of misinformation.

We see the first major mistakes within the "Point 1" discussion. The author clearly has some serious misunderstandings about hyper-threading and instruction sets. One of the gems we see is: ... a C++ program will not be able to take the advantages of the "Hyper Threading" instruction set of the Pentium 4 HT processor.

Of course, no such instruction set exists. Hyper-threading, as implemented by Intel up to this point, is transparent to userland applications.

Immediately after that, we read: Of course HT is outdated now....

That is, of course, absolutely false. Intel's recent Core i7 processor makes use of hyper-threading, with each of its four cores supporting two simultaneous threads. It's not an "outdated" technique.

It gets better as we read on: It will also not be able to take advantages of the Core 2 duo or Core 2 Quad's "true multi-threaded" instruction set as the compiler generated native code does not even know about these instruction sets.

Again, we see more ignorance regarding instruction sets and hyper-threading. While newer CPUs do often include new instructions, the supposed "true multi-threaded" instruction set that the author of that article writes about is bunk.

The next misleading claims we see are as follows: In the earlier days, not much changes were introduced to the instruction set with every processor release. The advancement in the processor was only in the speed and very few additional instruction sets with every release. Intel or AMD normally expects game developers to use these additional instruction sets.

We can see how wrong this is by looking at Wikipedia's x86 Instruction Listings page. It shows the original 8086/8088 instruction set, along with the instructions added with each processor generation. Based on that information, we can see that older processors such as the 80386 and the Pentium Pro each added quite a few new instructions. And the claim that new instructions are typically added for "game developers" is laughable. Games are just a small subset of the multimedia applications which benefit the most from the newer instruction sets. And that's not to mention the scientific and engineering applications which benefit significantly, as well.

Next we move on to the "Point 2" discussion. It starts off almost immediately with a five-line snippet of completely unrealistic code. It isn't even a remotely valid microbenchmark, and microbenchmarks are often misleading enough even when they actually compile and run. Yet from this code, which consists of an undefined function that performs a "really time consuming operation" being called 100,000,000 times, the author of that article concludes that "C++ is faster by a order of magnitude." Huh?
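By contrast, even a minimal microbenchmark should define the work it measures, actually run, and repeat its measurements. The sketch below does this in Python with the standard `timeit` module; `work` is a hypothetical stand-in workload, and the call and repeat counts are arbitrary. A fair cross-language comparison would additionally need an equivalent program in each language, run on the same machine.

```python
import statistics
import timeit


def work(n=10_000):
    """A concrete, defined workload instead of a placeholder: sum of squares."""
    return sum(i * i for i in range(n))


def benchmark(fn, number=100, repeat=5):
    """Time `fn` several times and report per-call seconds (min and median)."""
    fn()  # warm-up call, which also proves the code actually runs
    runs = timeit.repeat(fn, number=number, repeat=repeat)
    per_call = [t / number for t in runs]
    return min(per_call), statistics.median(per_call)


if __name__ == "__main__":
    best, median = benchmark(work)
    print(f"best {best * 1e6:.1f} us/call, median {median * 1e6:.1f} us/call")
```

Reporting both the best and the median run gives some sense of measurement noise, which a single timing hides.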

The next sentence reads: Nearly all the threads I've seen that claims C++ is faster writes a small application like this a prove that C++ is atleast n times faster than an equivalent c++ program and yes it's true.

So we find out that the claims in the author's article aren't even based on personal experience, let alone more rigorous measurement. They're based on what was read on some message board or mailing list. Beyond that, we see statements that don't make the slightest bit of sense as written, such as "C++ is atleast n times faster than an equivalent c++ program." The second "c++" should apparently read "C#".

The "Point 3" discussion focuses on memory management. While the author is somewhat correct in pointing out that memory management is more involved when using C++, there is no mention of Boost's smart pointers, the Boehm-Demers-Weiser garbage collector, Valgrind and the various other technologies that greatly help to prevent or track down memory leaks in C and C++ applications. I've seen first-hand how these technologies can be used to develop long-running systems in C++ that contain millions of lines of source code, yet don't suffer from obvious or significant memory leaks.

Further along, we get to read: Everyone knows that page fault is one of the most time-consuming operation as it requires a hard disk access. One page fault and you are dead.

While excessive page faults are typically bad for performance, they're not the evil that the author of that article portrays them to be. A single page fault won't typically harm performance as badly as the author suggests, and many page faults (so-called minor faults) are resolved without any disk access at all. Moreover, most modern operating systems use demand paging: a page isn't loaded from disk until it's actually referenced, which can lead to improved application startup times and reduced memory usage. So some page faults are to be expected.
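Demand paging can be observed directly. The following Unix-only Python sketch (it relies on the `resource` module, which isn't available on Windows) maps 32 MiB of anonymous memory and counts the process's minor page faults, the kind served without disk I/O, before and after writing one byte to each page. On a typical Linux system, creating the mapping costs almost no faults, while touching it costs roughly one fault per page; the region size is an arbitrary assumption.

```python
import mmap
import resource  # Unix-only


def minor_faults():
    """Minor page faults (served without disk I/O) taken so far by this process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def demand_paging_demo(size=32 * 1024 * 1024):
    """Map anonymous memory, then write one byte per page.

    Returns (faults while mapping, faults while touching).
    """
    before = minor_faults()
    region = mmap.mmap(-1, size)  # reserves address space; no pages backed yet
    mapped = minor_faults()
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1  # the first write to each page faults it in
    touched = minor_faults()
    region.close()
    return mapped - before, touched - mapped


if __name__ == "__main__":
    print(demand_paging_demo())
```

Thousands of faults here cost milliseconds, not the "one page fault and you are dead" the article claims.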

The next misunderstanding is: A lot of classical applications including Google Picasa suffers from memory management problems. After about two or three days, you can notice that these applications become slower necessitating a Windows Restart. This problem is completely alleviated in C#. the Framework comes with a broom behind you and sweeps your drop during the course of the execution and as a result your working set never grows (unless you really use it) which means lesser page faults.

While many desktop applications today do leak memory, it doesn't make sense for the author of that article to suggest that we need to perform a full reboot of the operating system. Killing the application process should, under most modern operating systems (including most versions of Windows still in use), be sufficient to free whatever memory it may have been using. Furthermore, it's incorrect to suggest that such problems will be "completely alleviated" while using a language like C#. It's quite possible for an application to have code that maintains references to objects that are no longer needed, thus preventing them from being garbage collected. Carelessness can cause problems regardless of which programming language is being used.

The author of that article concludes that the best approach is a hybrid one: write most of the application in C#, and have it call out to performance-critical code written in C++. While this is an option, a better approach is to first profile your code to see where and why it is actually slow. Don't just assume it's the language. Often we see poor algorithms being used, or unnecessary computation being performed. Based on my years of experience, I'd say that fixing such issues will typically give a much greater performance boost than changing programming languages.
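A minimal Python sketch of "profile first", using the standard `cProfile` and `pstats` modules. The naive Fibonacci function is a hypothetical stand-in for unnecessary computation: the profile's call counts point straight at the repeated work, and the fix (caching each subproblem with `functools.lru_cache`) is algorithmic, not a change of language.

```python
import cProfile
import io
import pstats
from functools import lru_cache


def fib_naive(n):
    """Exponential time: recomputes the same subproblems over and over."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Same recurrence, but each subproblem is computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def profile(fn, *args):
    """Run fn(*args) under cProfile; return its result and a stats report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


if __name__ == "__main__":
    result, report = profile(fib_naive, 25)
    print(result)
    print(report)  # the ncalls column shows how often fib_naive ran
    print(fib_cached(25))  # same answer, a tiny fraction of the calls
```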

Every author, myself included, will no doubt make minor mistakes here and there while writing. They're expected, and forgiven. However, the article that my colleague sent me was incorrect from top to bottom. Perhaps the most disturbing thing about that article is that some people may very well believe it. Perhaps articles like that one are part of why so much software is so poorly written. To those who don't know better, such articles sound legitimate and sensible. But after even the slightest bit of analysis, we see such articles fall apart almost completely. Unfortunately, there are a lot of people out there who can't or won't perform such analysis.

Permalink: http://pinderkent.phumblog.com/post/2009/05/an_orgy_of_misinformation_regarding_programming_language_performance