Pinderkent

Pain and glory from the trenches of the IT world.

An orgy of misinformation regarding programming language performance.

Posted on Sunday, May 17, 2009 at 7:42 PM.

Last week, a colleague forwarded me the link to one of the most blatantly incorrect computing articles he had ever seen. It's entitled C# vs C/C++ Performance, and after reading it, I must agree with my colleague. It isn't often that we get to see an article so full of misinformation.

We see the first major mistakes within the "Point 1" discussion. The author clearly has some serious misunderstandings about hyper-threading and instruction sets. One of the gems we see is: ... a C++ program will not be able to take the advantages of the "Hyper Threading" instruction set of the Pentium 4 HT processor.

Of course, no such instruction set exists. Hyper-threading, as implemented by Intel up to this point, is transparent to userland applications.

Immediately after that, we read: Of course HT is outdated now....

That is, of course, absolutely false. Intel's recent Core i7 processor makes use of hyper-threading, with each of its four cores supporting two simultaneous threads. It's not an "outdated" technique.

It gets better as we read on: It will also not be able to take advantages of the Core 2 duo or Core 2 Quad's "true multi-threaded" instruction set as the compiler generated native code does not even know about these instruction sets.

Again, we see more ignorance regarding instruction sets and hyper-threading. While newer CPUs do often include new instructions, the supposed "true multi-threaded" instruction set that the author of that article writes about is bunk.

The next misleading claims we see are as follows: In the earlier days, not much changes were introduced to the instruction set with every processor release. The advancement in the processor was only in the speed and very few additional instruction sets with every release. Intel or AMD normally expects game developers to use these additional instruction sets.

We can see how wrong this is by looking at Wikipedia's x86 Instruction Listings page. It shows the original 8086/8088 instruction set, along with the instructions added with each processor generation. Based on that information, we can see that older processors such as the 80386 and the Pentium Pro each added quite a few new instructions. And the claim that new instructions are typically added for "game developers" is laughable. Games are just a small subset of the multimedia applications which benefit the most from the newer instruction sets. And that's not to mention the scientific and engineering applications which benefit significantly, as well.

Next we move on to the "Point 2" discussion. It almost immediately starts off with a five-line snippet of completely unrealistic code. It's not even a remotely valid microbenchmark, and microbenchmarks are often misleading enough even when they can actually be compiled and executed. Yet from this code, which consists of an undefined function that performs a "really time consuming operation" being called 100,000,000 times, the author of that article comes to the conclusion that "C++ is faster by a order of magnitude." Huh?

The next sentence reads: Nearly all the threads I've seen that claims C++ is faster writes a small application like this a prove that C++ is atleast n times faster than an equivalent c++ program and yes it's true.

So we find out that the claims of the author's article aren't even based on personal experience, let alone more rigorous approaches. They're based on what was read in some message board or mailing list. And beyond that, we can see statements that don't make even the slightest bit of sense as written, such as "C++ is atleast n times faster than an equivalent c++ program." The second "c++" should apparently read "C#".

The "Point 3" discussion focuses on memory management. While the author is somewhat correct in pointing out that memory management is more involved when using C++, there is no mention of Boost's smart pointers, the Boehm-Demers-Weiser garbage collector, Valgrind and the various other technologies that greatly help to prevent or track down memory leaks in C and C++ applications. I've seen first-hand how these technologies can be used to develop long-running systems in C++ that contain millions of lines of source code, yet don't suffer from obvious or significant memory leaks.

Further along, we get to read: Everyone knows that page fault is one of the most time-consuming operation as it requires a hard disk access. One page fault and you are dead.

While excessive page faults are typically bad for performance, they're not the evil that the author of that article portrays them to be. Not every page fault even involves the disk: a minor (soft) fault is resolved by mapping a page that is already in memory. A single page fault won't typically harm performance as badly as the author suggests. And with most modern operating systems, we often see demand paging used. In such a scenario, we don't load a page from disk until it's actually referenced, which can lead to improved application startup times and reduced memory usage. So some page faults are to be expected.

The next misunderstanding is: A lot of classical applications including Google Picasa suffers from memory management problems. After about two or three days, you can notice that these applications become slower necessitating a Windows Restart. This problem is completely alleviated in C#. the Framework comes with a broom behind you and sweeps your drop during the course of the execution and as a result your working set never grows (unless you really use it) which means lesser page faults.

While many desktop applications today do leak memory, it doesn't make sense for the author of that article to suggest that we need to perform a full reboot of the operating system. Killing the application process should, under most modern operating systems (including most versions of Windows still in use), be sufficient to free whatever memory it may have been using. Furthermore, it's incorrect to suggest that such problems will be "completely alleviated" while using a language like C#. It's quite possible for an application to have code that maintains references to objects that are no longer needed, thus preventing them from being garbage collected. Carelessness can cause problems regardless of which programming language is being used.

The author of that article comes to the conclusion that the best thing to do is take a hybrid approach: write most of the application in C#, and have it call out to performance-critical code written in C++. While this is an option, a better approach is to first profile your code to see where and why it is actually slow. Don't just assume it's the language. Often, we see poor algorithms being used, or unnecessary computation being performed. Based on my years of experience, I'd say that fixing such issues will typically give a much greater performance boost than changing programming languages.
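As a rough sketch of what "profile first" can look like in practice, here is a Python example using the standard `cProfile` and `pstats` modules. The two duplicate-detection functions and the `profile` helper are hypothetical, made up for this illustration; they simply contrast a quadratic algorithm with a linear one that does the same job in the same language.

```python
import cProfile
import io
import pstats


def has_duplicates_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Remember seen values in a set: O(n) work on average."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


def profile(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


# No duplicates, so both functions have to examine the whole input.
data = list(range(2000))
slow_result, slow_report = profile(has_duplicates_quadratic, data)
fast_result, fast_report = profile(has_duplicates_linear, data)

print(slow_report)
print(fast_report)
```

The exact timings will vary from machine to machine, but the reports should make it plain where the time goes. Swapping the algorithm is the kind of fix that a profiler points you towards, and it requires no change of language at all.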

Every author, myself included, will no doubt make minor mistakes here and there while writing. They're expected, and forgiven. However, the article that my colleague linked me to was incorrect from top to bottom. What is perhaps the most disturbing thing about that article is that some people may very well believe what it is saying to be true. Perhaps articles like that one are why so much software is so poorly written. To those who don't know better, such articles sound legitimate and sensible. But after even the slightest bit of analysis, we see such articles fall apart almost completely. Unfortunately, there are a lot of people out there who can't or won't perform such analysis.

Permalink: http://pinderkent.phumblog.com/post/2009/05/an_orgy_of_misinformation_regarding_programming_language_performance

C and C++ play a very crucial role in most Web application systems.

Posted on Friday, May 15, 2009 at 2:21 AM.

Today, over at Hacker News, I saw a topic asking why C++ isn't commonly used for Web applications. The question itself is quite valid; we typically don't see Web applications themselves developed in C++. But that doesn't mean that C and C++ don't have an integral role within a Web-based system. Their use isn't as visible as that of Ruby, PHP, Python or Perl, but it's important nevertheless.

Admittedly, the back-end of many Web applications really isn't all that complex. In many cases, it's basically just a friendlier interface to a datastore of some sort, maybe offering some caching, and usually some basic data manipulation. And although C++ libraries like the STL and Boost allow for such tasks to be performed with relative ease, there's essentially little benefit in using C++. Scripting languages are often sufficient.

That said, C and C++ still do have a huge role in most Web application stacks today. We shouldn't forget that most of the popular server operating systems, Web servers and database systems today, as well as the most widely used implementations of most scripting languages, are typically written in C or C++. This is quite apparent within the popular open source Web stacks.

At the very core, we have C playing an integral role in virtually all of the popular server operating systems today, especially UNIX-like systems like Linux, FreeBSD, and Solaris. On top of that, we have popular Web servers like Apache, nginx, and lighttpd that are all written in C. And for database systems, PostgreSQL and SQLite are written in C, while MySQL uses both C and C++.

C and C++ are also critical to the programming languages used to implement many Web applications. The most widely used implementations of Python, Ruby, Perl and PHP all use C. Even Sun's HotSpot Java virtual machine makes very extensive use of C and C++.

So when we take a more holistic view of Web applications, we see that C and C++ prove to be very widely used. They're used for some of the most critical aspects of Web-based systems, where performance and reliability truly matter. Even if they get more of the attention, languages like PHP, Python, Ruby, Java and Perl end up being little more than glue languages, tying together the software implemented in C or C++. It becomes easy to forget the importance of C and C++, but this may just be because the software developed using them has matured to the point where it provides such stable interfaces that we can totally ignore its implementation language. Nevertheless, C and C++ are very critical to the vast, vast majority of Web applications that exist today.

Permalink: http://pinderkent.phumblog.com/post/2009/05/c_and_c_play_a_very_crucial_role_in_most_web_application_systems

Putting Stack Overflow's hardware usage in perspective.

Posted on Saturday, May 09, 2009 at 2:34 AM.

Anand Iyer recently wrote an article that transcribes a portion of a video in which Joel Spolsky discusses the hardware and software backing the very useful and increasingly popular Stack Overflow Web site. It's mentioned that there is one Web server and one database server, both running on "eight core Xeons" and serving "16 million" pages a month. At first, that sounds impressive. But thinking about it more, I'm not so sure it really is.

First, we should convert that value of 16 million pageviews into something we can comprehend better. Assuming a month of just 30 days, a quick bit of math shows that to be 2,592,000 seconds. Sixteen million pages over that number of seconds ends up being a mere 6.2 pages per second. Now, that's probably not a totally accurate picture. There are no doubt times when the traffic is much higher than that, and other times when it's lower. But even if their overall pageview traffic were to triple or quadruple, we're still not seeing huge numbers of simultaneous page requests.
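The arithmetic above is easy to check. Here is a short Python sketch of it; the variable names are just for illustration, and the 30-day month and the "quadruple" scenario are the same assumptions made in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption from the text: a month of just 30 days.
seconds_per_month = 30 * SECONDS_PER_DAY

# The figure quoted in the talk.
pageviews_per_month = 16_000_000

# Average rate if the traffic were spread evenly over the month.
average_rate = pageviews_per_month / seconds_per_month

# The "even if traffic were to quadruple" scenario.
quadrupled_rate = 4 * average_rate

print(f"seconds in a 30-day month: {seconds_per_month:,}")
print(f"average pages per second:  {average_rate:.1f}")
print(f"at four times the traffic: {quadrupled_rate:.1f}")
```

This is only an average, of course. It says nothing about how the load is distributed over the day, which is why the peak figures matter more than the monthly total.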

I don't think people today truly realize the power of today's hardware. Even the low-cost, consumer-grade PCs you can buy for a few hundred dollars are significantly more powerful than the servers of just a few years ago. So I don't think we should get too excited about Stack Overflow being able to serve 6 or 7 pages per second, if not many times that during periods of heavy load, over what's essentially 16 very powerful CPUs.

Thinking back to some of the database-backed intranet Web sites I've worked on in the past, we were able to reasonably handle sustained traffic of 30+ pages per second at times on far inferior hardware. This was even when we still used CGI scripts written in Perl, which have not only the overhead of starting up the interpreter process with each request, but also the overhead of the program interpretation itself.

I recall one job in particular because of how rushed it was. The company had several call centers located throughout the world, and was moving towards a custom Web-based solution for the call center operators to use. Expecting up to 30 simultaneous users per second at peak hours, they had placed an order for some significantly powerful hardware at that time. The order was delayed for some reason, but management wanted the site to go live. So the decision was made to temporarily use some older, unused Sun workstations as servers. I recall spending a night getting two workstations set up as Web servers, and one as a database server, so the system could go live the following morning. It went live, and everyone was very surprised to find that even under higher than expected load, running on older Sun workstation hardware and using Perl CGI scripts, the responsiveness of the Web site was quite acceptable.

Now, I don't expect that to be the case all of the time. Like Joel points out in his talk, there are many sites even today that use significantly more hardware than they probably should. But with some sensible caching policies and a small degree of care while programming, it really wasn't overly difficult to get high-traffic sites running on a small amount of lower-end hardware. Even in Stack Overflow's case, it sounds like they could get by very easily with a small fraction of their current infrastructure. However, it is good to have room to grow, as the traffic to that site likely will.

Computer hardware today is extremely powerful. For most Web sites, even those getting millions upon millions of pageviews per month, scalability just shouldn't be an issue. If it is, it's likely that there have been some pretty significant programming mistakes made when developing the software powering the Web site. And with low-end servers today typically coming with eight or more logical processors and many gigabytes of memory, servicing hundreds of requests per second from a single system should be considered routine.

Permalink: http://pinderkent.phumblog.com/post/2009/05/putting_stack_overflows_hardware_usage_in_perspective

A lack of good documentation is one of the main problems with Firebird.

Posted on Wednesday, April 29, 2009 at 3:34 AM.

The recent Why so few developers are using Firebird SQL? article has generated a fair amount of discussion, including some at Reddit. For those who might not be aware, Firebird is an open source RDBMS based on the InterBase 6.0 source code that Borland released just under a decade ago. Since then, it has been under continuous development, but really hasn't caught on like other open source databases such as PostgreSQL, MySQL and SQLite have.

About a year ago I was working with a company that was considering the use of Firebird for a new in-house application they were developing. I wasn't directly involved with this particular project, but did talk to some of the developers working on it. Having used InterBase years ago while working on some Delphi-based software, and having heard of Firebird, I was interested in seeing what they had to say about it. While they didn't have many technical complaints, there were a few factors that resulted in them opting to use PostgreSQL instead.

Technically, InterBase and Firebird aren't bad database systems by any means. They do offer exactly what's needed and what's expected by many users. My experience with InterBase years back was that its performance and reliability were quite suitable, and I don't have any reason to think the situation would be any different now. Personally, I would sooner entrust valuable data, and the availability of that data, to Firebird than to MySQL. The developers on the project I mentioned earlier basically had the same opinion, as I recall. Their complaints weren't of a technical nature.

One significant complaint they did have was with Firebird's documentation. When they looked at it, it was basically the InterBase 6.0 manuals with separate documentation outlining the changes and additions made by the Firebird developers. Checking the Firebird documentation page now, about a year later, it seems that it is still a combination of the InterBase 6.0 manuals and the Firebird 2.0 Language Reference Update document.

Not all of the developers working on the project had used InterBase 6.0, and facing relatively tight deadlines, they didn't expect to be able to get everyone up to speed fast enough if they had to become familiar with InterBase 6.0 first, and then "patch" that knowledge with the Firebird updates. One major benefit of PostgreSQL is that it offers very comprehensive and accessible documentation online. The vast majority of questions and issues can be resolved by referring to the appropriate section of their documentation.

The developers and managers of the project I mentioned earlier also felt more comfortable with the community and development processes around PostgreSQL. I'm not sure what sort of research they did to come to this conclusion, but I recall them saying that they thought the PostgreSQL developer and user community was more "stable". My interest in their findings was more technically-oriented, so I didn't follow up much with respect to this.

Personally, I'd like to see greater adoption of Firebird. I think it has technical merit, and given its heritage it should be production-ready for many users. Streamlining the documentation might help encourage its adoption. It won't be an easy task by any means, but if the Firebird project could produce and then maintain some documentation on par with that which the PostgreSQL project has put together, the burden on new users would be eased greatly. We may then see more people willing to at least give it a try.

Permalink: http://pinderkent.phumblog.com/post/2009/04/a_lack_of_good_documentation_is_one_of_the_main_problems_with_firebird

Web applications are a poor approach for developing high-quality, cross-platform applications.

Posted on Wednesday, April 29, 2009 at 1:13 AM.

I just finished reading Marcus Cavanaugh's recent The Cross-Platform Myth article. The first two-thirds or so of it are quite correct. He points out that when an application is developed for two or more desktop environments that differ in some pretty fundamental ways, like Windows and Mac OS X, the result usually isn't too great. The app likely won't fit in well on one or more of the platforms, which can lead to usability problems, and may even result in users not adopting the software.

At the very end of the article, however, he writes the following:

The only acceptable cross-platform UI toolkit lives in your web browser. If you want your application to work on both Windows and OS X, create a web application. In the browser, you can freely design a custom user interface that won't seem out of place on any operating system. Users understand that web sites operate under different rules.

I find this reasoning quite absurd. First of all, Web browsers are some of the worst-conforming desktop applications around. Early in the article, he even mentions Mozilla Firefox as an example of a cross-platform application that just doesn't fit in anywhere. But Firefox isn't the only browser guilty of this. The Windows port of Apple's Safari Web browser clearly doesn't behave like a typical Windows app, either. And to some extent, the same even goes for Opera.

So not only are Web browsers themselves perfect examples of UIs that target the lowest common denominator, but the environment that they present is that very same philosophy taken to the extreme. The only consistency is that there's inconsistency. The built-in UI controls or widgets are extremely limited. And everyone who has ever had to do even a minor amount of Web development knows that the environment differs significantly between the different browsers.

When it comes to Web applications, it's not that they "won't seem out of place on any operating system." Rather, it's that they won't fit in with any existing desktop environment. In a sense, Web applications are typically so horrible in that respect that most users can't even recognize how bad these applications are. For some odd reason, users tend to use confusing, inconsistent or poor-quality Web applications far longer than they would the desktop equivalents.

The situation likely won't improve. Like Marcus mentions in a footnote in his article, RIAs only make the bad situation even worse. They allow for further inconsistency in an already inconsistent environment. And the other "innovations" we are seeing are just poor attempts to bring existing desktop application concepts into the browser. The canvas element and O3D are good examples of this. They both pale in comparison even to the cross-platform, desktop-equivalent abstractions of libraries and APIs like wxWidgets, GTK+ and OpenGL.

In fact, the Web development community has even gone so far as to try and abstract away the variety of programming languages we have available with a typical desktop environment. We are stuck using JavaScript, which ends up bringing us the worst of all worlds, like the Web application development environment itself.

All in all, those of us who have developed real desktop applications for years and years end up being quite disappointed with what Web development offers, or more correctly, all that it doesn't offer. We've thrown out everything we learned during the 1970s, 1980s and 1990s, only to replace it with half-baked, browser-based "alternatives" over the past decade. We've essentially taken a huge step backwards, amplifying the very problems that Marcus spoke out against during the first part of his article.

Permalink: http://pinderkent.phumblog.com/post/2009/04/web_applications_are_a_poor_approach_for_developing_highquality_crossplatform_applications