Pinderkent

Pain and glory from the trenches of the IT world.

Parrot just can't compete with LLVM, the JVM, and the .NET CLR.

Posted on Sunday, May 23, 2010 at 11:51 AM.

I read an article today, written by Andrew Whitworth, that discusses Parrot and its fitness as a target platform. His article, along with other recent developments, may very well answer the question I asked nearly three years ago, Will Parrot ever truly deliver? Unfortunately, the answer appears to be a resounding No.

For those who might not be aware, Parrot is, according to their web site, a "virtual machine designed to efficiently compile and execute bytecode for dynamic languages." Although it has been in development for about a decade, there has been comparatively little to show for all of the effort that has gone into it. Sure, there have been frequent releases, but in the end we still don't have a platform that garners much attention, and we still don't see anyone really putting forth a lot of effort to target it.

Andrew's article helps highlight why both language implementors and users may be hesitant to spend time targeting Parrot. Towards the end of his article, he covers parts of the system that he thinks will be seeing major changes within the next few months. Throughout these seven points, we see some very unsettling things. The very first point, for instance, mentions that, "GC is a very internal thing, when it works properly, you don't even need to know it exists." Now, garbage collection isn't a trivial task, but it has been very well studied and implemented many times over in real-world systems. Although we can't expect any such system to be perfect, it is unsettling when we read statements like "when it works properly" regarding a ten-year-old virtual machine platform. There just shouldn't be so much doubt about such a fundamental part of a virtual machine.

The second point is no better. Just-in-time compilation, like garbage collection, is another one of those cornerstones of a VM that we should expect to be mature and robust after 10 years. It's very worrisome to read that Parrot is lacking so badly in this area, even after two major releases.

The third point is perhaps the worst of all. In it, he states, "We don't really have a good, working, reliable threads implementation now and HLLs are generally not using them." It's currently 2010, and the situation today is that almost all new desktop PCs, and even many notebooks and netbooks, have a CPU with at least two cores. Most server-grade computer systems offer several times that, with multiple CPUs, with multiple cores per CPU, and even multiple threads of simultaneous execution per core. Efficiently using these systems to their full potential currently means writing multithreaded software. Like garbage collection and just-in-time compilation, threading has been well-studied, implemented repeatedly, and is one of the major pieces of any virtual machine platform. There's just no excuse for Parrot not to have better multithreading support.

The fifth point is pretty serious, as well. It discusses packfiles, which are the files that contain Parrot bytecode, debug data, and so forth. This is one more essential part of any VM implementation that should be very mature after a decade's worth of development. It's disappointing to hear that there are still portability issues with these files after so many years.

After reading about those rather serious deficiencies, I have a hard time understanding how Andrew can suggest that, "In summary, Parrot is a good, stable platform for HLL developers to use." From what I can see, Parrot is a platform that has had a lot of time and opportunity to make something of itself, but due to various problems, from internal developer strife, to a bad reputation, to a lack of serious users, it just hasn't matured.

Since I wrote my other article about Parrot almost three years ago, we've seen major developments out of the other major VM providers. We're seeing the Java platform get better support for dynamic languages in the upcoming JDK 7 release. We've also seen Microsoft's Dynamic Language Runtime become available for their .NET platform, allowing for mature and usable language implementations like IronRuby and IronPython to be developed.

Perhaps the biggest threat of all to Parrot is LLVM. LLVM has become widely accepted by industry, and even significant open source projects like FreeBSD are integrating and supporting it. In addition to having excellent support for C, C++ and Objective-C, we're even seeing it used as the back-end for dynamic programming language implementations. Rubinius and MacRuby are two examples of Ruby implementations that support LLVM. Then there are Python implementations like Unladen Swallow and PyPy.

I just don't think that Parrot can compete with these other platforms. Parrot has spun its wheels for far too long, and just isn't as mature as the JVM, the .NET CLR, or LLVM have become. Aside from casual or hobby development, I don't see why anyone would develop a software system specifically targeting Parrot. Its future seems extremely bleak at this point.

Permalink: http://pinderkent.phumblog.com/post/2010/05/parrot_just_cant_compete_with_llvm_the_jvm_and_the_net_clr

Functional programming JavaScript is a dead-end exercise.

Posted on Saturday, October 31, 2009 at 4:30 PM.

Yesterday a colleague forwarded me the link to Underscore.js. It's a JavaScript library that provides some functions commonly offered by functional programming language implementations.

Now, I can understand completely why JavaScript programmers would desire to use such techniques. They bring some very clear and powerful benefits, including shortened development time, fewer lines of code, greater flexibility, and improved readability. However, I do hope that JavaScript programmers using such a library don't come to think that they're actually doing functional programming.

One of the most significant areas where JavaScript fails with respect to functional programming is immutability. Current implementations have spotty support for constants, leading to various workarounds. So while it's possible to manually avoid state changes as much as is possible, it becomes difficult to do when performing browser-based JavaScript development. And it's almost always better to have the language implementation strictly enforce const-ness, rather than trying to have developers communicate this intent through all-caps variable names, for instance.

Many functional languages also offer very rich pattern matching functionality, which we just don't get with JavaScript. While some people have tried to implement pattern matching in JavaScript, it is nowhere near as clean or natural as pattern matching in Haskell or pattern matching in Erlang.

Many, but not all, functional programming languages also offer strict, static typing. There has been much debate about the pros and cons of the various typing techniques employed by various languages. In the end, however, experience shows that static typing results in higher-quality software, and static typing saves developer time. Unfortunately, neither JavaScript as a language nor its current widely used implementations lend themselves to strong, static typing.

It also doesn't help that JavaScript came mainly from the imperative and prototype-based OO world, and is only now trying to adopt features and techniques from the functional paradigm. It's often much cleaner to start with a purely functional language, and then add useful imperative features like for loops and references, as was done with Objective Caml. So the language and standard libraries have a functional feel to them, and naturally encourage the use of functional techniques, yet still allow the use of imperative features where they may prove to be the best option.

In many respects, I see trying to do functional programming in JavaScript much like doing object-oriented programming in C. While it can be done, to some extent, it never feels very natural because it lacks support that should be provided at the language level and by the implementations of said language. GObject is one of the most widely used C object systems, and as anyone who has written even a moderately sized GTK+-based system in C knows, it's not a very pleasant ordeal. Languages like Objective-C and C++ offer a much more developer-friendly experience.

So I see trying to add features and techniques from functional programming to JavaScript as generally being a pointless exercise. It may help in some cases, but ends up just masking the symptoms of a greater problem, namely that we want (and maybe even need) a true functional programming language available in all of the popular Web browsers. I've suggested Haskell in the past, but just about any functional language would be better than JavaScript.

Permalink: http://pinderkent.phumblog.com/post/2009/10/functional_programming_javascript_is_a_deadend_exercise

"Utility" or "helper" classes are a sign of a language defect.

Posted on Tuesday, October 06, 2009 at 2:03 AM.

Chris Eargle recently wrote about so-called "utility" or "helper" classes. Within his article, he states that "There should never be a Utility class which is used as a general bucket. Every method in your system means something, it belongs somewhere." I can't disagree with this sentiment, nor can I necessarily argue in favor of using such classes. However, I do think that a tendency for developers to create such classes indicates that there is likely an inherent flaw with the programming language that they're using.

We most often see "utility" or "helper" classes arise when using languages like Java and C#. When first developed, these languages took an OO-or-nothing approach. This isn't surprising, especially in the case of Java. When it arose during the 1990s, the software development community as a whole was generally quite enthusiastic about object-oriented programming. As a result, one notable feature that Java and C# are both missing is the traditional function.

Many OO purists will decry the functions and procedures that are native to many imperative languages. They claim there is no place for standalone functions within object-oriented languages and well-designed software. But it really just comes down to a typical clash of theory versus pragmatism. When developing real-world software, sometimes a plain old function is exactly the tool that we need.

So while languages like C++, Python and even OCaml allow for both functions and objects to be used, Java and C# unfortunately do not. Developers using languages like Java and C# have to resort to abstract classes with static methods, or similar workarounds. As Chris notes in his article, this isn't an ideal situation by any means.
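As a rough sketch of the difference, here is how the same small helper might look in Python, once as a plain module-level function and once forced into a Java-style holder class. The names `clamp` and `MathUtils` are made up for illustration and don't come from Chris's article.

```python
def clamp(value, low, high):
    """Plain function: limit value to the range [low, high]."""
    return max(low, min(value, high))


class MathUtils:
    """Java-style 'utility' class (hypothetical name). It holds no state
    and exists only to give the static method somewhere to live."""

    @staticmethod
    def clamp(value, low, high):
        return max(low, min(value, high))


# Both calls do the same work; the class contributes nothing but a namespace.
print(clamp(15, 0, 10))
print(MathUtils.clamp(15, 0, 10))
```

In a language with real functions, the first form is available and the holder class is optional. In Java and C#, only something like the second form can be written.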

Given that we, as a community, now have many more years of developing software using object-oriented languages and techniques, I think it's safe to say that our tools may need some minor modifications. Languages like Java and C# are missing an essential construct, and that construct is the function. Like any tool, functions can be misused. But as we've seen, their absence can result in other hackish designs that pose several problems of their own. So perhaps we will eventually see this deficiency addressed by adding functions to such languages, so we can use them when they do prove to be the best tool for getting the job done.

Permalink: http://pinderkent.phumblog.com/post/2009/10/utility_or_helper_classes_are_a_sign_of_a_language_defect

An orgy of misinformation regarding programming language performance.

Posted on Sunday, May 17, 2009 at 7:42 PM.

Last week, a colleague forwarded me the link to one of the most blatantly incorrect computing articles he had ever seen. It's entitled C# vs C/C++ Performance, and after reading it, I must agree with my colleague. It isn't often that we get to see an article so full of misinformation.

We see the first major mistakes within the "Point 1" discussion. The author clearly has some serious misunderstandings about hyper-threading and instruction sets. One of the gems we see is: "... a C++ program will not be able to take the advantages of the 'Hyper Threading' instruction set of the Pentium 4 HT processor."

Of course, no such instruction set exists. Hyper-threading, as implemented by Intel up to this point, is transparent to userland applications.

Immediately after that, we read: "Of course HT is outdated now...."

That is, of course, absolutely false. Intel's recent Core i7 processor makes use of hyper-threading, with each of its four cores supporting two simultaneous threads. It's not an "outdated" technique.

It gets better as we read on: "It will also not be able to take advantages of the Core 2 duo or Core 2 Quad's 'true multi-threaded' instruction set as the compiler generated native code does not even know about these instruction sets."

Again, we see more ignorance regarding instruction sets and hyper-threading. While newer CPUs do often include new instructions, the supposed "true multi-threaded" instruction set that the author of that article writes about is bunk.

The next misleading claims we see are as follows: "In the earlier days, not much changes were introduced to the instruction set with every processor release. The advancement in the processor was only in the speed and very few additional instruction sets with every release. Intel or AMD normally expects game developers to use these additional instruction sets."

We can see how wrong this is by looking at Wikipedia's x86 Instruction Listings page. It shows the original 8086/8088 instruction set, along with the instructions added with each processor generation. Based on that information, we can see that older processors such as the 80386 and the Pentium Pro each added quite a few new instructions. And the claim that new instructions are typically added for "game developers" is laughable. Games are just a small subset of the multimedia applications which benefit the most from the newer instruction sets. And that's not to mention the scientific and engineering applications which benefit significantly, as well.

Next we move on to the "Point 2" discussion. It almost immediately starts off with a five-line snippet of completely unrealistic code. It's not even a remotely valid microbenchmark, and microbenchmarks are often misleading enough even when they can actually be compiled and executed. Yet from this code, which consists of an undefined function that performs a "really time consuming operation" being called 100,000,000 times, the author of that article comes to the conclusion that "C++ is faster by a order of magnitude." Huh?

The next sentence reads: "Nearly all the threads I've seen that claims C++ is faster writes a small application like this a prove that C++ is atleast n times faster than an equivalent c++ program and yes it's true."

So we find out that the claims of the author's article aren't even based on personal experience, let alone more rigorous approaches. They're based on what was read in some message board or mailing list. And beyond that, we can see statements that don't make even the slightest bit of sense as written, such as "C++ is atleast n times faster than an equivalent c++ program." The second "c++" should apparently read "C#".

The "Point 3" discussion focuses on memory management. While the author is somewhat correct in pointing out that memory management is more involved when using C++, there is no mention of Boost's smart pointers, the Boehm-Demers-Weiser garbage collector, Valgrind and the various other technologies that greatly help to prevent or track down memory leaks in C and C++ applications. I've seen first-hand how these technologies can be used to develop long-running systems in C++ that contain millions of lines of source code, yet don't suffer from obvious or significant memory leaks.

Further along, we get to read: "Everyone knows that page fault is one of the most time-consuming operation as it requires a hard disk access. One page fault and you are dead."

While excessive page faults are typically bad for performance, they're not the evil that the author of that article portrays them to be. A single page fault won't typically harm performance as badly as the author suggests. And with most modern operating systems, we often see demand paging used. In such a scenario, we don't load a page from disk until it's actually referenced, which can lead to improved application startup times and reduced memory usage. So some page faults are to be expected.

The next misunderstanding is: "A lot of classical applications including Google Picasa suffers from memory management problems. After about two or three days, you can notice that these applications become slower necessitating a Windows Restart. This problem is completely alleviated in C#. the Framework comes with a broom behind you and sweeps your drop during the course of the execution and as a result your working set never grows (unless you really use it) which means lesser page faults."

While many desktop applications today do leak memory, it doesn't make sense for the author of that article to suggest that we need to perform a full reboot of the operating system. Killing the application process should, under most modern operating systems (including most versions of Windows still in use), be sufficient to free whatever memory it may have been using. Furthermore, it's incorrect to suggest that such problems will be "completely alleviated" while using a language like C#. It's quite possible for an application to have code that maintains references to objects that are no longer needed, thus preventing them from being garbage collected. Carelessness can cause problems regardless of which programming language is being used.
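The same kind of leak is easy to reproduce in any garbage-collected language. Here is a minimal sketch in Python rather than C#, since the principle carries over. The `Image` class, the `cache` list and the `open_image` function are all hypothetical names invented for this example, and the immediate reclamation it demonstrates is how CPython behaves.

```python
import gc
import weakref


class Image:
    """Stand-in for a large object, such as a decoded photo."""

    def __init__(self, name):
        self.name = name


cache = []  # hypothetical long-lived list that nobody remembers to prune


def open_image(name):
    image = Image(name)
    cache.append(image)  # the forgotten reference
    return image


image = open_image("holiday.jpg")
probe = weakref.ref(image)  # lets us observe whether the object is still alive

del image       # the application is "done" with the image
gc.collect()
print("alive after del:", probe() is not None)  # the cache still references it

cache.clear()   # drop the forgotten reference
gc.collect()
print("alive after clearing cache:", probe() is not None)
```

As long as the cache holds its reference, no collector is allowed to reclaim the object, so the working set keeps growing just as it would with a leak in C++.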

The author of that article comes to the conclusion that the best thing to do is take a hybrid approach: write most of the application in C#, and have it call out to performance-critical code written in C++. While this is an option, a better approach is to first profile your code to see where and why it is actually slow. Don't just assume it's the language. Often, we see poor algorithms being used, or unnecessary computation being performed. Based on my years of experience, I'd say that fixing such issues will typically give a much greater performance boost than changing programming languages.
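To make that concrete, here is a minimal sketch in Python using the standard `cProfile` and `pstats` modules. The two de-duplication functions and the `profile` helper are hypothetical examples written for this post, not code from the article being discussed. They produce identical results, but one does a linear scan for every element while the other uses a set.

```python
import cProfile
import io
import pstats


def unique_slow(items):
    """Order-preserving de-duplication with a list: quadratic comparisons."""
    seen = []
    for item in items:
        if item not in seen:  # linear scan on every iteration
            seen.append(item)
    return seen


def unique_fast(items):
    """Same result, but membership tests go through a set."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # average constant-time lookup
            seen.add(item)
            result.append(item)
    return result


def profile(func, data):
    """Run func(data) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(data)
    profiler.disable()
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)
    return result, report.getvalue()


data = [i % 1000 for i in range(10000)]
slow_result, slow_report = profile(unique_slow, data)
fast_result, fast_report = profile(unique_fast, data)
print(slow_report)
print(fast_report)
```

The exact timings will vary from machine to machine, but the reports should show where the time goes. The fix here is a change of data structure, and the language stays the same.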

Every author, myself included, will no doubt make minor mistakes here and there while writing. They're expected, and forgiven. However, the article that my colleague linked me to was incorrect from top to bottom. What is perhaps the most disturbing thing about that article is that some people may very well believe what it is saying to be true. Perhaps articles like that one are why so much software is so poorly written. To those who don't know better, such articles sound legitimate and sensible. But after even the slightest bit of analysis, we see such articles fall apart almost completely. Unfortunately, there are a lot of people out there who can't or won't perform such analysis.

Permalink: http://pinderkent.phumblog.com/post/2009/05/an_orgy_of_misinformation_regarding_programming_language_performance

C and C++ play a very crucial role in most Web application systems.

Posted on Friday, May 15, 2009 at 2:21 AM.

Today, over at Hacker News, I saw a topic asking why C++ isn't commonly used for Web applications. The question itself is quite valid; we typically don't see Web applications themselves developed in C++. But that doesn't mean that C and C++ don't have an integral role within a Web-based system. Their use isn't as visible as that of Ruby, PHP, Python or Perl, but it's important nevertheless.

Admittedly, the back-end of many Web applications really isn't all that complex. In many cases, it's basically just a friendlier interface to a datastore of some sort, maybe offering some caching, and usually some basic data manipulation. And although C++ libraries like the STL and Boost allow for such tasks to be performed with relative ease, there's essentially little benefit in using C++. Scripting languages are often sufficient.

That said, C and C++ still do have a huge role in most Web application stacks today. We shouldn't forget that most of the popular server operating systems, Web servers and database systems today, as well as the most widely used implementations of most scripting languages, are typically written in C or C++. This is quite apparent within the popular open source Web stacks.

At the very core, we have C playing an integral role in virtually all of the popular server operating systems today, especially UNIX-like systems like Linux, FreeBSD, and Solaris. On top of that, we have popular Web servers like Apache, nginx, and lighttpd that are all written in C. And for database systems, PostgreSQL and SQLite are written in C, while MySQL uses both C and C++.

C and C++ are also critical to the programming languages used to implement many Web applications. The most widely used implementations of Python, Ruby, Perl and PHP all use C. Even Sun's HotSpot Java virtual machine makes very extensive use of C and C++.

So when we take a more holistic view of Web applications, we see that C and C++ prove to be very widely used. They're used for some of the most critical aspects of Web-based systems, where performance and reliability truly matter. Even if they get more of the attention, languages like PHP, Python, Ruby, Java and Perl end up being little more than glue languages, tying together the software implemented in C or C++. It becomes easy to forget the importance of C and C++, but this may just be because the software developed using them has matured to the point where it provides such stable interfaces that we can totally ignore its implementation language. Nevertheless, C and C++ are very critical to the vast, vast majority of Web applications that exist today.

Permalink: http://pinderkent.phumblog.com/post/2009/05/c_and_c_play_a_very_crucial_role_in_most_web_application_systems