Pinderkent

Pain and glory from the trenches of the IT world.

Losing developer time to performance problems hidden by high-level languages.

Posted on Saturday, May 23, 2009 at 11:48 PM.

One of the main purposes of high-level programming languages is to save developer time by abstracting away the onerous and tedious aspects of the underlying hardware. In general, most high-level languages do a good job at this. Unfortunately, these same languages can also waste significant amounts of developer time, often because of performance problems. The trouble is that properly diagnosing and fixing many of these problems requires the developers involved to understand, in depth, how the high-level language is implemented.

A good example of this is a performance issue described recently with IronPython, an implementation of Python for Microsoft's .NET platform. In short, a very innocuous line of code was apparently responsible for the poor performance.

This incident highlights several main problems. The first is that high-level code can lead to some very unexpected interactions within the high-level language's implementation. This can obviously cause problems by misleading the developer or developers dealing with the performance problems. What appears on the surface to be a simple and likely very fast operation ends up being the culprit. A lot of developer time can be spent looking in the wrong places.
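As a rough illustration of how an innocent-looking line can hide a large cost, consider the sketch below. It is not the IronPython issue mentioned above, whose details aren't reproduced here; it is a stand-in using plain Python, and the Counted class and comparisons_for_lookup function are hypothetical names invented for the example. The same "needle in container" expression is run against a list and a set, and the number of equality comparisons performed behind the scenes is counted.

```python
class Counted:
    """Hypothetical value wrapper that counts equality comparisons."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.comparisons += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


def comparisons_for_lookup(container, needle):
    """Return (found, number of __eq__ calls) for `needle in container`."""
    Counted.comparisons = 0
    found = needle in container
    return found, Counted.comparisons


items = [Counted(i) for i in range(1000)]
as_list = items
as_set = set(items)
needle = Counted(999)  # equal to the last element, but a distinct object

# The two lookups read identically in the source code.
list_found, list_cost = comparisons_for_lookup(as_list, needle)
set_found, set_cost = comparisons_for_lookup(as_set, needle)

print("list lookup:", list_found, "comparisons:", list_cost)
print("set lookup: ", set_found, "comparisons:", set_cost)
```

Both lookups give the same answer, and nothing on the surface suggests that one walks the whole list while the other goes more or less straight to the matching entry. Knowing why requires knowing how the two containers are implemented, which is exactly the kind of knowledge a high-level language is supposed to let us skip.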

The second concern is that tracking down the problem requires in-depth knowledge about the high-level language's implementation. To some extent, we use such high-level languages in the first place to avoid needing to acquire such lower-level knowledge. We want to focus on the application we're writing, not on dealing with issues pertaining to the platform we're building upon. Time spent learning about the high-level language's implementation is time not spent on developing the application at hand.

This particular situation seems to have had a "happy" ending. The victim of the poor performance got a rapid response from somebody who did have inside knowledge about IronPython's implementation. Unfortunately, this isn't always the case. I've seen far too many times when developers have spun their wheels trying to track down obscure performance problems of that type. And it isn't a problem associated just with programming languages like Python, Ruby, or Perl, either. We often see it happen with SQL. A minor change to a query can result in a huge performance gain or loss.
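The SQL case can be sketched with Python's built-in sqlite3 module. SQLite is only a convenient stand-in here, since no particular database is named above, and the users table, the idx_users_name index and the query_plan helper are all hypothetical. The two queries differ only in wrapping the column in lower(), yet the planner can use the index for the first and has to scan for the second.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the description lines of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_users_name ON users (name)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("user%d" % i,) for i in range(1000)],
)

# Original query: a plain equality test on the indexed column.
indexed_sql = "SELECT id FROM users WHERE name = ?"
# "Minor" change: the column is wrapped in a function call.
wrapped_sql = "SELECT id FROM users WHERE lower(name) = ?"

indexed_plan = query_plan(conn, indexed_sql, ("user42",))
wrapped_plan = query_plan(conn, wrapped_sql, ("user42",))

print("plain column:  ", indexed_plan)
print("lower(column): ", wrapped_plan)
```

The exact wording of the plan varies between SQLite versions, but the first should be reported as a SEARCH using the index and the second as a SCAN. On a thousand rows nobody would notice; on a few hundred million, the difference is the kind that eats a developer's afternoon.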

As we start using high-level programming language implementations like IronPython, Scala, Clojure and JRuby, which are themselves often implemented in high-level programming languages like Java or C#, which in turn run on some sort of a virtual machine, we'll run into these sorts of problems more and more frequently. Each additional layer of software abstraction that we add in makes the situation more and more difficult. Soon we may need to look in two or three very different layers of software, assuming we even have source access, to track down performance issues. This could very well lead to a serious waste of developer time and effort.

Permalink: http://pinderkent.phumblog.com/post/2009/05/losing_developer_time_to_performance_problems_hidden_by_highlevel_languages

It is pointless to handicap Erlang just to reuse existing Java classes.

Posted on Sunday, April 05, 2009 at 7:50 PM.

Today I read Kevin Smith's recent article entitled Erlang Doesn't Fit The JVM. As the title suggests, he discusses why efforts to port Erlang to the Java Virtual Machine really wouldn't be of much value. In general, I tend to agree with his points. The JVM does indeed lack the necessary functionality that makes Erlang particularly valuable.

Unfortunately, I think he missed discussing one of the major arguments often used in support of having non-Java programming languages target the JVM, namely that of reusing existing Java code. The Scala homepage, for instance, currently mentions in its introduction to Scala that "Scala has the same compilation model (separate compilation, dynamic class loading) like Java and allows access to thousands of existing high-quality libraries." Likewise, the Clojure Rationale page lists the reuse of existing Java code as a necessity: "Ability to call/consume Java is critical." Thus we can see that code reuse is one of the driving factors behind these new languages targeting the JVM.

Code reuse is one of those things that sounds great, especially to managers. In reality, however, it's often of somewhat limited value. That's not to suggest that all code reuse is bad; clearly it isn't. There is little point in having developers today write basic string handling code, or common math functions, and so forth. But once we start getting into application- or domain-specific code, reuse starts to become less useful, and can often become quite a problem.

I think this becomes even more of an issue when trying to integrate code written in one language with that in another. This is especially true when one language (such as Scala) offers higher-level functionality and concepts much in excess of the other language (Java, for instance). Libraries and frameworks written with Java in mind very likely won't be designed in such a way as to make the best use of the language features and approaches that Scala, for example, makes possible.

The same goes for Erlang. One of the main reasons why we may choose to use Erlang over Java is that Erlang brings a set of concepts and solutions that, at the language level, just aren't offered by Java. In many ways, the general principles of Erlang-based development don't mesh well with the Java philosophy. This is evident at the most basic of levels, where Erlang embraces immutability, while mutability is widely used and encouraged within the Java development community. Likewise, lists and tuples are first-class language constructs in Erlang. But when using Java, one has to resort to arrays or classes wrapping arrays to reproduce the functionality of lists. The Java equivalent of a tuple is usually just a class with several member variables. And while an Erlang developer will typically use recursion to iterate over a list of values, a Java programmer would likely use a loop. Things become even more mismatched when we consider concurrency and distributed computing: Erlang's approach can be somewhat emulated in Java, but it never feels as natural as it does in Erlang.
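The difference in style can be sketched in a neutral third language. The Python below is only an approximation of either camp, and every name in it (sum_recursive, sum_loop, MutablePoint) is made up for the illustration; it is neither real Erlang nor real Java.

```python
from dataclasses import dataclass


# Erlang-flavoured: the list is walked by recursion on its head and tail,
# and nothing is modified along the way.
def sum_recursive(values):
    if not values:
        return 0
    head, *tail = values
    return head + sum_recursive(tail)


# Java-flavoured: the list is walked with a loop that updates an
# accumulator in place.
def sum_loop(values):
    total = 0
    for value in values:
        total += value
    return total


# Java-flavoured "tuple": a class with several mutable member variables.
@dataclass
class MutablePoint:
    x: int
    y: int


point_tuple = (3, 4)               # Erlang-flavoured: an immutable value
point_object = MutablePoint(3, 4)  # Java-flavoured: a mutable object
point_object.x = 10                # fine; assigning to point_tuple[0] raises TypeError

print(sum_recursive([1, 2, 3, 4]), sum_loop([1, 2, 3, 4]))
print(point_tuple, point_object)
```

Both halves compute the same things, but a library designed around one style tends to sit awkwardly in code written in the other, and that is the mismatch at issue.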

When trying to combine the Java and Erlang philosophies, or more generally when trying to combine traditional imperative and object-oriented techniques with those of functional programming, we often run into the so-called "impedance mismatch" so often seen between object-oriented class hierarchies and database-based relational models. So the concepts of Java and Erlang can be mixed, but they often come together poorly, assuming the outcome is even usable.

While languages like Scala and Clojure, as well as an implementation of Erlang for the JVM, might allow for interoperability with existing Java code, the practical benefits likely don't exist. On one hand, we can choose to reuse existing Java code heavily, but at a cost of not making use of the new and often productive features of the newer, non-Java language. At that point, we might as well just go back to using Java. On the other hand, we can heavily use the language features of languages like Clojure, Scala and Erlang, and make a more minimalistic use of existing Java classes. Unfortunately, this will likely just lead to interoperability hacks, which can become maintenance headaches down the road. Trying to find a middle ground will just result in lost functionality, or an increasing number of hacks.

When moving to a new paradigm of programming, we sometimes just need to admit and accept that we shouldn't try to reuse our previous code directly. We're likely moving to the new paradigm because we need to increase our productivity in order to remain competitive and viable. And while reusing our existing code may initially sound like a good idea, it will only serve to become an anchor. A better idea is to keep in mind the knowledge and experience we obtained while developing the older code, but to put the focus on using the features of the new language or languages to our best advantage. This likely will mean reimplementing existing functionality, but doing so with a new philosophy and mindset. And so we'll make true progress by applying new ideas to existing problems, rather than by trying to convolute our existing solutions and ideas into the new way of thinking. It just doesn't make sense to cripple a powerful and advanced language like Erlang just for the sake of reusing some existing code.

Permalink: http://pinderkent.phumblog.com/post/2009/04/it_is_pointless_to_handicap_erlang_just_to_reuse_existing_java_classes

Getting to know today's practical GUI toolkits.

Posted on Tuesday, March 24, 2009 at 1:23 AM.

For years now, Andy Tai has done a great job of maintaining The GUI Toolkit, Framework Page. It's a very extensive list of GUI toolkits and frameworks, both open source and commercial, for a wide variety of languages and platforms. In fact, it's almost too complete. Many of the toolkits listed are no longer developed or became obsolete years ago. So my aim here is to narrow down his huge list to the toolkits and frameworks that are practical, from a developer's perspective, and worth using today.

First and foremost, we have Qt. Many consider it to be the premier C++ GUI toolkit. That's not surprising, of course. It has a long history of being developed as a commercial product, first by Trolltech and then by Nokia after they acquired Trolltech. Nevertheless, it has also been released under a variety of open source licenses throughout its lifetime. Given its maturity and use in a wide variety of software systems, including KDE, Opera and Google Earth, it has become known as a very reliable, portable, high-performance and high-quality toolkit.

Although Qt is written in C++, a variety of bindings have been developed for other languages. Some of the most notable include QtAda for Ada 2005, PyQt for Python, PHP-Qt for PHP, QtRuby for Ruby and qtHaskell for Haskell. In short, it proves to be a great toolkit, almost regardless of what language or platform you're using.

After Qt, we can consider wxWidgets to be the next most practical GUI toolkit. Like Qt, it is written in C++, is available under an open source license, is extremely portable, and can allow for the development of a professional-grade UI. Also like Qt, there are bindings for a number of popular languages, including wxPython for Python, wxRuby for Ruby, wxHaskell for Haskell, wxPerl for Perl, and even wxErlang for Erlang and wxLua for Lua.

One of the main benefits of wxWidgets over other GUI toolkits is its use of native controls. This allows applications developed using it to integrate nearly seamlessly with the host operating system, even while remaining somewhat portable. So while other toolkits offer themes that try as best as possible to emulate the behavior and appearance of the native platform's UI toolkit, this is often done imperfectly. A perceptive user will know they're not using a native application. But with wxWidgets, we typically don't find this happening.

After wxWidgets, GTK+ can be considered the next most usable toolkit. One difference that it has from Qt and wxWidgets is that it is written in pure C, rather than C++. While this makes language bindings easier to develop, it does have some drawbacks. One such drawback is that it makes extensive use of the GObject object system to provide object-oriented-like functionality for C. This typically feels inferior to using an actual OO language.

While it is portable to other platforms, its origin as an X Window System toolkit can still be felt. Applications using the Windows port of GTK+, for instance, typically don't truly feel like a native Windows app. Nevertheless, the Windows ports of applications like GIMP and Inkscape are usable and reliable. For the best experience, however, it's usually recommended to use GTK+ applications within an environment such as GNOME or Xfce, both of which are built upon it.

Like wxWidgets and Qt, GTK+ also has a wide variety of language bindings. Some of the most widely used include gtkmm for C++, PyGTK for Python, Gtk2-Perl for Perl, Ruby-GNOME2 for Ruby, PHP-GTK for PHP, Gtk2Hs for Haskell, Gtk# for the languages supported by Mono and .NET, and LablGTK for OCaml. Unlike Qt and wxWidgets, which are quite usable for large applications written in their native language (C++), it's probably best to use GTK+ from one of its more mature bindings, such as gtkmm, PyGTK or Gtk#, rather than from straight C.

Swing is relatively old and well-known. Originally Java-centric, it is becoming a more viable option as languages like Scala and Clojure, which target the JVM, become more prevalent. Other language implementations targeting the JVM, like Jython for Python and JRuby for Ruby, make it even more usable. Unfortunately, it does have a number of problems. It has never been known for offering high performance, and is somewhat memory-intensive. Applications written using Swing never truly feel like native applications, even when using a platform-specific theme. The API itself is also quite messy, having accumulated much cruft after a decade. And many developers don't want the extra baggage associated with the Java runtime, which makes it less appealing for those not writing an application specifically for the Java platform.

The FOX Toolkit is likely the most practical toolkit after Qt, wxWidgets, GTK+ and Swing. Like Qt and wxWidgets, it's written in C++. But unlike them, it doesn't offer as much of an accompanying framework. So in many respects it's a much more lightweight toolkit. And unlike some other toolkits, it doesn't (yet) include support for themes, so it has its own unique look and feel that is reminiscent of the traditional Windows look and feel. Nevertheless, it is portable and released under an open source license.

Unlike the aforementioned toolkits, the FOX Toolkit doesn't have as wide a variety of language bindings. FXRuby for Ruby is a mature and usable binding, but others, like the FXPy binding for Python and the EiffelFox binding for Eiffel, have stagnated. So the FOX Toolkit is probably best used from its native C++ or from Ruby.

FLTK is similar to the FOX Toolkit in many ways. It's also written in C++, is portable, and is more lightweight than toolkits like Qt and wxWidgets. Unfortunately, it doesn't have as many bindings for other languages, and the ones that do exist (like the Ruby wrapper and pyFLTK for Python) aren't updated very frequently. So while FLTK is usable, it probably isn't the best option for more complex applications that are expected to have a long lifespan.

There are many other GUI toolkits out there, both commercial and open source. In terms of cost, portability, usability, user experience and programming language interoperability, the toolkits mentioned above are typically the best options. Qt is perhaps the most flexible option for writing high-quality, portable GUI applications, followed closely by wxWidgets. GTK+ is good for software running on UNIX-like systems, but doesn't offer as seamless an experience on other platforms. Swing is typically the choice for those targeting the JVM. And toolkits like FOX Toolkit and FLTK provide alternatives for those who don't want the baggage of the larger and more complete frameworks. While no toolkit is perfect for every piece of software, picking Qt, wxWidgets, GTK+, Swing, FOX Toolkit or FLTK should prove to be a safe, viable, practical and capable choice.

Permalink: http://pinderkent.phumblog.com/post/2009/03/getting_to_know_todays_practical_gui_toolkits

The Java platform needs an overhaul, not just the Java language.

Posted on Sunday, March 22, 2009 at 4:28 PM.

There has been some discussion lately on several blogs about Java and its feasibility as a programming language for practical development these days. The articles I'm talking about are Bruce Eckel's "The Positive Legacy of C++ and Java", "Java as Legacy Language" by Kas Thomas, and "Is it time to retire Java?" by David Arno.

The general theme of all three articles is that the Java language has not kept up with the times. That's not to suggest that it's useless or will disappear any time soon, however. Anyone with experience in industry knows that there is an absolutely huge amount of Java code out there, critical to many business operations. But it is true that Java does not offer the productivity it once did. Relative to languages like C and C++, it did allow for more complex systems to be developed with less effort, and in a shorter period of time. But languages like Python and Ruby have been attacking it from beneath, and now prove to be a better option for many software development projects.

The general feeling after reading those articles is that although the Java programming language isn't as beneficial as it once was, it did provide us with the JVM, and this in turn proves to be a useful platform for languages and implementations like Scala, Clojure, JRuby and Jython.

I'm not so sure that this is the case. The JVM has never really been a spectacular platform. Performance problems have plagued it from the very beginning. Some of the problems are inherent to the Java language itself, namely in limiting what optimizations the compiler, runtime and programmers can perform. Although there have been improvements relating to JIT compilation, for instance, I think a lot of the performance problems of Java have only been mitigated somewhat by hardware consistently getting much faster during the late 1990s and most of this decade.

When the topic of Java performance comes up, some people like to toss around various microbenchmarks showing that some obscure task can be done relatively quickly using code running on a Java virtual machine. Unfortunately, these results don't translate well to the real world, where applications tend to be far more complex. Part of the reason why desktop Java apps never really took off, for instance, was because most of them felt sluggish relative to other applications written in languages like C and C++. Application startup time has also always been a problem with applications running on the JVM. Were it not for server-side applications typically being long-running and typically running on more powerful hardware, it's doubtful that Java would have made the inroads it managed to make there.

Bloat is another issue plaguing the Java platform. The earliest versions of the runtime had installers that were 2 to 5 MB in size. While that seems like almost nothing today, in the mid-1990s that was a hassle to download over a dial-up connection, especially if one was paying by the hour. Although not as much of a problem now due to the prevalence of broadband Internet connections, the Java 1.6.0_12 runtime installer for x64 Linux is still relatively large at over 18 MB.

Even today, with many computers having two or more gigabytes of RAM, we see Java applications using a disproportionate amount of memory. On Linux using Sun's 64-bit 1.6.0_12 runtime, we see the Scala 2.7.3.final REPL using 593 MB of virtual memory, with 153 MB resident, as reported by top. And this is just after starting up the REPL while it's still at its first prompt, without having entered any expressions that may have increased the memory usage. Some applications are even worse. Just after startup, before loading any projects, NetBeans 6.5.1 is already using 1029 MB of virtual memory, with 249 MB resident. We can't blame just the JVM for such problems, but we also can't overlook its involvement.

The Java class library was once one of the key selling points of Java and the Java platform. It provided a lot of common functionality in a manner that could be easily reused by developers, thus increasing their productivity dramatically. But over the past decade, we've seen a variety of libraries for other languages arise that provide the same functionality, but often with much nicer and more effective abstractions or APIs. Microsoft's .NET class library is one of the most widely used. Faced with such competition, the Java platform just doesn't look as attractive as it once did.

The Java class library is also starting to really show its age. While many classes and methods have been officially deprecated, they do remain around cluttering up the library. Others, like the AWT, are impractical for use today, but still must remain around due to later technologies, like Swing in the case of AWT, being built directly upon them.

Furthermore, a great many of the classes make poor use, if any, of generics, enumerations and the various other language features introduced in Java 5 and later. The benefits of having such language features in Java are negated when many of the core library classes we use don't support them. We're often stuck still using integer constants where enumerations would be much more appropriate, for instance. Unfortunately, there is great reluctance to change the library in ways that would impact compatibility with older Java applications. So it's unlikely we'll ever see these problems fixed properly.
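The integer-constant complaint is easy to show in miniature. The sketch below uses Python's enum module rather than Java, purely to keep the example short, and the ALIGN_* constants, the Alignment enumeration and both describe functions are hypothetical names, not taken from any real class library.

```python
from enum import Enum

# Integer-constant style: any int is accepted, meaningful or not.
ALIGN_LEFT = 0
ALIGN_CENTER = 1
ALIGN_RIGHT = 2


def describe_int(alignment):
    """Look up a name for an int constant; bad values slip through quietly."""
    names = {ALIGN_LEFT: "left", ALIGN_CENTER: "center", ALIGN_RIGHT: "right"}
    return names.get(alignment, "unknown")


# Enumeration style: only the declared members exist.
class Alignment(Enum):
    LEFT = 0
    CENTER = 1
    RIGHT = 2


def describe_enum(alignment):
    """Accept only Alignment members and return the member's name."""
    if not isinstance(alignment, Alignment):
        raise TypeError("expected an Alignment member")
    return alignment.name.lower()


print(describe_int(7))                # a meaningless value is silently accepted
print(describe_enum(Alignment.LEFT))  # only real members get this far

try:
    Alignment(7)                      # there is no member with this value
except ValueError as error:
    print("rejected:", error)
```

With the integer constants, a nonsense value travels through the program until something downstream misbehaves. With the enumeration, the mistake is caught at the point where it is made, which is the benefit that is lost when a library sticks with the older style.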

When it comes to running non-Java languages on the JVM, we end up with even more problems. A language like Scala offers features, functionality and concepts that we just don't find within the scope of the Java language. So although it can be used, actually using the Java standard class library from a language like Scala negates many of the benefits of using the newer language in the first place. While such interoperability and reuse is often touted as a benefit of using advanced languages that target the JVM, I fear it's actually a significant downside, as it doesn't promote the effective use of the new language features that really do boost programmer productivity or application quality. They're crippling themselves by trying to reuse existing code that doesn't fit their philosophy.

So while languages and implementations like Scala, Clojure, Jython and JRuby are bringing more impressive languages to the JVM, they can't really do anything about the poor performance or the excessive resource usage of the JVM, nor can they do much about the poor state of the Java class library. But I'm also not suggesting that we have a viable alternative. Even though they recently released Parrot 1.0.0, it's still not a platform suitable for widespread production deployments. The Mono project is making some good progress, although they're typically playing catch-up to Microsoft's implementation, and don't have complete freedom to dictate how their runtime should behave. LLVM is perhaps the most promising alternative, but as the name implies, it is very low-level, not offering functionality we've come to expect from virtual machines (like garbage collection).

If the Java platform is to continue to serve us into the future, I suspect a significant overhaul is necessary. Performance and memory usage improvements to the JVM would be of a huge benefit to many. Better support for acting as the target of non-Java languages would likely be beneficial, as well. And a complete cleanup and reworking of the Java class library is a must. Unfortunately, this would not be a small undertaking, and would require huge amounts of time and effort. So it seems unlikely at this time that we'll see such a rework occur. This is somewhat unfortunate, as now is probably the best time to do it, while languages like Scala and Clojure have really started to mature, but just before they become so widely used that momentum against change develops.

Permalink: http://pinderkent.phumblog.com/post/2009/03/the_java_platform_needs_an_overhaul_not_just_the_java_language

The software impact were IBM to acquire Sun Microsystems.

Posted on Wednesday, March 18, 2009 at 11:36 AM.

There have been reports recently suggesting that IBM may soon acquire Sun Microsystems. For those of us in IT, acquisitions of these types are always a big deal. While Sun's server hardware is widely deployed and of great importance to many, my interest is mainly in software, so that's the area I'll focus on in this article.

In the software world, Sun is most well known for Java and Solaris. I suspect that we'll see two very different outcomes for these technologies if an acquisition does occur. It's difficult to say what will happen for some of the smaller or more obscure software products and projects that Sun was involved with, such as OpenOffice.org and NetBeans.

In terms of Java, I don't think we'd see much, if any, negative change. IBM has embraced Java, and understands its use, at least for business users. For some time now they've provided their own JDKs for a number of platforms. Java also plays a significant role in their prominent WebSphere Application Server. They're actively involved with developing new Java-based technologies. And their involvement and contributions to Eclipse have had a huge impact within the development community.

In some respects, such an acquisition may be quite good for Java. Although known for generally being more conservative, IBM does have the resources needed to make the changes to the Java language and platform that are necessary for it to compete better with Microsoft's .NET platform. We need to see half-baked technologies like JavaFX and JavaFX Script discarded. But we also need to see functional or object-functional languages like Clojure and Scala perhaps brought in as core components of the platform.

If IBM put their weight behind such technologies, we very well may see more developers be willing to adopt them. As we move into a world with massively multi-core CPUs, we will need to make use of functional programming techniques to write scalable software that effectively uses such hardware. Given IBM's (and Sun's) emphasis on high-end server hardware, a willingness to adopt and support Scala and Clojure could really put them in the lead in this area.

Things don't look as good in terms of operating systems. As basically everyone in the industry knows, Sun has offered Solaris for years, while IBM has offered AIX. While both are high-end UNIX-based operating systems, I'm not certain they could be successfully integrated. At a technical level, I suspect they are just too different.

If an acquisition were to take place, I imagine that Solaris would be supported for some time, but eventually deprecated in favor of AIX. This would be similar to what happened to Tru64 UNIX when HP and Compaq merged. If I recall correctly, HP originally planned to transition Tru64's more advanced features to HP-UX, but this didn't end up happening, for the most part. Now Tru64 is essentially on its last legs.

Given that Sun has released much of the Solaris source code over the past few years in the form of the OpenSolaris project, it seems likely that it will live on in at least some form. A truly self-sustaining community, akin to the Ubuntu project for Linux, never seemed to develop, however. So it's difficult to say how much, if any, innovation we'd see out of the OpenSolaris project were Sun no longer supporting it.

Another interesting area to consider is that of the MySQL-related technologies and business that fell into Sun's fold with their acquisition of MySQL AB at the beginning of 2008. Given IBM's pivotal role in the development of relational databases, and their offering of the very solid and professional DB2 family of database products, MySQL's future in such an organization seems quite limited. Even within the open source community, MySQL is generally considered an inferior RDBMS. I really can't see it having much of a future, especially with Marten Mickos and Monty Widenius out of the picture.

We'll just have to wait and see what comes out of these rumors. But in terms of the software world, such an acquisition could have some significant impacts. They'd mainly be felt by those working with and developing enterprise systems, but may still be noticed by others, including, for example, OpenOffice.org users. So I see such an acquisition as potentially being good for Java, but less so for Solaris, and potentially disastrous for MySQL.

Permalink: http://pinderkent.phumblog.com/post/2009/03/the_software_impact_were_ibm_to_acquire_sun_microsystems