Pinderkent

Pain and glory from the trenches of the IT world.

Modular rewrites of existing codebases are often recipes for disaster.

Posted on Monday, March 30, 2009 at 11:55 AM.

Today I read an article that discusses the risks of rewriting poor codebases rather than trying to refactor them. It also gives some advice regarding how to go about performing a rewrite of a software system, at one point suggesting that:

If we do take the path of rebuilding, what are some things we can do in order to help ease the pain. First bit of advice is DO NOT LEVEL the program. Try to break off parts are going to be rebuilt and do it one at a time. Isolating these components gives you the chance to utilize unit testing, which is one of the most important tools you have when refactoring.

I've seen this approach used with a variety of systems over the years, and I'm not convinced it's a good idea. Nor do I think that it can necessarily be considered a "rewrite" per se, but instead more an example of large-scale refactoring.

The main problem is that the portion of code to be rewritten (i.e. the "module") tends to be small compared to the entire codebase. Given that the software's quality is currently poor, it's unlikely that the code is very modular to begin with. So even if we isolate our rewrite to a limited portion of the code, it's likely that the new code will still have to perform data access like the rest of the code, obtain configuration setting values like the rest of the code, and so forth.

What we find is that we want to do things "better" in our rewritten code, but we still need to remain somewhat interoperable with the existing code, databases, RPC services, Web services, and so forth. If we choose to use the existing components, we're often forced into following their "mindset" and way of behaving, which ends up limiting the improvements we can make with the rewrite. On the other hand, if we start from scratch, the amount of work will likely increase significantly. We might also end up with code duplication between the old code and the new code, which can make the codebase as a whole much more difficult to work with. If several teams perform such partial rewrites, we can potentially end up with five or more ways of performing the same basic tasks.
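To make that trade-off a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical and not taken from any real system: `LEGACY_SETTINGS` and `legacy_get_setting` stand in for an old, string-only configuration lookup, while `DbConfig` and `load_db_config` stand in for the rewritten module. The new code would like typed settings, but it still has to go through the old lookup and inherit its quirks.

```python
from dataclasses import dataclass

# Hypothetical legacy store: every value is a string in one global dict.
LEGACY_SETTINGS = {"db.timeout": "30", "db.retries": "3"}


def legacy_get_setting(key):
    """Stand-in for the old codebase's lookup; returns "" for unknown keys."""
    return LEGACY_SETTINGS.get(key, "")


@dataclass(frozen=True)
class DbConfig:
    """What the rewritten module would prefer to work with: typed values."""
    timeout_seconds: int
    retries: int


def load_db_config():
    """Interoperability shim between the new module and the old lookup.

    The empty-string convention of the legacy helper leaks in here: a
    missing key silently becomes 0 instead of raising an error.
    """
    return DbConfig(
        timeout_seconds=int(legacy_get_setting("db.timeout") or "0"),
        retries=int(legacy_get_setting("db.retries") or "0"),
    )
```

A second team rewriting a different module would likely write its own, slightly different shim around the same lookup, which is one way the duplication described above can creep in.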

If it's decided that a rewrite is to be performed, it's often better just to go ahead and do it in full, rather than trying to piecemeal it using the existing code. The piecemeal approach often handicaps the new code, which can lead to a more confusing codebase rife with duplication, contradiction, outdated approaches and interoperability hacks.

Permalink: http://pinderkent.phumblog.com/post/2009/03/modular_rewrites_of_existing_codebases_are_often_recipes_for_disaster

The urge to rewrite software is not necessarily driven by ego.

Posted on Sunday, March 29, 2009 at 6:13 PM.

As the title of Will McGugan's On Ego and Software Development article clearly states, it discusses the impact that developer ego has on software projects. One of the impacts he mentions is that of rewrites of inherited codebases. His suggestion is that the developers who inherit the code often think that the existing code is poorly written, and believe that they could do it far better. I'm not sure that this is typically the justification for a rewrite, especially in business settings.

Over the years I have been involved with many projects involving a transfer of a significant amount of code. As with much software, this code may not be poorly written, but typically isn't as clean as it could be. Depending on the setting, the code may have been written very quickly, it may be unnecessarily complex, it may have been added to haphazardly over the years, and it may have been developed by programmers who weren't overly talented. Even if there's no urge to rewrite the software initially, such ideas can arise once the new developers scope out how much work it will take to come up to speed with the existing code.

The amount of time and effort that it takes to become familiar with an existing codebase depends on a wide variety of factors. The languages, libraries, frameworks and other technologies used are likely the most significant factors, especially when considering the new developers' familiarity with them. There may be issues surrounding the deployment environment. The new developers may not be overly familiar with the business domain the software serves. The quality of the original code is a factor, as well. And those are just a few of the issues that need to be considered.

Sometimes the effort necessary to become adequately familiar with the existing code far exceeds the time it would take the developers to rewrite and test the code in question. This is where we often see rewrites taking place. It's not so much about the new developers caving to their egos and thinking they can do it better. It's actually more about the developers recognizing their own weakness in the situation, namely the scale of the difficulties they would face were they to work with the existing code rather than rewriting it anew. The decision to rewrite ends up being almost purely economically-driven, rather than ego-driven.

Permalink: http://pinderkent.phumblog.com/post/2009/03/the_urge_to_rewrite_software_is_not_necessarily_driven_by_ego

Higher-level languages aren't about making experts more productive. They're about allowing average programmers to do the otherwise impossible.

Posted on Saturday, March 28, 2009 at 11:41 PM.

I read an article today about whether higher-level programming languages like Python, Perl and C# are really that much more productive than a comparatively lower-level language like C. This is not a new line of discussion, by any means. But we're getting to the point where we've been using such higher-level languages for over a decade, and thus have had more of an opportunity to observe and analyze how successfully (or not) they've been used.

In my view, that article comes to the general conclusion that many of the popular claims regarding the benefits of high-level languages versus low-level languages don't hold true. It's suggested that languages like C do indeed have aspects that hamper developer productivity, but high-level languages bring their own, albeit different, set of problems. While a C developer may run into problems with pointers, a Perl programmer might lose a similar amount of time optimizing regular expressions.

I think the conclusions of that article are correct to some extent, but I also think that the greater picture may have been missed. The real impact of languages like Python, Ruby, Perl, JavaScript and PHP isn't that they allowed expert programmers to be marginally more productive. Their greatest "benefit" (or arguably their greatest disadvantage) is that they have allowed average and even poor programmers to accomplish things they couldn't have reasonably done in C.

PHP and JavaScript are good examples of this. As anyone who has used them knows, they are very unremarkable languages. Conceptually and syntactically, they're much like C in many ways, but without some of the aspects of C that average developers often find bothersome. They aim to eliminate manual memory management, for instance. They offer slightly nicer string handling. Their execution environments aren't as tied to the native hardware. But otherwise, the core PHP and JavaScript languages generally offer the same basic functionality that C offers.

By eliminating some of the more difficult aspects of C, even if they're not as flexible or as powerful in many ways, these languages have made programming accessible to people who otherwise would not have been able to handle it. I've had the misfortune of working with people like this. They can understand the concepts of variables, constants, loops, conditionals, functions and even the basics of OO to some extent. But they're totally unable to understand some of the basic, yet essential, concepts of C. Pointers, for some reason, are a common stumbling block. But luckily for such average developers, languages like PHP and JavaScript make their lives easier by getting rid of such constructs and functionality.

So we soon enough see these average developers using languages like PHP and JavaScript to develop applications. In many cases, there's little to nothing preventing the same application from having been developed using C, aside from the inability of the average developer or developers to use C. Anyone who has worked in the industry knows why businesses opt to go with such solutions. Sometimes it is cheaper and easier to hire several PHP and JavaScript developers, instead of just one or two expert C developers. Other times it's because inexperienced or unknowledgeable managers just don't know any better, or have bought into hype and marketing. Regardless, the outcome is typically a system that just barely works, assuming it's not outright broken. Whatever costs might have been saved initially end up becoming far more costly in the long run.

Had those JavaScript and PHP developers been forced to use C, it's likely that no software system would have been produced at all. They would've still been struggling with significant memory leaks, segfaults, and sometimes even just getting their code to compile. So we can see the actual main benefit of such higher-level languages: they've reduced the complexity of an otherwise difficult skill down to something that is more palatable to non-experts.

Now, we need to ensure that we don't lump high-level languages like Haskell, Erlang, and Common LISP in with other high-level languages like PHP and JavaScript. They are clearly very different. Haskell, Erlang and Common LISP, for instance, use abstraction to empower the developer. They offer advanced features and techniques that expert developers can build upon to great benefit. This is very different from languages like PHP and JavaScript, which clearly took the C model of computing, and stripped out the parts that make C more awkward for the less-skilled programmers.

Even though they are significantly higher-level languages than C, languages like Erlang, Haskell and Common LISP haven't become as popular because they still require a high level of knowledge and expertise to use for even the most basic of tasks. So they highlight the important difference between a language being "high-level" and a language being "accessible". Functional languages increase expert programmer productivity with more powerful abstractions; PHP and JavaScript increase average programmer productivity via simplification.

The whole debate with respect to whether high-level languages are better than low-level languages will likely rage for many more years. There are some tasks that just can't be done in languages like JavaScript and PHP, so we will surely see C remain around for a long time. But we likely will see languages like PHP and JavaScript remain around for a similar reason. Unfortunately, that reason won't be about allowing good developers to develop more advanced software more quickly, but rather about letting poor developers continue to put out just barely suitable software systems.

Permalink: http://pinderkent.phumblog.com/post/2009/03/higherlevel_languages_arent_about_making_experts_more_productive_theyre_about_allowing_average_programmers_to_do_the_otherwise_impossible

Getting to know today's practical GUI toolkits.

Posted on Tuesday, March 24, 2009 at 1:23 AM.

For years now, Andy Tai has done a great job of maintaining The GUI Toolkit, Framework Page. It's a very extensive list of GUI toolkits and frameworks, both open source and commercial, for a wide variety of languages and platforms. In fact, it's almost too complete. Many of the toolkits listed are no longer developed or became obsolete years ago. So my aim here is to narrow down his huge list to the toolkits and frameworks that are practical, from a developer's perspective, and worth using today.

First and foremost, we have Qt. Many consider it to be the premier C++ GUI toolkit. That's not surprising, of course. It has a long history of being developed as a commercial product, first by Trolltech and then by Nokia after they acquired Trolltech. Nevertheless, it has also been released under a variety of open source licenses throughout its lifetime. Given its maturity and use in a wide variety of software systems, including KDE, Opera and Google Earth, it has become known as a very reliable, portable, high-performance and high-quality toolkit.

Although Qt is written in C++, a variety of bindings have been developed for other languages. Some of the most notable include QtAda for Ada 2005, PyQt for Python, PHP-Qt for PHP, QtRuby for Ruby and qtHaskell for Haskell. In short, it proves to be a great toolkit, almost regardless of what language or platform you're using.

After Qt, we can consider wxWidgets to be the next most practical GUI toolkit. Like Qt, it is written in C++, is available under an open source license, is extremely portable, and can allow for the development of a professional-grade UI. Also like Qt, there are bindings for a number of popular languages, including wxPython for Python, wxRuby for Ruby, wxHaskell for Haskell, wxPerl for Perl, and even wxErlang for Erlang and wxLua for Lua.

One of the main benefits of wxWidgets over other GUI toolkits is its use of native controls. This allows applications developed using it to integrate nearly seamlessly with the host operating system, even while remaining somewhat portable. So while other toolkits offer themes that try as best as possible to emulate the behavior and appearance of the native platform's UI toolkit, this is often done imperfectly. A perceptive user will know they're not using a native application. But with wxWidgets, we typically don't find this happening.

After wxWidgets, GTK+ can be considered the next most usable toolkit. One difference that it has from Qt and wxWidgets is that it is written in pure C, rather than C++. While this makes language bindings easier to develop, it does have some drawbacks. One such drawback is that it makes extensive use of the GObject object system to provide object-oriented-like functionality for C. This typically feels inferior to using an actual OO language.

While it is portable to other platforms, its origin as an X Window System toolkit can still be felt. Applications using the Windows port of GTK+, for instance, typically don't truly feel like a native Windows app. Nevertheless, the Windows ports of applications like GIMP and Inkscape are usable and reliable. For the best experience, however, it's usually recommended to use GTK+ applications within an environment such as GNOME or Xfce, both of which are built upon it.

Like wxWidgets and Qt, GTK+ also has a wide variety of language bindings. Some of the most widely used include gtkmm for C++, PyGTK for Python, Gtk2-Perl for Perl, Ruby-GNOME2 for Ruby, PHP-GTK for PHP, Gtk2Hs for Haskell, Gtk# for the languages supported by Mono and .NET, and LablGTK for OCaml. Unlike Qt and wxWidgets, which are quite usable for large applications written in their native language (C++), it's probably best to use GTK+ from one of its more mature bindings, such as gtkmm, PyGTK or Gtk#, rather than from straight C.

Swing is relatively old and well-known. Originally Java-centric, it is becoming a more viable option as languages like Scala and Clojure, which target the JVM, become more prevalent. Other language implementations targeting the JVM, like Jython for Python and JRuby for Ruby, make it even more usable. Unfortunately, it does have a number of problems. It has never been known for offering high performance, and is somewhat memory-intensive. Applications written using Swing never truly feel like native applications, even when using a platform-specific theme. The API itself is also quite messy, having accumulated much cruft after a decade. And many developers don't want the extra baggage associated with the Java runtime, which makes it less appealing for those not writing an application specifically for the Java platform.

The FOX Toolkit is likely the most practical toolkit after Qt, wxWidgets, GTK+ and Swing. Like Qt and wxWidgets, it's written in C++. But unlike them, it doesn't offer as much of an accompanying framework. So in many respects it's a much more lightweight toolkit. And unlike some other toolkits, it doesn't (yet) include support for themes, so it has its own unique look and feel that is reminiscent of the traditional Windows look and feel. Nevertheless, it is portable and released under an open source license.

Unlike the aforementioned toolkits, the FOX Toolkit doesn't have as wide a variety of language bindings. FXRuby for Ruby is a mature and usable binding, but others, like the FXPy binding for Python and the EiffelFox binding for Eiffel, have stagnated. So the FOX Toolkit is probably best used from its native C++ or from Ruby.

FLTK is similar to the FOX Toolkit in many ways. It's also written in C++, is portable, and is more lightweight than toolkits like Qt and wxWidgets. Unfortunately, it doesn't have as many bindings for other languages, and the ones that do exist (like the Ruby wrapper and pyFLTK for Python) aren't updated very frequently. So while FLTK is usable, it probably isn't the best option for more complex applications that are expected to have a long lifespan.

There are many other GUI toolkits out there, both commercial and open source. In terms of cost, portability, usability, user experience and programming language interoperability, the toolkits mentioned above are typically the best options. Qt is perhaps the most flexible option for writing high-quality, portable GUI applications, followed closely by wxWidgets. GTK+ is good for software running on UNIX-like systems, but doesn't offer as seamless an experience on other platforms. Swing is typically the choice for those targeting the JVM. And toolkits like FOX Toolkit and FLTK provide alternatives for those who don't want the baggage of the larger and more complete frameworks. While no toolkit is perfect for every piece of software, picking Qt, wxWidgets, GTK+, Swing, FOX Toolkit or FLTK should prove to be a safe, viable, practical and capable choice.

Permalink: http://pinderkent.phumblog.com/post/2009/03/getting_to_know_todays_practical_gui_toolkits

Don't forget to peer code review your automated unit tests.

Posted on Sunday, March 22, 2009 at 10:00 PM.

Recently, I've been working with a software development team that makes use of peer code review. They've opted to take a relatively lightweight approach, with one or two of the other developers reviewing each commit. This process has apparently worked quite well for them over several projects now. The main benefit is higher-quality software. Another benefit is that more of the undocumented knowledge about the software is spread among more of the developers. In general, their communication is much better than that of many other development teams I have worked with. I think their peer review process has helped with this.

Although they don't really adhere to the practices of test-driven development, they do also make use of a large number of automated unit tests written using a custom framework they developed in-house. Part of their development process requires that every commit include new or updated unit tests, where necessary. Another requirement of their review process is that all of the unit tests must run successfully before the commit is made, and this must be demonstrated to the reviewer.

Unfortunately, they seemed to put little to no emphasis on reviewing the actual unit test code. The reviewers would typically put much emphasis on reviewing and suggesting changes to the application code, but as long as the unit tests all passed, the code changes there were generally ignored. While ignoring the unit tests did speed up the review process, it left the team vulnerable to bad unit tests.

Early last week, one of their more important customers called in with a problem they were having with the software. For this team, it was actually quite rare for this to happen. So they immediately halted their development efforts, to focus the team on finding and solving this particular customer's problem. After some initial difficulties reproducing it, they soon enough found the case. It ended up being a relatively benign problem, and was quickly fixed.

Unfortunately, several of the developers spent an entire day investigating this problem. For a small company, waste of this sort is very detrimental. What made it worse is that an automated unit test had been written in the past to detect this very problem. Checking their code repo's logs showed that a critical assertion had been commented out of this unit test during a bug fix. Their review logs showed that the code changes for the fix had been reviewed by a couple of other developers, including the team lead. But they'd neglected to review the unit test code change, and didn't notice there was a problem with the unit test because all of the tests still ran without error.

If they had reviewed the unit test changes as part of the bug fix code review, I have no doubt that they would have caught the issue immediately. Viewing a diff of the changes made it obvious why it happened. The developer had been commenting out certain other assertions and replacing them with new assertions. But apparently this developer had done a search-and-replace on the source code file that ended up accidentally commenting out the unrelated, yet textually similar, assertion.
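For anyone who hasn't been bitten by this, here's a small sketch of why a green test run proves so little in this situation. The team's in-house framework isn't something I can show, so this uses Python's standard `unittest` module as a stand-in, and `clamp_quantity` is a made-up function with a deliberate bug. The one assertion that would expose the bug has been commented out, so the suite still passes.

```python
import io
import unittest


def clamp_quantity(quantity):
    """Hypothetical application code: meant to clamp negatives to zero."""
    return quantity  # deliberate bug: should be max(quantity, 0)


class ClampQuantityTest(unittest.TestCase):
    def test_clamp(self):
        self.assertEqual(clamp_quantity(5), 5)
        # The critical check below was commented out by accident, so the
        # bug in clamp_quantity goes undetected.
        # self.assertEqual(clamp_quantity(-1), 0)


def run_suite():
    """Run the test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampQuantityTest)
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)
```

Calling `run_suite().wasSuccessful()` reports success even though `clamp_quantity(-1)` still returns a negative number. A reviewer who only checks that the tests pass has no way of noticing.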

So I think the lesson we can take from their troubles is that automated unit tests are just as essential as application code. When it comes to peer code reviews, it's often worthwhile to put as much emphasis on reviewing such unit test code as is put on the app's code itself. Indeed, their peer code review policy has been updated to include mandatory reviews of any unit test changes. And although this will cause the reviews to take a little bit more time and effort, catching even one incorrectly commented line of unit test code, for instance, could save them many hours of development time, as well as prevent inconvenience to the users of their software. That will likely make it well worth the extra effort.
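A human review is the real fix, but a cheap automated check can help point the reviewer at the right lines. The following is only a rough sketch of my own, not something this team adopted: it scans a unified diff for added lines that look like commented-out assertions. It assumes Python-style `#` comments and `assert`/`self.assert...` calls, and names such as `find_disabled_assertions` and `SAMPLE_DIFF` are invented for the example.

```python
import re

# An added diff line ("+") whose content is a "#" comment starting with an
# assertion call or statement, e.g. "+    # self.assertEqual(x, 1)".
COMMENTED_ASSERT = re.compile(r"^\+\s*#\s*(?:self\.)?assert\w*[\s(]")


def find_disabled_assertions(diff_text):
    """Return (diff line number, stripped text) for each suspicious added line.

    This is a heuristic: it can also flag ordinary comments that happen to
    begin with a word starting with "assert".
    """
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if line.startswith("+++"):
            continue  # file header, not an added line
        if COMMENTED_ASSERT.match(line):
            hits.append((number, line[1:].strip()))
    return hits


# Hypothetical diff resembling the kind of change described above.
SAMPLE_DIFF = "\n".join([
    "--- a/test_orders.py",
    "+++ b/test_orders.py",
    "@@ -10,3 +10,3 @@",
    "     def test_clamp(self):",
    "         self.assertEqual(clamp_quantity(5), 5)",
    "-        self.assertEqual(clamp_quantity(-1), 0)",
    "+        # self.assertEqual(clamp_quantity(-1), 0)",
])
```

Running `find_disabled_assertions(SAMPLE_DIFF)` flags the seventh line of the diff. Something like this could be run before a review as a prompt for the reviewer, though it is no substitute for actually reading the test changes.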

Permalink: http://pinderkent.phumblog.com/post/2009/03/dont_forget_to_peer_code_review_your_automated_unit_tests