Pinderkent

Pain and glory from the trenches of the IT world.

CGI scripts are often a perfectly fine approach.

Posted on Monday, May 24, 2010 at 1:21 PM.

Today I noticed a submission at reddit about modern Web development. Much of the discussion there currently centers on technologies like PHP, Ruby on Rails, and Django. One commenter, however, brought up the acceptability of CGI scripts. As usually happens when the topic of CGI scripts comes up, somebody replied that they have "terrible overhead".

In some cases, this is absolutely true. If you have a site getting substantial traffic, or your site could experience an unexpected spike in traffic, then CGI scripts clearly aren't a viable option. However, very few sites fall into this category. With the vast majority of sites online getting fewer than one hit per second, CGI scripts can actually prove to be a very versatile technology. The development overhead is minimal compared to other techniques, especially those involving complex frameworks. Just about any programming language can be used to write a CGI script. The implementation is flexible, too, as no templating engine or O/R mapper is required. Almost all modern web servers have excellent support for CGI scripts, and they're very easy to deploy. And if a performance boost is ever needed, it's almost trivial to convert a CGI script to use FastCGI.
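
To give a sense of how little code a CGI script needs, here is a minimal sketch in Python. The `name` query parameter and the `build_response` function are illustrative assumptions of mine; the only things CGI itself dictates are that the request arrives in environment variables such as `QUERY_STRING`, and that the response is a header block, an empty line, and then the body.

```python
#!/usr/bin/env python3
"""A minimal CGI script: the web server runs it once per request."""
import os
import sys
from urllib.parse import parse_qs


def build_response(environ):
    """Return the full CGI response (headers, blank line, body) as a string."""
    # CGI passes request details through environment variables such as
    # QUERY_STRING; parse_qs turns "name=Ada" into {"name": ["Ada"]}.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical parameter chosen for this example.
    name = query.get("name", ["world"])[0]
    body = f"Hello, {name}!\n"
    # A CGI response is a header block, an empty line, then the body.
    return "Content-Type: text/plain; charset=utf-8\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

Dropped into a directory that the web server is configured to treat as CGI, and marked executable, a script like this should need no framework, templating engine, or O/R mapper at all.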

A lot of developers today don't truly understand the power of modern server hardware and software. They don't realize how cheap it is to start up a new process. Indeed, when it comes to CGI scripts, the process start-up time is often negligible compared to the time it takes to perform database queries, for example. This is true even for interpreted scripting languages. When you factor in the caching done by virtually all server-grade operating systems today, as well as the caching of bytecode by an interpreted scripting language like Python, process start-up time becomes a non-issue.
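
Rather than taking anyone's word for it, the start-up cost is easy to measure on your own hardware. The following is a rough sketch, not a rigorous benchmark: `average_startup_seconds` is a name I made up, it only times an interpreter that starts and exits immediately, and a real script would also pay for its own imports.

```python
import subprocess
import sys
import time


def average_startup_seconds(runs=5):
    """Average wall-clock time to start an interpreter that does nothing."""
    start = time.perf_counter()
    for _ in range(runs):
        # Launch a fresh Python process, much as a web server launches a
        # fresh process for each CGI request, and wait for it to exit.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(f"average start-up: {average_startup_seconds() * 1000:.1f} ms")
```

Comparing the printed figure against the time your typical database query takes gives a fair first impression of whether process start-up matters for your site.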

Something else to keep in mind is that, well over a decade ago, we used CGI scripts alone to power sites that even today would be considered high-traffic sites. This was done on hardware with a mere fraction of the power of the hardware available to us today. Now, it wasn't unusual to see such servers become saturated with requests and run under an extremely high load, so technologies like NSAPI and Apache modules were eventually developed to combat this. Nevertheless, many sites were unable to make use of those approaches, so CGI scripts remained widely used, and helped create what became today's Web.

Contrary to what some people misleadingly suggest, CGI scripts are still a viable, acceptable and even optimal approach for many dynamic Web sites today. They provide a high degree of flexibility when it comes to the programming language used, any templating engine that may be used, any ORM system that may be used, the web server software used, the operating systems they run on, and so forth. To immediately write off CGI scripts due to misconceptions about process start-up overhead is absurd. In reality, CGI scripts are a very acceptable approach for most Web sites today, and no doubt should be considered as an option.

Permalink: http://pinderkent.phumblog.com/post/2010/05/cgi_scripts_are_often_a_perfectly_fine_approach

Parrot just can't compete with LLVM, the JVM, and the .NET CLR.

Posted on Sunday, May 23, 2010 at 11:51 AM.

I read an article today, written by Andrew Whitworth, that discusses Parrot and its fitness as a target platform. His article, along with other recent developments, may very well answer the question I asked nearly three years ago: "Will Parrot ever truly deliver?" Unfortunately, the answer appears to be a resounding No.

For those who might not be aware, Parrot is, according to their web site, a "virtual machine designed to efficiently compile and execute bytecode for dynamic languages." Although it has been in development for about a decade, there has been comparatively little to show for all of the effort that has gone into it. Sure, there have been frequent releases, but in the end we still don't have a platform that garners much attention, and we still don't see anyone really putting forth a lot of effort to target it.

Andrew's article helps highlight why both language implementors and users may be hesitant to spend time targeting Parrot. Towards the end of his article, he covers parts of the system that he thinks will be seeing major changes within the next few months. Throughout these seven points, we see some very unsettling things. The very first point, for instance, mentions that, "GC is a very internal thing, when it works properly, you don't even need to know it exists." Now, garbage collection isn't a trivial task, but it has been very well studied and implemented many times over in real-world systems. Although we can't expect any such system to be perfect, it is unsettling when we read statements like "when it works properly" regarding a ten-year-old virtual machine platform. There just shouldn't be so much doubt about such a fundamental part of a virtual machine.

The second point is no better. Just-in-time compilation, like garbage collection, is another one of those cornerstones of a VM that we should expect to be mature and robust after 10 years. It's very worrisome to read that Parrot is lacking so badly in this area, even after two major releases.

The third point is perhaps the worst of all. In it, he states, "We don't really have a good, working, reliable threads implementation now and HLLs are generally not using them." It's currently 2010, and the situation today is that almost all new desktop PCs, and even many notebooks and netbooks, have a CPU with at least two cores. Most server-grade computer systems offer several times that, with multiple CPUs, multiple cores per CPU, and even multiple threads of simultaneous execution per core. Efficiently using these systems to their full potential currently means writing multithreaded software. Like garbage collection and just-in-time compilation, threading has been well studied, implemented repeatedly, and is one of the major pieces of any virtual machine platform. There's just no excuse for Parrot not to have better multithreading support.

The fifth point is pretty serious, as well. It discusses packfiles, which are the files that contain Parrot bytecode, debug data, and so forth. This is one more essential part of any VM implementation that should be very mature after a decade's worth of development. It's disappointing to hear that there are still portability issues with these files after so many years.

After reading about those rather serious deficiencies, I have a hard time understanding how Andrew can suggest that, "In summary, Parrot is a good, stable platform for HLL developers to use." From what I can see, Parrot is a platform that has had a lot of time and opportunity to make something of itself, but due to various problems, from internal developer strife, to a bad reputation, to a lack of serious users, it just hasn't matured.

Since I wrote my other article about Parrot almost three years ago, we've seen major developments out of the other major VM providers. We're seeing the Java platform get better support for dynamic languages in the upcoming JDK 7 release. We've also seen Microsoft's Dynamic Language Runtime become available for their .NET platform, allowing for mature and usable language implementations like IronRuby and IronPython to be developed.

Perhaps the biggest threat of all to Parrot is LLVM. LLVM has become widely accepted by industry, and even significant open source projects like FreeBSD are integrating and supporting it. In addition to having excellent support for C, C++ and Objective-C, we're even seeing it used as the back-end for dynamic programming language implementations. Rubinius and MacRuby are two examples of Ruby implementations that support LLVM. Then there are Python implementations like Unladen Swallow and PyPy.

I just don't think that Parrot can compete with these other platforms. Parrot has spun its wheels for far too long, and just isn't as mature as the JVM, the .NET CLR, or LLVM have become. Aside from casual or hobby development, I don't see why anyone would develop a software system specifically targeting Parrot. Its future seems extremely bleak at this point.

Permalink: http://pinderkent.phumblog.com/post/2010/05/parrot_just_cant_compete_with_llvm_the_jvm_and_the_net_clr

NoSQL, the next big mess we'll get to clean up.

Posted on Sunday, April 04, 2010 at 9:56 PM.

Over the past couple of years, we've been hearing more and more about the so-called "NoSQL" movement. In short, its adherents advocate the use of various data management systems that do away with some of the features that most relational database systems have come to offer, in favor of supposedly offering better performance for large data sets. The hype has become particularly strong lately, with it being revealed less than a month ago that Digg has started using Cassandra heavily. A few days later, a similar announcement was made regarding reddit.

There has been a fair amount of discussion regarding this topic. Dennis Forbes, for instance, discusses Digg's transition, and explains how properly using some of the most integral features of virtually all existing relational database systems, along with solid-state drives, can help alleviate many performance issues. We've also seen Ted Dziuba write about the risk and unnecessity of NoSQL-esque approaches for most situations, while Royans Tharakan has suggested the opposite to be true. Jeremy Zawodny describes NoSQL as "software Darwinism".

Regardless of how one personally feels about NoSQL or relational database systems, I can think of a few things that will likely hold true:

  1. NoSQL techniques and systems will continue to get the sort of hype that misleads many developers and managers into thinking it's an approach that's much better than it actually is.
  2. Numerous existing software systems currently using relational databases very successfully will be transitioned to using a NoSQL approach.
  3. Many new software systems will use NoSQL technologies, especially when it isn't necessary or even suitable to do so.
  4. These new and modified systems will fail horribly. Expected performance gains won't materialize, data will be lost or badly mangled due to NoSQL's general lack of focus on data consistency, and codebases will be ruined by these transitions.

This is a mixed blessing. On the one hand, it will ensure a lot of work for those of us who often get called in to deal with software blunders. On the other hand, this isn't truly productive work, of course. It's mainly just fixing mistakes that shouldn't have been made in the first place. Techniques and software that worked for NoSQL users like Facebook, Google, Digg or reddit just won't work across the board, and it's quite unfortunate that so many developers and development managers won't realize this until it's far too late.

Permalink: http://pinderkent.phumblog.com/post/2010/04/nosql_the_next_big_mess_well_get_to_clean_up

Functional programming in JavaScript is a dead-end exercise.

Posted on Saturday, October 31, 2009 at 4:30 PM.

Yesterday a colleague forwarded me the link to Underscore.js. It's a JavaScript library that provides some functions commonly offered by functional programming language implementations.

Now, I can understand completely why JavaScript programmers would desire to use such techniques. They bring some very clear and powerful benefits, including shortened development time, fewer lines of code, greater flexibility, and improved readability. However, I do hope that JavaScript programmers using such a library don't come to think that they're actually doing functional programming.

One of the most significant areas where JavaScript fails with respect to functional programming is immutability. Current implementations have spotty support for constants, leading to various workarounds. So while it's possible to manually avoid state changes as much as possible, this becomes difficult when performing browser-based JavaScript development. And it's almost always better to have the language implementation strictly enforce const-ness, rather than having developers communicate this intent through all-caps variable names, for instance.

Many functional languages also offer very rich pattern matching functionality, which we just don't get with JavaScript. While some people have tried to implement pattern matching in JavaScript, it is nowhere near as clean or natural as pattern matching in Haskell or Erlang.

Many, but not all, functional programming languages also offer strong, static typing. There has been much debate about the pros and cons of the various typing techniques employed by various languages. In the end, however, experience shows that static typing results in higher-quality software and saves developer time. Unfortunately, neither JavaScript as a language nor its current widely used implementations lend themselves to strong, static typing.

It also doesn't help that JavaScript came mainly from the imperative and prototype-based OO world, and is only now trying to adopt features and techniques from the functional paradigm. It's often much cleaner to start with a purely functional language, and then add useful imperative features like for loops and references, as was done with Objective Caml. That way, the language and standard libraries have a functional feel to them, and naturally encourage the use of functional techniques, yet still allow the use of imperative features where they may prove to be the best option.

In many respects, I see trying to do functional programming in JavaScript much like doing object-oriented programming in C. While it can be done, to some extent, it never feels very natural because it lacks support that should be provided at the language level and by the implementations of said language. GObject is one of the most widely used C object systems, and as anyone who has written even a moderately sized GTK+-based system in C knows, it's not a very pleasant ordeal. Languages like Objective-C and C++ offer a much more developer-friendly experience.

So I see trying to add features and techniques from functional programming to JavaScript as generally being a pointless exercise. It may help in some cases, but ends up just masking the symptoms of a greater problem, namely that we want (and maybe even need) a true functional programming language available in all of the popular Web browsers. I've suggested Haskell in the past, but just about any functional language would be better than JavaScript.

Permalink: http://pinderkent.phumblog.com/post/2009/10/functional_programming_javascript_is_a_deadend_exercise

Analyzing existing databases and their relationships with applications.

Posted on Saturday, October 17, 2009 at 3:24 AM.

Anybody working on business applications these days will undoubtedly have to familiarize himself or herself with one or more existing databases. These databases have often been "grown" rather than designed in any meaningful way, and thus will be littered with unused tables, unused functions or stored procedures, missing constraints, poor normalization, and a host of other problems.

Developers in such a situation will often look for an easy way out, such as the use of tools to automatically reverse-engineer various parts of the existing database or databases. While these tools can be useful, I don't think they are ever a replacement for just stepping through the code, line by line, and observing exactly what queries are executed.

Depending on the application, it may even be a bad idea to try to think of the application and database as separate. Many times we find that neither can exist without the other. For instance, we find hard-coded SQL queries within application code written in languages like Java, PHP or C#. In such situations, one literally has to take a debugger and step through the application code in order to get even a basic understanding of how the system works.
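
As a complement to stepping through with a debugger, many database drivers can report each statement as it is executed. Here is a small sketch using Python's standard `sqlite3` module and its `set_trace_callback` hook. The in-memory SQLite database merely stands in for whatever database you are really dealing with, and `legacy_report`, `run_with_query_log` and the `orders` table are hypothetical names invented for the example.

```python
import sqlite3


def run_with_query_log(work):
    """Run work(conn) on a throwaway database and return the SQL it issued."""
    executed = []
    conn = sqlite3.connect(":memory:")
    # set_trace_callback invokes the callback with each SQL statement
    # that SQLite actually executes on this connection.
    conn.set_trace_callback(executed.append)
    try:
        work(conn)
    finally:
        conn.close()
    return executed


def legacy_report(conn):
    """Hypothetical application code containing hard-coded SQL."""
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.execute("INSERT INTO orders (total) VALUES (19.99)")
    return conn.execute("SELECT COUNT(*) FROM orders").fetchall()


if __name__ == "__main__":
    for statement in run_with_query_log(legacy_report):
        print(statement)
```

The log may also contain statements the driver issues on its own, such as an implicit transaction start, which is itself useful to know about when studying an unfamiliar system.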

Another thing that may be worth avoiding is trying to understand the system all at once. Often, it will take many months to truly grasp even a moderately sized application and the database behind it. As changes are made or bugs are fixed, take a moment or two to study and document the code paths that are involved. Doing this on a daily basis will eventually expose a developer to large portions of the software system they're working with.

Regardless of the approach, one thing to keep in mind is that becoming familiar with existing codebases and databases is not an easy task, especially when they're as ugly as so many real-world systems are. Give it time, remain patient, and eventually the system will start to feel much smaller and more manageable.

Permalink: http://pinderkent.phumblog.com/post/2009/10/analyzing_existing_databases_and_their_relationships_with_applications