Pinderkent

Pain and glory from the trenches of the IT world.

CGI scripts are often a perfectly fine approach.

Posted on Monday, May 24, 2010 at 1:21 PM.

Today I noticed a submission at Reddit about modern Web development. Much of the discussion there currently centers around technologies like PHP, Ruby on Rails, and Django. One commenter, however, brought up the acceptability of CGI scripts. As usually happens when the topic of CGI scripts comes up, somebody replied and mentioned how they have "terrible overhead".

In some cases, this is absolutely true. If your site gets substantial traffic, or could experience an unexpected spike in traffic, then CGI scripts clearly aren't a viable option. However, very few sites fall into this category. With the vast majority of sites online getting fewer than one hit per second, CGI scripts can prove to be a very versatile technology. The development overhead is minimal compared to other techniques, especially those involving complex frameworks. Just about any programming language can be used to write a CGI script. The implementation is flexible, too, since nothing forces you to use a templating engine or an O/R mapper. Almost all modern Web servers have excellent support for CGI scripts, and the scripts are very easy to deploy. And if a performance boost is ever needed, it's almost trivial to convert a CGI script to use FastCGI.
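
To make the deployment and FastCGI points concrete, here is a minimal sketch in Python (an assumption; any language would do). The greeting page and the name `app` are hypothetical. Writing the script as a WSGI callable and running it through the standard library's `wsgiref.handlers.CGIHandler` keeps it an ordinary CGI script today, while the same `app` could later be handed to a FastCGI-capable WSGI server (not shown, and not part of the standard library) without rewriting the page logic.

```python
#!/usr/bin/env python3
import os
from html import escape
from urllib.parse import parse_qs
from wsgiref.handlers import CGIHandler


def app(environ, start_response):
    """Hypothetical page: greets the visitor named in the query string."""
    # QUERY_STRING is a standard CGI meta-variable set by the Web server.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = f"<p>Hello, {escape(name)}!</p>\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Only run as CGI when a Web server has set up the CGI environment.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    CGIHandler().run(app)
```

Deployment is then a matter of placing the file in a CGI-enabled directory, marking it executable, and requesting it with something like `?name=Ann`. No templating engine or O/R mapper is involved.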

A lot of developers today don't truly understand the power of modern server hardware and software. They don't realize how cheap it is to start up a new process. Indeed, with CGI scripts, process start-up time is often negligible compared to the time it takes to perform database queries, for example. This holds even for interpreted scripting languages. Once you factor in the file caching done by virtually all server-grade operating systems today, as well as the bytecode caching done by an interpreted language like Python, process start-up time becomes a non-issue.
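
Rather than taking anyone's word for it, the start-up cost is easy to measure on your own server. The sketch below, which assumes a Python-based CGI setup, times how long it takes to launch and exit a fresh interpreter. It deliberately excludes module imports and the script's own work, so it is a lower bound for a real script; the function name is hypothetical.

```python
import subprocess
import sys
import time


def average_startup_seconds(runs: int = 5) -> float:
    """Average wall-clock time to start and exit a fresh Python interpreter.

    This approximates the per-request process start-up cost a Python CGI
    script pays before doing any real work.
    """
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" starts the interpreter, runs nothing, and exits.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        total += time.perf_counter() - start
    return total / runs


if __name__ == "__main__":
    print(f"average start-up: {average_startup_seconds() * 1000:.1f} ms")
```

Comparing that figure against the timings of your own typical database queries shows where the time per request actually goes.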

Something else to keep in mind is that, well over a decade ago, we used CGI scripts alone to power sites that would be considered high-traffic even today. This was done on hardware with a mere fraction of the power available to us now. Granted, it wasn't unusual to see such servers become saturated with requests and run under extremely high load, so technologies like NSAPI and Apache modules were eventually developed to combat this. Nevertheless, many sites were unable to make use of those approaches, so CGI scripts remained widely used, and helped create what became today's Web.

Contrary to what some people misleadingly suggest, CGI scripts are still a viable, acceptable and even optimal approach for many dynamic Web sites today. They offer a high degree of flexibility in the programming language, the templating engine and ORM system (if any), the Web server software, the operating system, and so forth. Writing off CGI scripts immediately because of misconceptions about process start-up overhead is absurd. In reality, CGI scripts are a perfectly acceptable approach for most Web sites today, and they should certainly be considered as an option.

Permalink: http://pinderkent.phumblog.com/post/2010/05/cgi_scripts_are_often_a_perfectly_fine_approach

CSS has failed for both casual users and power users.

Posted on Saturday, August 22, 2009 at 2:26 AM.

Guido van Rossum, of Python fame, recently tweeted about how using CSS instead of tables for Web page layout just isn't worth it. This was submitted to Reddit, and the discussion it generated there is somewhat predictable. Many of the comments get into the typical arguments surrounding HTML tables versus CSS for layout, and the necessity of separating content from presentation. But I think most of the comments currently there miss some key issues.

First of all, a number of the posters point out that once you learn the many quirks and incompatibilities of the various popular Web browsers, CSS-based layout becomes quite natural. To a great extent, this is true. There are many Web designers out there who can make very good looking sites using CSS. Unfortunately, it takes literally years of effort, learning, experimentation and failure to get to this point. And with browser technology continually evolving, it takes further effort just to keep pace.

Now, some people will argue that any inherently complex task will take much time and effort to master, and they're correct. But in this case, CSS-based layout shouldn't be such a task. Reality, however, shows that it is, mainly due to artificial difficulties created by low-quality, inconsistent and obsolete-yet-widely-used Web browsers.

Another thing to consider is that not everybody wants to make complex Web layouts. Many people who aren't Web designers want to quickly throw together a site that has a relatively simple layout, and looks decent. On one hand, they can battle with the many troubles that CSS brings to inexperienced users. On the other, they can just use HTML tables, which for many simple layout tasks end up being much more practical and efficient to work with.

CSS should be able to cater to people in both camps, namely those professionals who want to develop complex pages with a high degree of control, and those who just want to throw together a page quickly and easily. Unfortunately, it fails both groups of people a lot of the time. The usual pattern is that professional designers struggle with it until they finally learn how to wrangle it, at least most of the time. By that point they've invested so much time and effort that the only way to get some payback is to put their hard-earned "knowledge" to use, and that knowledge is more an understanding of numerous broken and poorly implemented Web browsers than anything else. And those people who deem their time better spent on other tasks, like Guido, apparently, just resort to HTML tables.

The fact that the CSS versus HTML tables debate has raged for so long suggests that CSS is a dead end. It doesn't do a sufficient job of fixing the huge variety of problems associated with what should otherwise be a straightforward task. Perhaps Internet Explorer 8's better support for the CSS table model will help improve the situation. Then again, it may just make things worse. Perhaps the only solution is to throw out the subpar technologies we employ now, and find a better way to solve the problems of Web page layout.

Permalink: http://pinderkent.phumblog.com/post/2009/08/css_has_failed_for_both_casual_users_and_power_users

Microsoft has raised some valid points about HTML 5.

Posted on Saturday, August 08, 2009 at 4:30 PM.

Recently, Adrian Bateman of Microsoft raised some questions, concerns and thoughts about HTML 5. Although Microsoft doesn't have the best reputation for supporting and complying with standards, especially when it comes to Web technologies, we shouldn't use those feelings as an excuse to ignore these questions. Had some other non-Microsoft individual or organization made these same remarks, they'd be just as valid and just as worthy of some serious consideration.

Several questions are raised about whether a number of the new elements are really necessary. Indeed, many of the new section elements of HTML 5 do seem quite unnecessary. A <div> or <span> tag with an appropriate class should be more than sufficient as a replacement.

The same goes for the proposed <dialog> element, which is meant for representing a back-and-forth conversation between parties. Aside from the lack of necessity for this element, I personally don't like its name very much. Within the field of software development, the term "dialog" is often used to refer to dialog boxes, which is the first thing that comes to my mind when I see that tag. However, dialog boxes and the <dialog> tag are clearly two very different concepts.

With respect to some of the new tags relating to time and date handling, I have to agree with Adrian's description of such handling as "notoriously complex". Earlier this year I wrote about how care is needed when implementing time and date handling. But now HTML 5 seems to be opening this can of worms with its new <time> tag, and its <input> tag changes. I hope these new elements don't just introduce more problems than they solve.

The <bb> element sounds quite questionable. Aside from the completely non-descriptive tag name, the security implications of this element are obvious. Thankfully, both Microsoft and Mozilla seem aware of the potential dangers of this element.

The <progress> and <meter> elements theoretically sound useful, but I suspect that in reality, they just wouldn't be flexible enough for most Web developers. If their appearance couldn't be heavily modified, they'd likely just be passed over in favor of existing image-based approaches.

HTML 5 has always felt like a hodgepodge of different ideas from various groups, thrown together and called a "standard". It's good to see some realistic, solid criticism from one of the major Web browser developers. Hopefully this input will help resolve some of the issues surrounding HTML 5's unnecessary, impractical and potentially dangerous elements and changes.

Permalink: http://pinderkent.phumblog.com/post/2009/08/microsoft_has_raised_some_valid_points_about_html_5

Please stop asking me to take your Web site improvement surveys.

Posted on Monday, May 18, 2009 at 2:27 PM.

One thing I've noticed becoming more and more common is the use of survey hovers on the Web sites of various companies. A typical scenario involves me going to their site to read up on one or more of their products, only to encounter a hover popup asking me to take some survey, usually about the Web site itself. A good example of this is on Intel's Web site:

[Screenshot: a survey hover on Intel's Web site.]

These survey hovers are too intrusive, especially on commercial sites. When I'm focused on finding the best product to buy, I want to be seeing product specifications and prices. I don't want to be distracted with survey participation requests.

Now, I could always take the survey, and hope there's some area where I can add my own comments and explain my annoyance with the survey popups. But I suspect my participation would be taken as evidence that the survey hovers are working and getting people to take the survey, regardless of the fact that my actual suggestion is to drop the hovers altogether. So I'll write about it here instead, and hope that some marketing folks see this posting.

I've got nothing against the surveys themselves, and I understand the need for customer feedback. I just really dislike the in-your-face approach of these hovers. From my perspective, they do more harm than whatever good they might bring. On a commercial Web site, these hovers distract me (and probably others, as well) from focusing on the company's products, which can make me less likely to buy them.

Permalink: http://pinderkent.phumblog.com/post/2009/05/please_stop_asking_me_to_take_your_web_site_improvement_surveys

Putting Stack Overflow's hardware usage in perspective.

Posted on Saturday, May 09, 2009 at 2:34 AM.

Anand Iyer recently wrote an article that transcribes a portion of a video in which Joel Spolsky discusses the hardware and software behind the very useful and increasingly popular Stack Overflow Web site. According to the talk, there is one Web server and one database server, both running on "eight core Xeons" and serving "16 million" pages a month. At first, that sounds impressive. But thinking about it more, I'm not so sure it really is.

First, we should convert that figure of 16 million pageviews into something easier to comprehend. Assuming a month of just 30 days, a quick bit of math shows that a month is 2,592,000 seconds long. Sixteen million pages over that many seconds works out to a mere 6.2 pages per second. Now, that's probably not a totally accurate picture. There are no doubt times when the traffic is much higher than that, and other times when it's lower. But even if their overall pageview traffic were to triple or quadruple, we still wouldn't be seeing huge numbers of simultaneous page requests.
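
The arithmetic above is simple enough to check in a few lines of Python. This sketch uses only the figures from the post (a 30-day month and 16 million pageviews); the tripling and quadrupling factors are the same hypothetical scenarios mentioned in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60
month_seconds = 30 * SECONDS_PER_DAY     # the post's 30-day month
pageviews_per_month = 16_000_000         # the "16 million" figure

average_rate = pageviews_per_month / month_seconds
print(f"{month_seconds:,} s/month, {average_rate:.1f} pages/s on average")
# → 2,592,000 s/month, 6.2 pages/s on average

# Traffic is not uniform; see what tripling or quadrupling would mean.
for factor in (3, 4):
    print(f"x{factor}: {average_rate * factor:.1f} pages/s")
# → x3: 18.5 pages/s
# → x4: 24.7 pages/s
```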

I don't think people truly realize how powerful today's hardware is. Even the low-cost, consumer-grade PCs you can buy for a few hundred dollars are significantly more powerful than the servers of just a few years ago. So I don't think we should get too excited about Stack Overflow being able to serve 6 or 7 pages per second (and likely many times that during periods of heavy load) using what's essentially 16 very powerful CPU cores.
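
One way to see how modest that load is: divide it across the cores. This rough sketch assumes 16 cores in total (two servers with eight cores each, reading "eight core Xeons" as eight cores per server), perfectly even traffic, and that every core, including the database server's, shares the work. It is a crude time budget, not a measurement.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # the post's 30-day month
PAGEVIEWS_PER_MONTH = 16_000_000        # the "16 million" figure
CORES = 16                              # assumption: 2 servers x 8 cores each

pages_per_second = PAGEVIEWS_PER_MONTH / SECONDS_PER_MONTH
pages_per_core_per_second = pages_per_second / CORES
# Core time available per page if the load were spread perfectly evenly.
core_seconds_per_page = CORES / pages_per_second

print(f"{pages_per_core_per_second:.2f} pages/s per core")
print(f"{core_seconds_per_page:.2f} s of core time available per page")
```

Under those assumptions, each page could consume roughly two and a half seconds of core time before the servers fell behind on average.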

Thinking back to some of the database-backed intranet Web sites I've worked on in the past, we were at times able to handle sustained traffic of 30+ pages per second reasonably well on far inferior hardware. This was even when we still used CGI scripts written in Perl, which incur not only the overhead of starting up the interpreter process for each request, but also the overhead of interpreting the program itself.

I recall one job in particular because of how rushed it was. The company had several call centers located throughout the world, and was moving towards a custom Web-based solution for the call center operators to use. Expecting up to 30 page requests per second at peak hours, they had ordered some very powerful new hardware. The order was delayed for some reason, but management still wanted the site to go live. So the decision was made to temporarily use some older, unused Sun workstations as servers. I recall spending a night setting up two workstations as Web servers and one as a database server, so the system could go live the following morning. It went live, and everyone was very surprised to find that even under higher-than-expected load, running on older Sun workstation hardware and using Perl CGI scripts, the Web site remained quite responsive.

Now, I don't expect that to be the case all of the time. As Joel points out in his talk, many sites even today use significantly more hardware than they probably need. But with some sensible caching policies and a small degree of care while programming, it really isn't overly difficult to get high-traffic sites running on a small amount of lower-end hardware. Even in Stack Overflow's case, it sounds like they could get by very easily with a small fraction of their current infrastructure. Still, it's good to have room to grow, as the site's traffic likely will.
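
As an illustration of a "sensible caching policy", here is a minimal time-to-live cache decorator in Python. It is a sketch under stated assumptions: the names `ttl_cache` and `render_front_page` are hypothetical, only hashable positional arguments are supported, and entries are never evicted. Note that an in-process cache like this only pays off in a long-running process (FastCGI or an application server); a plain CGI script starts fresh on every request, so it would need an external cache instead.

```python
import time
from functools import wraps


def ttl_cache(seconds, clock=time.monotonic):
    """Cache a function's results per argument tuple for `seconds` seconds."""
    def decorator(func):
        entries = {}  # args -> (expiry time, cached value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = entries.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive work
            value = func(*args)
            entries[args] = (now + seconds, value)
            return value

        return wrapper
    return decorator


@ttl_cache(60)
def render_front_page():  # hypothetical expensive, database-backed page
    return "<html>...</html>"
```

Even a short expiry like 60 seconds means a busy page is rendered at most once a minute, no matter how many requests arrive in between.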

Computer hardware today is extremely powerful. For most Web sites, even those getting millions upon millions of pageviews per month, scalability just shouldn't be an issue. If it is, it's likely that there have been some pretty significant programming mistakes made when developing the software powering the Web site. And with low-end servers today typically coming with eight or more logical processors and many gigabytes of memory, servicing hundreds of requests per second from a single system should be considered routine.
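
A back-of-the-envelope capacity estimate supports that last claim. The sketch below assumes requests are purely CPU-bound and spread evenly across logical processors, which makes it an optimistic upper bound (hyper-threaded logical processors are not full cores, and real requests also wait on I/O). The 8 processors and 20 ms of CPU time per request are hypothetical figures, not measurements.

```python
def max_requests_per_second(logical_processors: int, cpu_ms_per_request: float) -> float:
    """Upper bound on throughput for CPU-bound requests spread evenly.

    Each logical processor can complete 1000 / cpu_ms_per_request requests
    per second.
    """
    if cpu_ms_per_request <= 0:
        raise ValueError("cpu_ms_per_request must be positive")
    return logical_processors * 1000 / cpu_ms_per_request


# Hypothetical figures: 8 logical processors, 20 ms of CPU time per request.
print(max_requests_per_second(8, 20))  # 400.0
```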

Permalink: http://pinderkent.phumblog.com/post/2009/05/putting_stack_overflows_hardware_usage_in_perspective