Putting Stack Overflow's hardware usage in perspective.
Posted on Saturday, May 09, 2009 at 2:34 AM.
Anand Iyer recently wrote an article that transcribes a portion of a video in which Joel Spolsky discusses the hardware and software behind the very useful and increasingly popular Stack Overflow Web site. It's mentioned that there is one Web server and one database server, both running on "eight core Xeons" and serving "16 million" pages a month. At first, that sounds impressive. But thinking about it more, I'm not so sure it really is.
First, we should convert that value of 16 million pageviews into something we can comprehend better. Assuming a month of just 30 days, a quick bit of math shows that to be 2,592,000 seconds. Sixteen million pages over that number of seconds ends up being a mere 6.2 pages per second. Now, that's probably not a totally accurate picture. There are no doubt times when the traffic is much higher than that, and other times when it's lower. But even if their overall pageview traffic were to triple or quadruple, we're still not seeing huge numbers of simultaneous page requests.
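To make that arithmetic easy to check, here is a minimal Python sketch. The 16 million figure and the 30-day month come from the numbers above; the function name is just something I made up for illustration, and the calculation assumes traffic is spread perfectly evenly over the month, which real traffic never is.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def average_pages_per_second(pageviews_per_month, days_in_month=30):
    """Average page rate, assuming traffic is spread evenly over the month."""
    return pageviews_per_month / (days_in_month * SECONDS_PER_DAY)


average = average_pages_per_second(16_000_000)
print(f"seconds in a 30-day month: {30 * SECONDS_PER_DAY:,}")
print(f"average: {average:.1f} pages/second")

# Tripling or quadrupling the overall traffic, as discussed above.
for multiplier in (3, 4):
    print(f"{multiplier}x traffic: {average * multiplier:.1f} pages/second")
```

Even at four times the traffic, the average stays below 25 pages per second.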
I don't think people truly realize how powerful today's hardware is. Even the low-cost, consumer-grade PCs you can buy for a few hundred dollars are significantly more powerful than the servers of just a few years ago. So I don't think we should get too excited about Stack Overflow being able to serve 6 or 7 pages per second, or even many times that during periods of heavy load, across what are essentially 16 very powerful CPU cores.
Thinking back to some of the database-backed intranet Web sites I've worked on in the past, we were able to reasonably handle sustained traffic of 30+ pages per second at times on far inferior hardware. This was even when we still used CGI scripts written in Perl, which have not only the overhead of starting up the interpreter process with each request, but also the overhead of the program interpretation itself.
I recall one job in particular because of how rushed it was. The company had several call centers located throughout the world, and was moving towards a custom Web-based solution for the call center operators to use. Expecting up to 30 page requests per second at peak hours, they had placed an order for some hardware that was significantly powerful for the time. The order was delayed for some reason, but management wanted the site to go live. So the decision was made to temporarily use some older, unused Sun workstations as servers. I recall spending a night getting two workstations set up as Web servers, and one as a database server, so the system could go live the following morning. It went live, and everyone was very surprised to find that even under higher than expected load, running on older Sun workstation hardware and using Perl CGI scripts, the responsiveness of the Web site was quite acceptable.
Now, I don't expect that to be the case all of the time. As Joel points out in his talk, there are many sites even today that use significantly more hardware than they probably should. But with some sensible caching policies and a small degree of care while programming, it really wasn't overly difficult to get high-traffic sites running on a small amount of lower-end hardware. Even in Stack Overflow's case, it sounds like they could get by very easily with a small fraction of their current infrastructure. However, it is good to have room to grow, as the traffic to that site likely will.
Computer hardware today is extremely powerful. For most Web sites, even those getting millions upon millions of pageviews per month, scalability just shouldn't be an issue. If it is, it's likely that there have been some pretty significant programming mistakes made when developing the software powering the Web site. And with low-end servers today typically coming with eight or more logical processors and many gigabytes of memory, servicing hundreds of requests per second from a single system should be considered routine.
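For a rough sense of scale, the same arithmetic can be run in reverse. This is only a sketch: the function name is hypothetical, it assumes a 30-day month, and it assumes one request per pageview, which understates the load of a real page that also pulls in images, scripts and stylesheets.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def monthly_pageviews(requests_per_second, days_in_month=30):
    """Pageviews per month at a sustained rate, assuming one request per page."""
    return requests_per_second * days_in_month * SECONDS_PER_DAY


# 30/s is the intranet figure mentioned earlier; 100/s stands in for
# the low end of "hundreds of requests per second".
for rate in (30, 100):
    print(f"{rate} requests/second sustained: {monthly_pageviews(rate):,} pageviews/month")
```

Under those assumptions, a sustained 100 requests per second works out to roughly 259 million pageviews in a month.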








