Large-Scale Web Site Infrastructure and Drupal

Setting up a Drupal Web site is pretty simple these days. Once the site gets popular, though, you need to bring out the big guns and start finding and fixing the performance bottlenecks. In this article, we show some of the techniques that can allow your Drupal Web site to scale to the grandiose levels you originally hoped for.

When Twitter experiences an outage, users see the infamous “fail whale” error message, an illustration of twit-birds struggling to hoist a sleeping cartoon whale into the air along with the words “Too many tweets! Please wait a moment and try again.” It happens so often that Twitter has a much-heralded illustration for it. Many readers may remember Facebook going down for days at a time not too long ago. True, those sites are dealing with extraordinary levels of traffic, but smaller sites often face the same problems. How come? First, Web sites are no longer a collection of static pages. Nowadays, Web sites combine social-networking features with highly customized content for individual users, meaning most pages have to be assembled on the fly. Second, content is changing: rich media, on-line advertising, video, telephony. There's more than text forcing its way through the pipe, and network traffic only continues to grow. Addressing this tandem of complexity and load is the bane of many growing social-media Web sites' existence. What follows are some clever ways to address this whale of a problem.

Surprisingly, the solutions to most scaling problems are frequently the same, regardless of the technology upon which the site was built. Lullabot (the parent company of this article's authors) is a Drupal development company, meaning that most of our experience is centered around the typical LAMP stack (Linux, Apache, MySQL and PHP), although most techniques are universal, and some of the most advanced performance software is platform-neutral.

Server Infrastructure

One of the main factors in scaling a Web site is, of course, the hardware (Figure 1). System administrators always can throw more hardware at a problem and solve it at least temporarily, if they have the resources to do so. Quite a few services can be put in place before this needs to be done, and developers can selectively optimize the application by reducing or optimizing queries. Nevertheless, when it comes to sheer numbers of users and bandwidth over a short amount of time, there almost always comes a point where it's necessary to include hardware in the mix. That's why it is important to have your hardware infrastructure planned in a way that it rapidly can scale upward on a traffic spike, and back down when your traffic recedes.

Figure 1. Hardware Stack

A typical setup, whether virtual or dedicated, usually includes multiple Web servers, multiple database servers and sometimes even separate caching servers, all behind a load balancer that distributes traffic between machines. Depending on its processor speed and the amount of available memory, a Web server or database server often can double as the caching server, because caching services usually require fewer resources than Apache or MySQL.

Although distributing traffic across multiple Web servers, or Web heads, can be a quick win, it can introduce problems with managing file uploads. If requests are being distributed round-robin by the load balancer, a user may upload a file on one server but then be switched after the upload to a different Web server that doesn't have the newly uploaded file. To solve this problem, a file server also is added into the mix. The file server is usually some form of NAS (Network Attached Storage) or an NFS (Network File System) mount that allows the application to share files between machines. Each Web head will have a copy of the application stored in the Web root, but when it comes to the files that are uploaded or changed often by the users of the application, an NFS mount connects all the servers to a shared file location.
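To make the upload problem concrete, here is a minimal Python sketch of round-robin distribution. Everything in it is a hypothetical stand-in made up for illustration: the WebHead and RoundRobinBalancer classes, the server names and the dictionary that plays the role of the NFS mount. It is not how any particular load balancer is implemented.

```python
from itertools import cycle


class WebHead:
    """One Web server; a dict stands in for its filesystem (an assumption)."""

    def __init__(self, name, shared_files=None):
        self.name = name
        self.local_files = {}
        # shared_files plays the role of an NFS mount visible to every Web head.
        self.shared_files = shared_files

    def _storage(self):
        # Use the shared mount when one is configured, else the local disk.
        if self.shared_files is not None:
            return self.shared_files
        return self.local_files

    def upload(self, filename, data):
        self._storage()[filename] = data

    def has_file(self, filename):
        return filename in self._storage()


class RoundRobinBalancer:
    """Hands each incoming request to the next Web head in turn."""

    def __init__(self, web_heads):
        self._next_head = cycle(web_heads)

    def route(self):
        return next(self._next_head)


def upload_then_view(balancer, filename):
    """Upload on one request, then look for the file on the following request."""
    balancer.route().upload(filename, b"example bytes")
    return balancer.route().has_file(filename)


# Local disks only: the second request lands on a head that lacks the file.
local_only = RoundRobinBalancer([WebHead("web1"), WebHead("web2")])
found_without_nfs = upload_then_view(local_only, "avatar.png")

# Shared mount: every head reads and writes the same location.
nfs = {}
shared = RoundRobinBalancer([WebHead("web1", nfs), WebHead("web2", nfs)])
found_with_nfs = upload_then_view(shared, "avatar.png")

print(found_without_nfs, found_with_nfs)  # False True
```

The application code stays identical on every Web head; only the storage location for user files changes, which is why a shared mount fixes the problem without touching the load balancer.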

Cache Techniques

The other main factor in scaling a Web site is, of course, the software (Figure 2). To scale effectively, high-traffic Web sites require some flavor or flavors of caching. Caching mechanisms are not mutually exclusive, and most high-profile sites combine several. Most types of caching seek either to reduce the amount of disk access necessary to render a page or to keep higher-level languages compiled into bytecode so they're faster to run; the closer to machine language, the better.

Figure 2. Software Stack

APC (Alternative PHP Cache) and other opcode caches save the Web server from having to read, parse and compile PHP files on every request. APC is a free, open-source opcode cache and is pretty much the standard. At the time of writing, it was slated to come built in with PHP 6 (a release that never shipped; PHP 5.5 and later bundle Zend OPcache instead). Other opcode caches exist, and their performance characteristics differ.
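The idea behind an opcode cache can be sketched in a few lines of Python, which also compiles source to bytecode. This is only an analogy under stated assumptions: the OpcodeCache class is hypothetical, it is not APC's implementation, and checking the file's modification time is just one simple way to notice that a script has changed.

```python
import os
import tempfile


class OpcodeCache:
    """Keeps compiled code objects in memory, keyed by file path."""

    def __init__(self):
        self._compiled = {}  # path -> (modification time, code object)
        self.compile_count = 0

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._compiled.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # hit: no read, parse or compile needed
        with open(path, encoding="utf-8") as source_file:
            code = compile(source_file.read(), path, "exec")
        self.compile_count += 1
        self._compiled[path] = (mtime, code)
        return code

    def run(self, path):
        """Execute the (possibly cached) bytecode and return its globals."""
        namespace = {}
        exec(self.load(path), namespace)
        return namespace


with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "page.py")
    with open(script, "w", encoding="utf-8") as handle:
        handle.write("title = 'Hello from a cached script'\n")

    cache = OpcodeCache()
    for _ in range(3):  # three simulated page requests
        result = cache.run(script)

compiles_for_three_requests = cache.compile_count
print(result["title"], compiles_for_three_requests)
```

Three requests cost one compile here. On a real site the saving is multiplied by every file a page request includes.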

Modern content management systems, like Drupal, can make a plethora of database calls on every page request. Because calls to the database often hit the disk, the database is frequently a bottleneck. Memcached is a service that allows the contents of entire database tables (in Drupal's case, typically its cache tables) to be stored in memory, dramatically speeding up lookups against those tables and alleviating strain on the database. It behaves as though it were a giant hash table and serves this data out of memory. Memcached is free, open source and in use by a ton of high-traffic sites. In most typical setups, Memcached is installed alongside MySQL on the database server. However, the database server needs to have a lot of RAM available if Memcached and MySQL are sharing this critical resource. On occasion, Memcached is placed on its own server, completely decoupled from the database server, which prevents Memcached from using too much of the database server's memory.
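A rough Python sketch of the pattern follows: check the in-memory hash table first, and fall back to the database only on a miss. The names are assumptions for illustration. MemoryCache is an in-process stand-in rather than a real Memcached client, NodeStore is hypothetical, SQLite replaces MySQL so the example is self-contained, and the node table is a simplified, made-up schema.

```python
import sqlite3


class MemoryCache:
    """In-process stand-in for Memcached: a hash table with get and set."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value


class NodeStore:
    """Looks up titles, consulting the cache before the database."""

    def __init__(self, connection, cache):
        self.connection = connection
        self.cache = cache
        self.database_queries = 0

    def load_title(self, node_id):
        key = f"node_title:{node_id}"
        title = self.cache.get(key)
        if title is not None:
            return title  # served from memory, database untouched
        self.database_queries += 1
        row = self.connection.execute(
            "SELECT title FROM node WHERE nid = ?", (node_id,)
        ).fetchone()
        title = row[0] if row else None
        if title is not None:
            self.cache.set(key, title)  # remember it for the next request
        return title


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
connection.execute("INSERT INTO node (nid, title) VALUES (1, 'Scaling Drupal')")

store = NodeStore(connection, MemoryCache())
titles = [store.load_title(1) for _ in range(5)]
print(titles[0], store.database_queries)
```

Five lookups result in a single database query. The sketch leaves out expiry and invalidation, which a real deployment needs so that edited content does not keep being served stale.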

Varnish is an excellent high-performance HTTP accelerator. The technical term for Varnish is a “reverse proxy cache”, meaning that it sits in front of the Web server and is the first to handle requests when you visit a Web site. If Apache were a physician, Varnish would be the triage nurse. After an anonymous page request is made, Varnish keeps a copy of the page in ultra-fast storage so that the next time the page is requested, it returns the copy immediately, circumventing a bootstrap of Apache, PHP, MySQL, Memcached or any other technologies your Web site may require to serve pages. If Varnish doesn't have a copy of the file or page being requested, it sends the request on to Apache. Varnish is also a huge win if you're going to be serving static content.
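The triage role can be sketched in Python as follows. This is a simplified illustration, not Varnish's actual behavior or configuration language: the Backend and ReverseProxyCache classes are hypothetical, the cache never expires anything, and passing logged-in requests straight through is an assumption drawn from the fact that only anonymous pages are described as cached.

```python
class Backend:
    """Stand-in for the Apache/PHP/MySQL stack; counts how often it is hit."""

    def __init__(self):
        self.requests_served = 0

    def render(self, path, user):
        self.requests_served += 1
        greeting = f" for {user}" if user else ""
        return f"<html>{path}{greeting}</html>"


class ReverseProxyCache:
    """Answers anonymous requests from memory when it can."""

    def __init__(self, backend):
        self.backend = backend
        self._pages = {}  # path -> rendered page

    def handle(self, path, user=None):
        if user is not None:
            # Personalized pages are passed straight to the backend.
            return self.backend.render(path, user)
        if path not in self._pages:
            # Miss: ask the backend once and keep a copy.
            self._pages[path] = self.backend.render(path, None)
        return self._pages[path]


backend = Backend()
proxy = ReverseProxyCache(backend)

for _ in range(100):  # one hundred anonymous visitors
    proxy.handle("/front-page")
proxy.handle("/front-page", user="alice")  # one logged-in visitor

print(backend.requests_served)  # 2
```

Of 101 requests, only two reach the backend: the first anonymous miss and the logged-in visit.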

______________________

Comments

Grammy.com Numbers

Nathan Haug

Well since everyone else is throwing their business pitches in here...

The approach described in this article essentially replicates how Lullabot (the authors of this article) scaled grammy.com to 213 million page views within a single day, most of them over a 6-hour window during the 52nd awards show. In those same 6 hours, we registered 50,000 new user accounts. Amazingly, we couldn't even measure the full potential of the setup because our hosting provider's load-testing cluster couldn't send requests fast enough to bring the site down.

Slides and configuration files of this setup were presented at DrupalCamp Colorado.

or you could simply contact

Vish

or you could simply contact an expert Drupal support and maintenance firm like Halosys technologies.

table locks

dalin

keep in mind that converting this table may cause slow-downs on INSERTs, as InnoDB does a full table lock on INSERTs to avoid key duplication.

For this advice to be applicable, the table would need to be undergoing more writes than reads. How many tables are like this? Not many. Watchdog is the only one that I can think of, and if that is seeing that many writes you have bigger problems.

I instead advise changing _all_ tables to InnoDB. This allows you to tune MySQL only for InnoDB, reducing the MyISAM-only buffers to near-zero (the information_schema and mysql databases still use MyISAM, so you can't completely disable it). This also reduces complexity, because you only have to worry about one engine. The only time this does not apply is when the server has limited RAM, as a well-tuned InnoDB server requires more RAM than a well-tuned MyISAM server.

Drupal can scale to millions of page views a day

2bits.com, Inc.

There are many ways to scale Drupal.

At 2bits.com, we prefer simpler ways without added complexity both at the code level and the infrastructure level.

Here is a presentation on 3.4 million page views a day, 92 million page views a month, one server and Drupal.

Mercury

Farang

If you are looking for a high performance Drupal setup then you should also look into project Mercury from http://getpantheon.com/
