Large-Scale Web Site Infrastructure and Drupal

Setting up a Drupal Web site is pretty simple these days, until it gets popular, then you need to bring out the big guns and start finding and fixing the performance bottlenecks. In this article, we show some of the techniques that can allow your Drupal Web site to scale to the grandiose levels you originally hoped for.
PHP

Static variable caching is a quick-and-easy win in PHP. Here is an example of a simple function with a simple query to the database:

function taxonomy_get_term($tid) {
  return db_fetch_object(
    db_query('SELECT * FROM {term_data} WHERE tid = %d', $tid)
  );
}

This function can be given a simple static variable, so that if this function happens to be called more than once on a page load, it can skip over the call to the database and serve the result out of this static cache:

function taxonomy_get_term($tid) {
  static $terms = array();
  if (!isset($terms[$tid])) {
    $terms[$tid] = db_fetch_object(
      db_query('SELECT * from {term_data} WHERE tid = %d', $tid)
    );
  }
  return $terms[$tid];
}

Application (Drupal)

Drupal is a content management framework that Lullabot uses to build high-performance Web sites on top of this infrastructure. Drupal is built with PHP as its primary programming language and has a ton of user-contributed modules freely available to extend its functionality. It has been compared to LEGOs because of this, and because the quality of modules vary, it's a good idea to do full code reviews of any modules that are selected for inclusion into any platform build. If an existing module already does mostly what is needed, it should be reviewed to make sure static variable caching is utilized, queries are optimized and general coding standards are being used.

Regularly contribute patches back to modules when a module is found lacking in any of these areas or if any general bugs are found through the module's issue queue, which can be found on the same page where you download the module. Performance reviews also are a good idea once a site is built to ensure that queries are optimized and not run more than once per page load. The Devel module is a great resource for this, as it will give you stats on page load times, memory usage and can display every query executed on any given page load.

Beyond the regular LAMP configuration optimizations, caching techniques, and hardware infrastructure are some general Web development best practices available within Drupal that not only can reduce loads on various servers, but also make it easy to have some of your data structures in code that can be version-controlled to keep track of changes and to help with the deployment process of said changes. The first, and relatively new, paradigm of “exportables” is twofold, in that it gives you a way to read a data structure from code instead of the database, and it also can be deployed to different environments and reused.

Exportables started with the Views module by Earl (merlinofchaos) Miles who wanted a way to help debug the problems that his module users might encounter. So, he created a way for users to export the view they created into a readable data structure that he then could put on his own machine to help him debug. This not only had the awesome side effect of being able to share these “view recipes” with other users, but it also evolved into a method where the structure could replace what was read from the database and help increase the performance. Exportables then was extrapolated into a library dubbed Ctools (for Chaos Tools) and used for the Panels module. Other people started catching on and implementing exportables for their modules, and now there are a whole slew of modules that use the Ctools Exportables for this purpose.

This eventually led to a module called Features that provides a UI to choose the various exportable data structures within a Drupal installation and wrap them up into a custom “feature” module, which then can be shared. These features can be simple configuration options or complex features requiring many other contributed modules in order to provide feature-rich enhancements for any Drupal Web site. Not only can it be used to share such features, but it also has become an important part of the deployment process in creating modern Drupal Web sites.

Another tool that has recently matured and become a necessity to any professional Drupalite is Drush. Drush stands for Drupal Shell and is a way to control your Drupal Web sites through the command line. Not only does it provide powerful commands to manipulate your Web site quickly, but other modules can provide integration with Drush as well, creating their own commands related to working with their particular module. For example: the Features module provides commands to Drush that allow you to list, update and revert any feature modules quickly that are part of a Drupal installation's codebase. The Backup and Migrate module provides integration to allow you to create SQL backups of your Web site quickly with a simple command. Some modules even provide commands to work with Drupal and Git! So, not only does Drush allow you to work with your Drupal site quickly, but you also don't have to load a huge page through Apache to do so.

And, of course, no professional Web site would be complete without revision control. Lullabot has used CVS (Concurrent Versions System), SVN (Subversion) and, most recently, made the move to Git. But no matter what you use, it's important to have a backup of your work and versioning for teams working on the same project. The merits of versioning your code are many. Working on a high-performance Web site usually takes many people, so version control becomes a necessity.

Jerad Bitner has been using Drupal since the nightmare upgrades from 4.6 to 4.7 (that's early 2005, if you're asking). He started out as a Technical Illustrator with C/S Group and worked for three years with Photoshop, Illustrator, AutoCad and Macromedia products as well as PHP. When it came time to replicate a platform across the different locations of the company, Jerad found Drupal and hasn't looked back since.

Nate Haug adds a dash of design to Lullabot. He received degrees in both a Fine Arts and Computer Science from Truman State University, creating the perfect bridge between the technical and aesthetic. Detail is his obsession, so if you know what you want, Nate will deliver your desire.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Grammy.com Numbers

Nathan Haug's picture

Well since everyone else is throwing their business pitches in here...

The approach described in this article essentially replicates how Lullabot (the authors of this article) scaled grammy.com to 213 million page views within a single day. Most of those over a 6 hour window during the 52nd awards show. In those same 6 hours, we registered 50,000 new user accounts. Amazingly, we couldn't even measure the full potential of the set up because our hosting providers load-testing cluster couldn't send requests fast enough to bring the site down.

Slides and configuration files of this setup were presented at DrupalCamp Colorado.

or you could simply contact

Vish's picture

or you could simply contact an expert drupal support and Maintenance firm like Halosys technologies.

table locks

dalin's picture

keep in mind that converting this table may cause slow-downs on INSERTs, as InnoDB does a full table lock on INSERTs to avoid key duplication.

For this advice to be applicable, the table would need to be undergoing more writes than reads. How many tables are like this? Not many. Watchdog is the only one that I can think of, and if that is seeing that many writes you have bigger problems.

I instead advise changing _all_ tables to InnoDB. This allows you to tune MySQL only for InnoDB, reducing the MyISAM-only buffers to near-zero (the information_schema and mysql databases still use MyISAM, so you can't completely disable it). This also reduces complexity to only be worried about one engine. The only time this does not apply is when the server has limited RAM, as a well-tunned InnoDB server requires more RAM than a well-tuned MyISAM server.

Drupal can scale to millions of page views a day

2bits.com, Inc.'s picture

There are many ways to scale Drupal.

At 2bits.com, we prefer simpler ways without added complexity both at the code level and the infrastructure level.

Here is a presentation on 3.4 million page views a day, 92 million page views a month, one server and Drupal.

Mercury

Farang's picture

If you are looking for a high performance Drupal setup then you should also look into project Mercury from http://getpantheon.com/

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState