Achieving Continuous Integration with Drupal
In the early 1990s, my first job out of college was as a software engineer at a startup company. We were building a commercial product using a well-known open-source network security project. In those days, Agile software development practices (not to mention the World Wide Web, or even widespread public awareness of the Internet) still were in the future. My fellow engineers on that project (who had just graduated with me and to this day are the best programmers I know) and I were taught what we now call the Waterfall method. We thought we were invincible.
We had no idea what was coming. After consultation with potential customers, we wrote a Requirements document describing what the product needed to do, a Functional Specification that described how the product would look and behave, a Design document that described the technical architecture and internals of how we would build it, and even a Test Plan that described the automated tests we would build to ensure the product worked. We had a release deadline, declared by management, of "before Christmas". Good thing we were so young! We engaged in our Death March. The local Chinese delivery place got to know us well. I got home around 1am every morning for months. We finally finished and shipped version 1.0 of the product on December 18. It took me a few weeks to remember what normal humans did when they were not at work.
What Did I Learn from This Experience?
What we did wrong: basically, everything about the software engineering methodology we used was completely stupid. We shipped a working product on time, but we started with the benefit of a working open-source project. We made essentially every mistake that Agile development was invented to prevent.
What we did right: we actually implemented our Test Plan. Since the tests were automated, the build process had to be automated. It certainly added a lot of "extra work" to the project, but the payoff was huge. Before we left for the day, we would kick off the build script. When we came in the next morning, if the last line of output said PASSED, we felt confident and ready to ship. We didn't know it at the time, but we were on the path of what eventually would be called Continuous Integration (CI).
Fast-forward 20 years. I'm now at Acquia, which produces commercial products for companies using the open-source project Drupal. Drupal is a LAMP-stack application for building Web sites and services. We realized early on that everyone using Drupal needs to host it somewhere, and that most people building sites with Drupal do not also want to have to become experts in building a reliable, scalable infrastructure for hosting it. More than that, they also want to be able to follow best practices in software development, testing and deployment; they want to use Continuous Integration. However, they often do not have the time, resources or management support to invest in the necessary infrastructure. I've spent the last three years addressing that problem.
What Is Continuous Integration?
Many excellent and persuasive resources on the Web talk about the principles of CI in detail. In this article, I discuss a simplified list of the most meaningful best practices for Drupal Web site development:
Use a source code repository. This is step zero for good software development. Most people are doing this, using Git, SVN or other systems; if you are not, start now.
Make small, frequent changes. All developers should commit their changes frequently. This reduces the inevitable conflicts and lets problems surface sooner. Also, small, frequent changes enable small, frequent releases, making all the rest of the principles more valuable.
Automate testing. Have your repository automatically integrated with a testing environment, so that every commit triggers a test run. This way, you know immediately if something broke.
Test in a clone of the production environment. It does no good to test your software under different conditions from those that it will run in production; doing so is a recipe for taking down your site when you deploy. Never hear someone say "But it worked on my machine!" again.
Make all versions easily accessible. Despite best efforts, production releases still will break, so you need an easy way to re-deploy a prior version. Then, you'll want to compare the working and broken versions to figure out what went wrong. To do this, you'll need a reference copy of past releases.
Have an audit trail (that is, a blame list). This helps you not just in the source control of who made this commit, but who deployed the commit as well. This can provide rationale as well as potential fixes.
Automate site deployment. In order to tolerate small, frequent releases, pushing a release needs to be an automated process so it's very quick and easy. If it's a big chore to push one release, the whole process falls apart.
Measure results and iterate rapidly. Are the changes helping? Is the site faster? Did the usability enhancement yield more sales? If it's not, you can iterate again.
Achieving Continuous Integration requires some amount of infrastructure, the culture and discipline of the engineering team to use it, and management's understanding and commitment so that it supports the necessary investment. This is an article about technology, not management and culture, so I focus primarily on the infrastructure here.
Building It Yourself
Many shops build their own CI systems that are perfectly tailored to their own needs. Doing so is perfectly reasonable if you have the time and resources to get there. The biggest danger of doing it yourself, of course, is deciding to—and then not getting around to it. You end up doing things the manual, slow and error-prone way "until we have time to fix it", which often turns out to be "never". When you do get started, it probably will end up being a permanent side project, which may lead you to cut corners that will end up causing problems at the worst possible time later.
Here are some of the things you should keep in mind.
Use a source code repository. You probably already are (right?). You will need to be familiar with its "post-commit hook" capability to script actions based on it. If you are using a hosted repository (such as GitHub), you will have to integrate with its Web-based hooks.
Make small, frequent changes. All of your developers will be making frequent commits, resolving conflicts locally as best they can. To keep things moving forward, you need to have a constantly available running copy of everyone's latest code. One way to do this is to deploy the tip of your main development branch automatically to a shared development environment, so everyone always can see it. You can script this yourself using your repo's post-commit hooks. A build automation tool like Jenkins will help, but you still need to write the deployment script yourself.
Automate testing. Assuming you write automated tests for your site, you will want to run them every time someone makes what they believe is a release-ready commit. Lots of tools exist for doing this. One popular choice is Jenkins (formerly called Hudson), and it is excellent. It can integrate directly with your code repository and trigger a "job" on every commit, or run a job on a schedule.
The tests themselves are not the whole story though. Because your application is a Drupal site, you need to test it in a Web environment. You'll certainly need a running database server. If you want to test actual page loads like a browser would see, you'll need a running Web server too. You probably want to test your application along with a reasonably current production database; if you don't automate that, one day you'll find yourself testing against year-old data. However, you also probably want to "scrub" your current production database before running tests against it, lest you accidentally spam all your customers from your test servers, or worse. This is all the responsibility of your test harness script, run by Jenkins.
If you fool yourself that you can "mock out" these dependencies and have purely standalone unit tests that can run anywhere, reality will mock you back. You will discover that tests are not accurately simulating your live environment, and you will have to roll back a release that "passed all of its tests" but failed in production.
Test in a clone of the production environment. This is where things really get interesting. I've already talked about needing a running Web and database server. If your site uses additional services like memcached, Varnish or Apache Solr, you need to make sure those are in place too. If your production site uses SSL, you either need SSL running in your testing environment, or you need to turn off the checks or redirection that enforces it. Ultimately, it is as much work to maintain your test environment as it is your production environment.
Where do you run all this stuff? The "simple" answer is to run it on the same server as Jenkins. However, Jenkins probably is not running on your production servers, so immediately your test environment is different from production. Do you know that when you install Jenkins via your distro's package manager that it does not pull in some other package that your site might end up using in testing but then fail because it is missing in production?
This points to an even deeper issue. You cannot create a clone of your production environment unless you know exactly what your production environment really is. What packages are installed? What configuration files are in place? What dæmons are running? What security updates have been installed? Running a production Web site leads to all kinds of unexpected issues and surprises, and even the best-intentioned, well-meaning sysadmins are likely to solve a crisis by changing some configuration on the server by hand. You have to make sure those changes always get propagated to your test environment. For that matter, you have to make sure they are permanently maintained in your production environment too.
This leads directly to the topic of DevOps and server configuration management. The only way to be sure your production environment is the way you expect is to automate its configuration, and the only way to ensure that your test environment is a clone of your production environment is to use exactly the same automated configuration to build it. There are good open-source tools for doing this; Puppet and Chef are the two I am familiar with. However, Puppet and Chef are programming languages in their own right. Once you go down this path, you are now maintaining two completely different pieces of software: your Web application and the infrastructure automation to run it. At this point, you need to make a recursive call to start reading this article over at the beginning, because you will need to use Continuous Integration on your infrastructure automation just like you do for your Web app. So, your Web app needs a production and test environment, all of which is running in your production infrastructure environment; now you need a test infrastructure environment in which to test updates to your infrastructure code before rolling them out to production. If you are using Jenkins to run your CI process, and Jenkins is deployed as part of the infrastructure you are developing, then...your brain just hit a stack overflow and exploded. Ooops.
To be clear, this is all doable, and there may be simplifying assumptions you can make to reduce the effort. However, if you make the mistake of thinking of your server configuration as something you can just build once and forget about, your Web site is eventually going to suffer for it.
Make all versions easily accessible. When it's time to push to production, you want to create a symbolic tag in your version control system that says what you released when. If you release frequently, you'll end up with a lot of tags, but that's okay; they're cheap. You probably will end up creating these tags in the script you create to automate deployment.
Maintain an audit trail. Your VCS gives you a commit history for your source code, but you need more than that. When something goes wrong, you easily should be able to point to the date/time/individual that played a part and quickly get the information you need. Who pushed the release to production earlier today? Who added a new domain name to Apache virtual host configuration? Can you verify that the SSH key for the employee that left last week has been removed? Most changes will be in your Web site source code, but some will be in your infrastructure configuration code, so you will want a unified view of the changes.
Automate site deployment. Okay, so you are working in small batches with frequent commits, testing every time in a clone of the production environment. Now it needs to be easy to push your new application to the live environment. If you've automated your infrastructure as described and already have a system in place for deploying new code commits to your testing environment, this should be a pretty small additional step. It has to be simple, fast and reliable; you want to be able to push a release and go have lunch five minutes later without worrying about it.
Measure results and iterate rapidly. There are many great monitoring and measurement tools available that check for things, such as error logs, page load performance, server performance, A/B testing and more. Because you've automated your infrastructure configuration, integrating these onto your servers is not that much additional work, but you still have to decide which ones to use, how best to install them, and how to get the data you need out of them most efficiently.
Use an Existing System
Whew! Okay, be honest. How likely is your company actually to make the investment to build and deploy an automated CI infrastructure as described above? I thought so. The fact is that infrastructure is not your specialty. (If it is, can we hire you?) You build exceptional Web sites, and you should not spend so much time and effort also building the servers to run it.
Your alternative is to use a system someone else built for you. Several exist, each with different properties, and with more arriving all the time. I happen to be the lead engineer for Acquia Cloud, so let me quickly demonstrate how Acquia Cloud provides everything you need to implement CI for your Web site.
Figure 1. The Workflow page is the centerpiece of Acquia Cloud's CI system.
Use a source code repository. Acquia Cloud provides both Git and SVN repositories. The URL is displayed right at the top.
Make small, frequent changes. Acquia Cloud provides a development, staging and production environment for your site. You can deploy any branch or tag from your repository in any of them. When you deploy a branch in an environment, every commit to that branch is deployed to that environment. This makes the Dev environment perfect for initial integration testing. Set it to deploy the "master" (Git) or "trunk" (SVN) branch, and every developer's commits are available immediately for initial experimentation.
Automate testing. Every time you deploy code or perform various other actions, Acquia Cloud runs "Cloud Hooks". These are simple scripts that you put into your code repository to perform any actions you want. Each hook is tied to specific actions in a specific environment—for example, all scripts in the hooks/post-code-deploy/prod directory of your repository run when you deploy code to the Production environment. The hook scripts run in sorted order until the first one fails, and the output from all hook scripts is available at the end. This is the perfect way to run your test scripts, scrub a database, perform a load test or anything else.
Test in a clone of the production environment. This is the biggest payoff with Acquia Cloud. We maintain each of these environments—Dev, Stage and Prod—for you. You can choose whether they are on the same or different servers, and whether they are redundant and load balanced or running on a single VM, but we ensure that as far as your Web application is concerned, the configuration is identical. Of course, we also provide 24/7 monitoring, backups, security updates and critical fixes—all the things you would have to do on your servers yourself.
Make all versions easily accessible. As you can see in Figure 2, you can always revert back to any specific tagged version or branch in any environment.
Figure 2. The Code selector lets you deploy any branch or prior release version to any environment.
Have an audit trail (that is, a blame list). Our task log is your audit trail. It shows code commits, but also all changes to your Web environment: domain names, SSH keys, server launches and so on. It shows you exactly what date and time each action took place, with the option to show the full detail for the command.
Figure 3. The Task log shows all changes to any of your site's environments.
Figure 4. Details are available for every task.
Automate site deployment. To deploy a release from one environment to another, simply drag and drop on the UI (or use our API or Drush CLI to do the same thing). If you drag code from an environment deploying a branch, it creates a symbolic tag at the tip of that branch and deploys the tag in the target environment. If you drag an environment deploying a tag, it just deploys the same tag in the same environment. You always can deploy any branch or any previous tag in any environment just by selecting it from the drop-down list (or, again, via our API or CLI).
Figure 5. Deploying code is a simple drag-and-drop operation (API and CLI also available).
Measure results and iterate rapidly. This article mostly been has about Acquia Cloud, but Acquia Cloud is itself just a feature of the Acquia Network that provides a wide variety of tools to improve your site, such as expert Drupal configuration advice, SEO optimization, faceted search, performance monitoring, load testing and spam blocking, plus services like education, training and support. The Acquia Network is part of every Cloud subscription and includes all of these tools, most for free.
Figure 6. The Acquia Network provides resources to understand and improve your site's results.
To view a video overview that showcases how to develop with Acquia Cloud for your Drupal site, I have a Webinar on this very topic available at http://ow.ly/cUNlL. To sign up for a completely free version of Acquia Cloud, visit http://network.acquia.com/freecloud.