At the Forge - Profiling Rails Applications
I am writing this article in mid-May 2008, several weeks after Twitter was rumored to be moving to a platform other than Ruby on Rails. Twitter, of course, is an extremely popular service that allows users to write updates and notes about their current status—and allows readers to follow any number of people's Twitter feeds. You can think of Twitter as a combination blogging and RSS platform, populated by people who express themselves with only 140 characters at a time.
Like many other runaway Internet successes, Twitter appears to have become too popular for its own good. This has led to some outages, most notably one at the beginning of 2008 that took more than a day to restore. Thus, it was seen as more than a mere coincidence when Twitter's main architect left the company, and that within a few days, the TechCrunch blog was quoting anonymous officials within Twitter about how the service would be transitioning away from Ruby on Rails.
This was followed by a great deal of discussion over whether Rails is a “scalable” architecture. Scalable used to mean that it was possible to scale up applications using a Web site, almost regardless of how many people are using it. But today, a scalable architecture is one that's lean and mean, handling as many users as possible with as few servers as possible. PHP, Java and .NET are pretty universally considered to be scalable in this sense. Although even the most efficient PHP application can handle only a finite number of simultaneous users, it's undeniable that Ruby is a slower language than PHP, and that the Rails framework adds some more overhead.
Of course, it's one thing to say that Rails doesn't scale as quickly as PHP, and another to say that it doesn't scale at all. And, there are other arguments to be made, including the fact that programmers cost more than servers, and that programmer productivity should be at least as significant a factor as scalability.
That said, it's easy for a Rails application to become slow. So, it is nice to know that a variety of utilities can be used to profile Rails applications—meaning, finding out exactly which portion of the program is taking a long time to execute. This month, we look at some techniques for profiling Rails applications. Although such profiling doesn't make the software run any faster, it can help identify the slowest parts of an application.
If you aren't happy with the performance of your Web site—and quite frankly, you always should be concerned about the performance, trying to give it a boost wherever possible—the first question to ask is, “Where are people spending their time?” After all, if there are 100 different pages on your site, it doesn't really matter whether page 35 is really slow if no one ever visits it.
The first tool to examine, the production log analyzer, is designed to look at the Rails production log and produce some basic statistics about it. The production log, as well as the development and test logs, are typically stored in the log directory under the Rails project root. Thus, on a production server, the log is in log/production.log. This logfile is not rotated or modified automatically; there are clearly a number of ways to do that using cron and other UNIX command-line tools.
The thing is, there already is a facility on UNIX (and Linux) systems for handling logfiles, including their periodic rotation and disposal. This facility is known as syslog, which makes it possible to send logging information to a variety of different files based on priorities and source materials. The /var/log directory on my Ubuntu server is full of different logfiles, and nearly all of them were created and written to by syslog.
It turns out that we can use syslog for our Rails production logs. Once we have done that—and yes, we must use syslog for this to work—we then can analyze our production logs, learning exactly how much time people have spent in our Rails application.
To move your Rails production log to syslog, you need to do several things. First, you must install the Ruby gem that provides this behavior:
gem install --remote SyslogLogger
This installs the gem in the appropriate place on your system; on mine, it was put into /usr/lib/ruby/gems/1.8/gem. Next, you need to add the following to one or more of your environment configuration files (either environment.rb or one or more files in environments/*.rb) for your Rails system:
require 'syslog_logger' RAILS_DEFAULT_LOGGER = SyslogLogger.new
This, of course, loads the syslog_logger gem and sets the default logger to a new instance of SyslogLogger.
Now that you have told Rails to use syslog, you must tell syslog what to do with the files that come from Rails. I opened /etc/syslog.conf and added the following lines to the bottom:
And yes, the documentation system says that you can use a !rails tag before this line, or one like it, to restrict logging to messages coming from Rails. Unfortunately, this syntax does not appear to be supported by Linux. So, this means production.log will include messages from other programs and facilities, not just Rails. That shouldn't concern us right now, although it might be an issue on a busy machine with many services in active use.
Once you have modified syslog.conf in this way, you can restart syslog.conf. Almost immediately, your production log should be stored to /var/log/production.log. You can check this, of course, with the following:
tail -f /var/log/production.log
Now, this logfile is similar in many ways to the logfile you just eliminated from the log directory in your application root. However, it is formatted in such a way that the production log analyzer will be able to find and perform calculations based on its output. To analyze the logfile, type:
If you prefer to have the results sent to you via e-mail, rather than stored to a disk file, use the -e option:
pl_analyze /var/log/production_log -e firstname.lastname@example.org
This option is particularly useful when you invoke pl_analyze from a cron job, for example.
The output file from pl_analyze is divided into three parts:
Time spent in each request.
Time spent in the database for each request.
Time spent rendering the output from each request.
For each controller action, pl_request lists how many times it was invoked, as well as the average time it took to execute. It also gives the min, max and standard deviation, allowing you to see how much the execution time varied over time.
Thus, the production log analyzer shows which actions take the greatest amount of time overall, which take the greatest amount of time in the database (or to render) and how many times each was invoked.
I have found pl_analyzer to be an indispensable tool when trying to determine whether a site is fast enough and where I should focus my attention to improve its speed.
- Linux Kernel Testing and Debugging
- NSA: Linux Journal is an "extremist forum" and its readers get flagged for extra surveillance
- Tails above the Rest, Part III
- Wanted: Your Embedded Linux Projects
- Numerical Python
- RSS Feeds
- Dolphins in the NSA Dragnet
- Are you an extremist?
- Tails above the Rest: the Installation
- The 101 Uses of OpenSSH: Part I