Writing Modules for mod_perl
CGI programs are a common, time-tested way to add functionality to a web site. When a user's request is meant for a CGI program, the web server fires up a separate process and invokes the program. Anything sent to the STDOUT file descriptor is sent to the user's browser, and anything sent to STDERR is filed in the web server's error log.
While CGI has been a useful standard for web programming, it leaves much to be desired. In particular, the fact that each invocation of a CGI program requires its own process turns out to be a large performance bottleneck. It also means that if you use a language like Perl where the code is compiled upon invocation, your code will be compiled each time it is invoked.
One way to avoid this sort of problem is by writing your own web server software. Such a project is a significant undertaking, though. While the first web server I used consisted of 20 lines of Perl, most servers must now handle a great many standards and error conditions, in addition to simple requests for documents.
Apache, a highly configurable open-source HTTP server, makes it possible to extend its functionality by writing modules. Indeed, modern versions of Apache depend on modules for most functionality, not just a few add-ons. When you compile and install Apache for your computer system, you can choose which modules you wish to install.
One of these modules is mod_perl, which places an entire Perl binary inside your web server. This allows you to modify Apache's behavior using Perl, rather than C.
Even if you plan to use approximately the same code with mod_perl as you would with CGI, it is useful to know that mod_perl has some built-in smarts that caches compiled Perl code. This gives an extra speed boost, on top of the efficiency gained by avoiding the creation of a child process in which to run the CGI program.
Over the last year, this column has looked at some of the most popular ways of using mod_perl, namely the Apache::Registry and HTML::Embperl modules. The former allows you to run almost all CGI programs untouched, while taking advantage of the various speed advantages built into mod_perl. HTML::Embperl is a template system that allows us to combine HTML and Perl in a single file.
Both Apache::Registry and HTML::Embperl offer a great deal of power and allow programmers to take advantage of some of mod_perl's power and speed. However, using these modules prevents us from having direct access to Apache's guts, turning it into a program that can handle our specific needs better than the generic Apache server.
This month, we will look at how to write modules for mod_perl. As you will see, writing such modules is more complicated than writing CGI programs. However, it is not significantly more complicated and can give you tremendous flexibility and power.
Keep in mind that while CGI programs can be used, often without modification, on a variety of web servers, mod_perl works only with the Apache server. This means that modules written for mod_perl will work on other Apache servers, which constitute more than half of the web servers in the world, but not on other types of servers, be they free or proprietary.
If portability across different servers is a major goal in your organization, think twice before using mod_perl. But if you expect to use Apache for the foreseeable future, I strongly suggest looking into mod_perl. Your programs will run faster and more efficiently, and you will be able to create applications that would be difficult or impossible with CGI alone.
CGI programmers have a limited view of HTTP, the hypertext transfer protocol used for nearly all web communication. Normally, a server receiving a request from an HTTP client (most often a web browser) translates the incoming URL into the local file system, checks to see if the file exists and returns a response code along with the file's contents or an error message, as appropriate. CGI programs are invoked only halfway through this process, after the translation has taken place, the file has been found and a new process fired off.
mod_perl, by contrast, allows you to examine and modify each part of the HTTP transaction, beginning with the client's initial contact through the logging of the transaction on the server's file system. Each HTTP server divides an HTTP transaction into a series of stages; Apache has more than a dozen such stages.
Each stage is known as a “handler” and is given the opportunity to act on the current stage of the HTTP transaction. For example, the TransHandler translates URLs into files on the file system, a LogHandler takes care of logging events to the access and error logs, and a PerlTypeHandler checks and returns the MIME type associated with each document. Additional handlers are called when important events, such as startup, shutdown and restart occur.
Each of these Apache handlers has a mod_perl counterpart, known by the collective name of “Perl*Handlers”. As you can guess from this nickname, each Perl*Handler begins with the word “Perl” and ends with the word “Handler”.
A generic Perl*Handler, known simply as PerlHandler, is also available and is quite similar to CGI programs. If you want to receive a request, perform some calculations and return a result, use PerlHandler. Indeed, most applications that are visible to the end user can be done with PerlHandler. The other Perl*Handlers are more appropriate for changing Apache's behavior from a Perl module, such as when you want to add a new type of access log, alter the authorization mechanism, or add some code at startup or shutdown.
I realize the distinction between Perl*Handlers (meaning all of the possible handlers available to Perl programmers) and PerlHandlers (meaning modules that take advantage of Apache's generic “handler”) can be confusing. Truth be told, confusing the two isn't that big a deal, since the majority of programs are written for PerlHandler and not for any of the other Perl*Handlers.
As I mentioned above, mod_perl caches Perl code, compiles it once, then runs that compiled code during subsequent invocations. This means that, in contrast to CGI programs, changes made in our program will not be reflected immediately on the server. Rather, we must tell Apache to reload our program in some way. The easiest way to do this is to send a HUP signal (killall -1 -v httpd on my Linux box), but there are other ways as well. Another method is to use the Apache::StatINC module, which keeps track of modules' modification dates, loading new versions as necessary.