Writing Modules for mod_perl

Discover the flexibility and power of writing mod_perl modules instead of CGI programs.

The subroutine we just created might seem trivial, but it demonstrates the fact that we can easily modify the behavior of our web server simply by writing a Perl subroutine. Moreover, since subroutines can contain just about any sort of Perl code, we have at our disposal all of the Perl modules, operators, functions and regular expressions that would be available to a stand-alone program.

Indeed, our “handler” routine is simply an entry point to what can be a large, complex program with other subroutines. Since Perl*Handler modules have access to Apache at every stage of operation, we can modify anything using Perl. A growing library of modules that do many common tasks is available, so that you can spend time on the particulars of your problem, rather than reinventing the wheel.

Another Module

Let's write another PerlHandler module, but this time let's have it do something other than return its own output. Just for fun, we will have it turn headlines in a file into Pig Latin. (In Pig Latin, the first letter of each word is moved to the end of the word, and “ay” is tacked on to the end.)

We will call our PerlHandler module Apache::PigLatin, which means we will create a module named PigLatin.pm and put it into the Apache module subdirectory. The source code is shown in Listing 2.

Listing 2

We install our module with a Directory section in httpd.conf:

<Directory /usr/local/apache/share/htdocs/stuff>
SetHandler perl-script
PerlHandler Apache::PigLatin

Make sure the directive points to an actual directory in your Apache document tree.

The module introduces several new ideas, but nothing revolutionary. For starters, we import the constants OK, DECLINED and NOT_FOUND. As we indicated earlier, we will use OK to indicate that our PerlHandler did something, and DECLINED to indicate that Apache should apply some other behavior. We will use DECLINED to ensure our PerlHandler works on HTML-formatted text by checking $r->content_type. If the MIME type is “text/html”, we will operate on the file. If it is a JPEG image, we will refrain from translating it into Pig Latin, returning DECLINED.

Next, we attempt to open the file from $r->filename. This particular module is being used as a simple PerlHandler, so we can be sure the translation from URL to a file name on the file system has been performed. This translation takes place in the TransHandler stage, which we can modify by writing a PerlTransHandler, rather than a simple PerlHandler. While it has translated the URL into a file name on our system, Apache has not checked to see if the file exists—that is our job. If we cannot open the file, we will assume it does not exist, returning the symbol NOT_FOUND.

Now things get interesting: we grab the contents of the file and perform a substitution on headlines—that is, anything between <H\d> and </H\d>, where \d is a built-in character class matching any digit.

We use .*? to match all characters rather than a simple .*, so as to turn off the “greedy” feature in Perl's regular expressions. If we were to say .* rather than .*?, we would match all characters between the first <H\d> and the final </H\d>, rather than between the first pair, the second pair, and so forth. Greediness is usually a good thing when working with regular expressions, but can be frustrating under these circumstances.

We use four options in our substitution, using evaluation (/e), case-insensitivity (/i), global operation (/g) and the . regexp character to match \n (/s). This allows us to perform the substitution in one fell swoop, as well as catch any headlines that might begin on one line and continue on the next one.

Inside the substitution we invoke pl_sent, which is a subroutine defined within our module. This subroutine is not invoked directly from mod_perl, but is there to assist our “handler” routine in doing its work.

What's more, pl_sent invokes another subroutine, piglatin_word, which translates words into Pig Latin. If we were interested in creating a large web application based on mod_perl, you can see how it would be possible to do so, creating a number of subroutines and accessing them from within “handler”. C programmers might think of “handler” as the mod_perl equivalent of “main”, the subroutine invoked by default. Once in that routine, you can do just about anything you wish.

The pl_sent routine is interesting if you have never stacked split, map and join before. We split $sentence into its constituent words across \s+, which represents one or more whitespace characters. We then operate on each element of the resulting list with map, running piglatin_word on each word. Finally, we piece together the sentence in the end, using join to add a single space between each word. The result is returned to the calling s/// operator, which inserts the translated text in between the headline tags.

It is a much tougher problem to handle paragraphs, partly because people often forget to surround paragraphs with <P> and </P>, relying on the fact that browsers will forgive them if they simply say <P>. In addition, paragraphs contain punctuation which makes a good Pig Latin translator harder to write.

There is no limit to the kind of filters you can write. Perhaps the most interesting and advanced are those that use Perl's eval operator to evaluate little pieces of Perl code inside HTML files. A number of these already exist, such as Embperl (discussed several months ago) and EPerl. More simply, you can ensure that every file on your system has a uniform header and footer, removing the need for server-side includes at the top and bottom of each file.