Combining Apache and Perl

This month Mr. Lerner gives us a look at mod_perl, a module that embeds a Perl interpreter in the Apache web server.
Configuring Apache for mod_perl

One of the most popular uses for mod_perl is as a fast replacement for CGI. To use it this way, we need to modify Apache's configuration files so that the server knows how to handle programs that use mod_perl.

Why must Apache know how to treat these programs? Thinking about CGI programs should make it clear. Browsers request CGI programs in exactly the way they request static documents. The browser does not know whether a given URL points to a program or a static document; that determination is made by the server. If the request is for a static document, the server returns the document verbatim to the user's browser. If the request is for a program, the server executes it and returns any output to the user's browser.

In both of these cases, the browser's behavior is the same: it sends the request to the server and displays the contents of any received response. This places the onus on the server to recognize which files are to be transmitted verbatim, and which are programs whose output will be sent as a response. Apache lets us choose between allowing CGI programs to be located anywhere on the system (as long as they end with an agreed-upon suffix, such as .pl or .cgi) and requiring that they be located in one or more designated directories. This is done using directives in the Apache configuration files.
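As a rough illustration of these two policies, the Perl sketch below models the decision the server makes for each request. It is not how Apache is implemented; the function name is_program, the /cgi-bin/ directory and the .pl/.cgi suffix list are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical model of the server's choice: run the requested file as a
# program, or send it to the browser verbatim as a document?
my @cgi_dirs     = ('/cgi-bin/');     # assumed designated directory
my @cgi_suffixes = ('.pl', '.cgi');   # assumed agreed-upon suffixes

sub is_program {
    my ($path, $policy) = @_;
    if ($policy eq 'directory') {
        # Programs must live under one of the designated directories.
        return scalar grep { index($path, $_) == 0 } @cgi_dirs;
    }
    # 'suffix' policy: programs may live anywhere but need a known suffix.
    return scalar grep { $path =~ /\Q$_\E\z/ } @cgi_suffixes;
}

for my $path ('/index.html', '/cgi-bin/search', '/tools/report.cgi') {
    printf "%-18s directory policy: %-8s suffix policy: %s\n", $path,
        is_program($path, 'directory') ? 'program' : 'document',
        is_program($path, 'suffix')    ? 'program' : 'document';
}
```

Under the directory policy, /cgi-bin/search is a program even without a suffix; under the suffix policy, /tools/report.cgi is a program even though it lives outside cgi-bin.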

Now that we have added mod_perl to our server, we must tell Apache how to handle three types of URLs: static documents, CGI programs and mod_perl programs. Adding mod_perl to the mix does not have to change the existing configuration on your system. I created a directory named perl-bin under my web root directory (/home/httpd/perl-bin) and decided all mod_perl programs would reside there, just as all CGI programs reside in cgi-bin. I then added the following lines to my server's srm.conf file:

<Location /perl-bin>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
</Location>

The <Location> and </Location> tags indicate that our settings should apply only to URLs beginning with /perl-bin, rather than to the entire Apache server. Inside that section, SetHandler perl-script tells Apache to pass documents in perl-bin to mod_perl, rather than treating them as static documents or something else. If you are curious, the Apache manual has an entire section describing handlers, including the AddHandler and SetHandler directives, which assign handlers by file extension and by location, respectively. Other handlers include cgi-script (for CGI programs), server-info (for information about the server) and imap-file (for image maps).
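The difference between location-based and extension-based handlers can be modeled with two lookup tables. This is a simplified sketch, not Apache's actual dispatch code: the table contents and the handler_for function are hypothetical, and the sketch assumes, as in Apache, that a SetHandler for a location takes precedence over an AddHandler mapping for the file's extension.

```perl
use strict;
use warnings;

# Hypothetical tables mirroring the configuration. SetHandler applies to
# everything under a URL prefix (like our <Location /perl-bin> section);
# AddHandler applies by file extension.
my %set_handler = ('/perl-bin' => 'perl-script');
my %add_handler = ('.cgi' => 'cgi-script', '.map' => 'imap-file');

sub handler_for {
    my ($uri) = @_;
    # A location-based SetHandler takes precedence over extensions.
    for my $prefix (sort keys %set_handler) {
        return $set_handler{$prefix}
            if $uri eq $prefix || index($uri, "$prefix/") == 0;
    }
    # Otherwise, look for a handler registered for the file's extension.
    if ($uri =~ /(\.[^.\/]+)\z/ && exists $add_handler{$1}) {
        return $add_handler{$1};
    }
    # No handler applies: the file is sent as a static document.
    return 'default-handler';
}

for my $uri ('/perl-bin/test.pl', '/perl-bin/old.cgi',
             '/scripts/env.cgi', '/maps/site.map', '/index.html') {
    print "$uri => ", handler_for($uri), "\n";
}
```

Note that /perl-bin/old.cgi goes to perl-script, not cgi-script: the location rule wins over the extension.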

Now that Apache knows which files in /perl-bin should be considered mod_perl programs, we must tell mod_perl how to handle them. The PerlHandler directive names Apache::Registry, a module that lets us run CGI-style Perl programs under mod_perl. Then the Options ExecCGI directive allows programs in this directory to be executed.

Finally, we make one more modification to srm.conf, telling mod_perl to send complete HTTP headers. We put this directive outside the <Location> section, since we always want mod_perl to return complete headers. The line to add is:

PerlSendHeader On

Adding the PerlSendHeader directive does not relieve us from the responsibility of indicating the type of content we are returning. In other words, we still must add the “Content-type” header to the top of our output, just as we do when writing CGI programs.
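To see why the Content-type line still matters, here is a sketch of the kind of work PerlSendHeader asks mod_perl to do: read the CGI-style headers a program prints, then build a complete HTTP response from them. The function complete_response is hypothetical and far simpler than mod_perl's own header handling; in particular, it assumes a default status of 200 OK and refuses output that lacks a Content-type header.

```perl
use strict;
use warnings;

# Hypothetical sketch: turn a program's CGI-style output (headers, a blank
# line, then the body) into a complete HTTP response.
sub complete_response {
    my ($output) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $output, 2;
    $body = '' unless defined $body;
    my ($status, $has_type, @lines) = ('200 OK', 0);
    for my $line (split /\r?\n/, $head) {
        # Skip anything that does not look like "Name: value".
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        # A CGI "Status" header becomes the HTTP status line.
        if (lc $name eq 'status') { $status = $value; next }
        $has_type = 1 if lc $name eq 'content-type';
        push @lines, "$name: $value";
    }
    # The server cannot guess the content type; the program must supply it.
    die "program did not send a Content-type header\n" unless $has_type;
    return join("\r\n", "HTTP/1.0 $status", @lines) . "\r\n\r\n" . $body;
}

print complete_response("Content-type: text/html\n\n<P>Hello</P>\n");
```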

Basic Programs with mod_perl

All the pieces are now in place to use mod_perl instead of CGI programs. Let's try a simple program that prints out the current state of the environment. Copy the following into a file called test.pl in the perl-bin directory:

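use strict;

# Send the header first; mod_perl does not add a Content-type for us.
print "Content-type: text/html\n\n";

# List every environment variable, one per line, in alphabetical order.
foreach my $key (sort keys %ENV)
{
    print "\"$key\" = \"$ENV{$key}\"<BR>\n";
}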

Set permissions so that the file is executable, and ask your browser to retrieve /perl-bin/test.pl. If all goes well, you will see a list of environment variables in your browser.

If you have been writing CGI programs (or using Perl for any length of time), the above might seem strange. For example, where is the initial line indicating the location of the Perl interpreter and its switches? The familiar hash-bang (#!) line is missing because it's unnecessary. Those two characters tell UNIX that the file should not be run directly (or interpreted as a shell script), but handed to the program named on the rest of the line. That's why Perl programs usually begin with the line:

#!/usr/bin/perl

while Tcl programs begin with:

#!/usr/bin/tclsh

and so forth. Because our program is run by mod_perl, which already contains a Perl interpreter, we don't need the hash-bang line at the top of our program.
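The hash-bang mechanism itself is simple enough to model. The sketch below splits a #! line into the interpreter path and whatever follows it; the name parse_shebang is hypothetical, and it follows the Linux convention (an assumption about your system) of passing everything after the interpreter as a single argument.

```perl
use strict;
use warnings;

# Hypothetical sketch: split a "#!" line into the interpreter to run and
# the rest of the line, which Linux passes along as a single argument.
# Under mod_perl this step never happens, since Perl is already running.
sub parse_shebang {
    my ($first_line) = @_;
    return unless defined $first_line
        && $first_line =~ /^#!\s*(\S+)\s*(.*?)\s*$/;
    my ($interpreter, $rest) = ($1, $2);
    return length $rest ? ($interpreter, $rest) : ($interpreter);
}

for my $line ("#!/usr/bin/perl -w\n", "#!/usr/bin/tclsh\n") {
    my ($interpreter, @args) = parse_shebang($line);
    print "run $interpreter with arguments: (@args)\n";
}
```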

Command-line switches raise a more subtle issue, one that cuts to the heart of mod_perl's advantages over standard CGI programs. Programs run much faster under mod_perl for several reasons, but the two primary ones are that Perl is embedded in Apache (saving the overhead of starting Perl with each invocation), and programs are compiled once, then cached (saving the overhead of compilation with each invocation). The combination of embedding Perl within Apache and caching compiled programs can mean a tremendous boost in execution speed, often ranging from 400 percent to 2000 percent.
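The compile-once, run-many idea can be sketched in a few lines of Perl. Apache::Registry works along similar lines, wrapping each program in a subroutine and caching the compiled result, but the code below is a simplified model rather than its actual source: run_program and %cache are hypothetical names, and unlike Apache::Registry it never checks whether a program's file has changed.

```perl
use strict;
use warnings;

# Hypothetical sketch of "compile once, run many times": each program's
# source is wrapped in an anonymous subroutine, compiled with a string eval
# the first time it is requested, and the code reference is cached.
my %cache;              # program name => compiled code reference
my $compilations = 0;   # how many times we paid the compilation cost

sub run_program {
    my ($name, $source) = @_;
    unless ($cache{$name}) {
        $compilations++;
        my $code = eval "sub { $source }";    # expensive: done once per program
        die "cannot compile $name: $@" unless $code;
        $cache{$name} = $code;
    }
    return $cache{$name}->();                  # cheap: done on every request
}

my $source = 'return "Hello from a cached program\n";';
print run_program('hello.pl', $source) for 1 .. 3;
print "compiled $compilations time(s)\n";
```

Three requests, one compilation: that saved compilation, together with not starting a new Perl for each request, is where the speedup comes from.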

There are tradeoffs for these increases in speed, and one of them is that command-line switches no longer work as expected. Switches are normally processed when the Perl interpreter starts up and reads the main program; under mod_perl, the interpreter starts only once, along with Apache, so if you expect switches to take effect each time your program runs, you will be disappointed. However, all is not lost. Programmers who want to turn on Perl's warnings (the -w flag) and security checks (the -T flag, for tainting) in mod_perl programs can do so with directives in the srm.conf file. To turn on warnings, simply add the line:

PerlWarn On

This turns on warnings in your mod_perl programs. As usual, warning messages are sent to the Apache error log.

Similarly, you can activate Perl's security checks (commonly known as “tainting”) by adding the PerlTaintCheck directive to srm.conf:

PerlTaintCheck On
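To confirm that these directives took effect, you might drop a small diagnostic program into perl-bin. This sketch reports Perl's $^W variable (the flag that -w sets) and ${^TAINT} (true when taint checks are on, available from Perl 5.8). It assumes that PerlWarn and PerlTaintCheck set these the same way the -w and -T switches would; the function name status_report is hypothetical.

```perl
use strict;
use warnings;

# Diagnostic sketch: report whether warnings and taint checks are active.
# $^W is the global flag that -w sets; ${^TAINT} is true under taint mode
# (Perl 5.8 or later). We assume PerlWarn and PerlTaintCheck set them too.
sub status_report {
    return "Content-type: text/plain\n\n"
         . 'warnings ($^W): ' . ($^W ? 'on' : 'off') . "\n"
         . 'taint checks: '   . (${^TAINT} ? 'on' : 'off') . "\n";
}

print status_report();
```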

When you write CGI programs (or any other programs, for that matter) in Perl, it is usually a good idea to include the use strict directive, as we saw in the above example. When programming with mod_perl, however, use strict is essential. Because the Perl interpreter stays in memory between requests, global variables keep their values after your program finishes, which can create problems for future invocations of this or other programs. use strict forces you to declare your variables (typically with my), limiting their scope and making such leftovers far less likely.
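The following sketch imitates a long-lived interpreter by calling the same subroutine three times, the way mod_perl reuses one compiled program for successive requests. The names handle_request and $global_count are hypothetical; the point is that a package variable keeps its value from one call to the next, while a variable declared with my starts fresh each time. Under use strict, the global must be declared explicitly (here with our), which at least makes its lifetime visible.

```perl
use strict;
use warnings;

# Sketch: several "requests" served by one interpreter that never exits.
our $global_count = 0;      # package (global) variable: survives between calls

sub handle_request {
    my $lexical_count = 0;  # declared with my: starts at zero on every call
    $global_count++;
    $lexical_count++;
    return "global=$global_count lexical=$lexical_count\n";
}

print handle_request() for 1 .. 3;
```

The global counter climbs with every call while the lexical one is always 1; in a real program, that climbing value could be a half-filled form field or a stale user name from someone else's request.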

Likewise, do not use the exit function to leave your program prematurely. In a CGI program, calling exit simply ends the program, which is harmless if it has already produced all of its output. In a mod_perl program, calling exit takes Perl along with it; and since Perl is embedded in that copy of Apache, killing Perl effectively kills that particular server process as well. If you absolutely must call exit from within your program, use Apache::exit instead; it ends the current request without those side effects.
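One way to avoid exit altogether, offered here as a common Perl idiom rather than something the article prescribes, is to put the program's logic in a subroutine and finish early with return. The name main and the query-string check are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: keep the program's logic in a subroutine so that an
# early finish is a plain return instead of a call to exit. The subroutine
# builds the page as a string; the caller prints it.
sub main {
    my ($query) = @_;
    my $page = "Content-type: text/plain\n\n";
    # Nothing to do: finish early without calling exit.
    return $page . "No query given.\n"
        unless defined $query && length $query;
    return $page . "You asked for: $query\n";
}

print main($ENV{QUERY_STRING});
```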
