Writing Modules for mod_perl
CGI programs are a common, time-tested way to add functionality to a web site. When a user's request is meant for a CGI program, the web server fires up a separate process and invokes the program. Anything sent to the STDOUT file descriptor is sent to the user's browser, and anything sent to STDERR is filed in the web server's error log.
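To make the STDOUT and STDERR split concrete, here is a minimal sketch of a CGI program in Perl. The function name cgi_response and the greeting text are invented for illustration. The only assumption is that the web server is configured to run the script as CGI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the full CGI response: a header, a blank line, then the body.
# (cgi_response is a hypothetical helper name used only in this sketch.)
sub cgi_response {
    my ($name) = @_;
    return "Content-type: text/html\n\n"
         . "<html><body><p>Hello, $name!</p></body></html>\n";
}

# Anything printed to STDOUT is passed along to the user's browser.
print cgi_response('world');

# Anything sent to STDERR is filed in the web server's error log.
warn "hello.pl: served one request\n";
```

Run from the command line, the program prints the response to the terminal and the warning to the terminal's error stream. Run by the web server, the same two streams go to the browser and the error log, respectively.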
While CGI has been a useful standard for web programming, it leaves much to be desired. In particular, the fact that each invocation of a CGI program requires its own process turns out to be a large performance bottleneck. It also means that if you use a language like Perl where the code is compiled upon invocation, your code will be compiled each time it is invoked.
One way to avoid this sort of problem is by writing your own web server software. Such a project is a significant undertaking, though. While the first web server I used consisted of 20 lines of Perl, most servers must now handle a great many standards and error conditions, in addition to simple requests for documents.
Apache, a highly configurable open-source HTTP server, makes it possible to extend its functionality by writing modules. Indeed, modern versions of Apache depend on modules for most functionality, not just a few add-ons. When you compile and install Apache for your computer system, you can choose which modules you wish to install.
One of these modules is mod_perl, which embeds a complete Perl interpreter inside your web server. This allows you to modify Apache's behavior using Perl, rather than C.
Even if you plan to use approximately the same code with mod_perl as you would with CGI, it is useful to know that mod_perl has some built-in smarts that cache compiled Perl code. This gives an extra speed boost, on top of the efficiency gained by avoiding the creation of a child process in which to run the CGI program.
Over the last year, this column has looked at some of the most popular ways of using mod_perl, namely the Apache::Registry and HTML::Embperl modules. The former allows you to run almost all CGI programs untouched, while taking advantage of the various speed advantages built into mod_perl. HTML::Embperl is a template system that allows us to combine HTML and Perl in a single file.
Both Apache::Registry and HTML::Embperl allow programmers to take advantage of much of mod_perl's power and speed. However, these modules keep us from reaching directly into Apache's guts and turning the server into a program that handles our specific needs better than the generic Apache server.
This month, we will look at how to write modules for mod_perl. As you will see, writing such modules is more complicated than writing CGI programs. However, it is not significantly more complicated and can give you tremendous flexibility and power.
Keep in mind that while CGI programs can be used, often without modification, on a variety of web servers, mod_perl works only with the Apache server. This means that modules written for mod_perl will work on other Apache servers, which constitute more than half of the web servers in the world, but not on other types of servers, be they free or proprietary.
If portability across different servers is a major goal in your organization, think twice before using mod_perl. But if you expect to use Apache for the foreseeable future, I strongly suggest looking into mod_perl. Your programs will run faster and more efficiently, and you will be able to create applications that would be difficult or impossible with CGI alone.
CGI programmers have a limited view of HTTP, the hypertext transfer protocol used for nearly all web communication. Normally, a server receiving a request from an HTTP client (most often a web browser) translates the incoming URL into a path on the local file system, checks to see if the file exists, and returns a response code along with the file's contents or an error message, as appropriate. CGI programs are invoked only halfway through this process, after the translation has taken place, the file has been found and a new process has been fired off.
mod_perl, by contrast, allows you to examine and modify each part of the HTTP transaction, from the client's initial contact through the logging of the transaction on the server's file system. Each HTTP server divides an HTTP transaction into a series of stages; Apache has more than a dozen such stages.
Each stage is managed by a “handler”, which is given the opportunity to act on the current stage of the HTTP transaction. For example, the TransHandler translates URLs into files on the file system, a LogHandler takes care of logging events to the access and error logs, and a TypeHandler checks and returns the MIME type associated with each document. Additional handlers are called when important events, such as startup, shutdown and restart, occur.
Each of these Apache handlers has a mod_perl counterpart, known by the collective name of “Perl*Handlers”. As you can guess from this nickname, each Perl*Handler begins with the word “Perl” and ends with the word “Handler”.
A generic Perl*Handler, known simply as PerlHandler, is also available and is quite similar to CGI programs. If you want to receive a request, perform some calculations and return a result, use PerlHandler. Indeed, most applications that are visible to the end user can be done with PerlHandler. The other Perl*Handlers are more appropriate for changing Apache's behavior from a Perl module, such as when you want to add a new type of access log, alter the authorization mechanism, or add some code at startup or shutdown.
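As a rough sketch of what a PerlHandler module can look like, here is a tiny module that returns a fixed HTML page. The package name Apache::HelloWorld is hypothetical. The constant OK is defined locally so that the file compiles without Apache installed; under mod_perl, it would normally be imported from Apache::Constants instead.

```perl
package Apache::HelloWorld;    # hypothetical module name
use strict;
use warnings;

# Assumption: under mod_perl this line would normally read
#   use Apache::Constants qw(OK);
# OK has the value 0; it is defined here only to keep the sketch standalone.
use constant OK => 0;

# mod_perl calls handler() once per request, passing the request object.
sub handler {
    my $r = shift;

    # Tell the browser what kind of document is coming, then send headers.
    $r->content_type('text/html');
    $r->send_http_header;

    # Send the body of the response.
    $r->print("<html><body><p>Hello from mod_perl</p></body></html>\n");

    # Report that this stage of the transaction completed normally.
    return OK;
}

1;
```

A module like this is typically attached to a URL in the server configuration, with a Location section containing the directives "SetHandler perl-script" and "PerlHandler Apache::HelloWorld". Unlike a CGI program, the handler receives the request as an object rather than reading environment variables and writing to STDOUT.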
I realize the distinction between Perl*Handlers (meaning all of the possible handlers available to Perl programmers) and PerlHandlers (meaning modules that take advantage of Apache's generic “handler”) can be confusing. Truth be told, confusing the two isn't that big a deal, since the majority of programs are written for PerlHandler and not for any of the other Perl*Handlers.
As I mentioned above, mod_perl compiles Perl code once, caches the result, then runs that compiled code during subsequent invocations. This means that, in contrast to CGI programs, changes made in our program will not be reflected immediately on the server. Rather, we must tell Apache to reload our program in some way. The easiest way to do this is to send a HUP signal (killall -1 -v httpd on my Linux box), but there are other ways as well. Another method is to use the Apache::StatINC module, which keeps track of modules' modification dates, loading new versions as necessary.
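The idea of reloading modules based on their modification dates can be sketched in plain Perl. This is only an illustration of the approach, not the source code of Apache::StatINC. The function name reload_changed_modules and the %seen_mtime hash are invented for the example.

```perl
use strict;
use warnings;

# Module name (as it appears in %INC) => modification time last seen.
my %seen_mtime;

# Reload any loaded module whose file has changed since the last check.
# Returns the names of the modules that were reloaded.
sub reload_changed_modules {
    my @reloaded;
    for my $name (sort keys %INC) {
        my $path = $INC{$name};
        next unless defined $path && -f $path;

        my $mtime = (stat $path)[9];
        if (!exists $seen_mtime{$name}) {
            $seen_mtime{$name} = $mtime;    # first sighting: just record it
            next;
        }
        next if $mtime <= $seen_mtime{$name};

        # Perl skips files already listed in %INC, so remove the entry
        # before asking for the module again.
        delete $INC{$name};
        require $name;                      # compiles the new version
        $seen_mtime{$name} = $mtime;
        push @reloaded, $name;
    }
    return @reloaded;
}
```

Calling such a function at the start of each request would pick up edited modules without a server restart, at the cost of one stat call per loaded module. The real Apache::StatINC is enabled from the server configuration (its documentation suggests "PerlInitHandler Apache::StatINC") rather than called by hand.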