Updating Pages Automatically

Have a need to change a file on your web site on a daily or monthly basis? This month Mr. Lerner tells us how to do it.

The home page of my web browser is set to http://www.dilbert.com/, home of the famous and funny Dilbert comic strip. Thanks to the magic of the Internet, I'm able to enjoy Dilbert's tragicomic humor each morning, just before I start my workday.

The Dilbert web site would not be very useful or interesting were it not for the creative talents of Scott Adams, Dilbert's creator. What makes it interesting from a technical perspective is the way in which the comic is updated automatically each day. Every morning, the latest comic is automatically placed on the Dilbert home page, giving millions of fans the chance to see the latest installment.

This month, we will examine several ways in which you can create pages that are automatically updated, so that a user can discover new content at the same URL each day. We will look at several different means to the same end, ranging from CGI programs to cron jobs, and will even take a brief look at how to use databases when publishing new content.

Pointing with CGI

For starters, let's assume our web site consists of seven different pages, one for each day of the week (e.g., file-0.html on Sunday, through file-6.html on Saturday). How can we configure the site so that people requesting today.html (or today.pl) will be shown today's file? In other words, a visitor on Wednesday should be shown file-3.html when requesting today.html. Such a system might be appropriate for a school cafeteria, where the food tends to be the same each day of the week.

Perhaps the simplest solution is a CGI program, which we will call today.pl. If we write the program in Perl, we can easily determine the day of the week using the localtime function, which returns a list of elements describing the current date and time. Using the sixth element of that list, which indicates the current day of the week, we can create the correct URL for that day. Finally, we can use the HTTP “Location” header to redirect the user's browser to the correct location.

A simple implementation of this program is shown in Listing 1. The program should seem familiar to anyone who has written CGI programs. It enables all of Perl's warning systems: -w for optional warnings, -T for extra security, strict for extra compile-time checking and diagnostics for more complete documentation if something fails.

By using CGI.pm, the standard Perl module for writing CGI programs, we gain easy access to any input passed by the server, as well as the various output methods a CGI program might use. Most CGI programs use the output methods meant for returning HTML to a user's browser, including sending a MIME “Content-type” header indicating the type of content about to be sent—in our case, we return a “Location” header, which removes the need for a “Content-type” header.

If the above program is installed as /cgi-bin/today.pl on our server, visitors will always be greeted with the appropriate file for the current day of the week.

The above program, simple as it is, has several flaws. Most significantly, CGI is slow and inefficient; using it to redirect the user's browser to another file will slow down the user's experience, as well as increase the load on your server. Each time a CGI program is invoked, the server must create a new process. If the program is written in Perl, this means the Perl binary must be started, which can take some time.

One solution might be to use mod_perl, which inserts a fully working version of Perl into the Apache web server. Using mod_perl means Apache no longer needs to create a new process, execute the Perl binary or compile the Perl program, which will cut down on server resource use. However, this still means that each time a user requests the home page, the server must execute a program. If the page is requested 1,000 times in a given day, then the program will run 1,000 times. This might not sound like much, but imagine what happens when your site grows in popularity, getting 1,000,000 hits each day.

Even this solution doesn't address the fact that not all users run browsers which handle redirection. If a browser does not handle the notice, the user will be unable to see today's file. This problem is increasingly rare, but keep it in mind if you want the maximum possible audience for your web site.

Automatically Copying Pages with cron

Let's now examine a strategy in which the program runs only once per day, regardless of how many people ask to see today's page. This method reduces the load on the server and allows people with old browsers to visit our site without any trouble. The easiest strategy is to use Linux's cron utility, which allows us to automatically run programs at any time. Using cron, we can run our program once per day, copying the appropriate file to today.html. On Sundays, file-0.html will be copied to today.html, while on Thursdays, file-4.html will be copied to today.html.

Listing 2 is an example of such a program. If this program were run once a day, then today.html would always contain the file for the appropriate day. Moreover, the server would be able to respond to the document request without having to create a new CGI process or use Perl.

The above program is not run through CGI, but rather through cron. In order to run a program through cron, you must add an entry to your crontab, a specially formatted text file that describes when a program should be run. Each user has a separate crontab file; that is, each user can arrange for different cron jobs to run at different dates and times.

You can edit the crontab file using the crontab program, which is typically in /usr/bin/crontab. To modify your crontab file, use crontab -e, which brings up the editor defined in the EDITOR environment variable. The format of crontab is too involved for me to explain here; typing man 5 crontab on the Linux command line will bring up the manual page describing the format. (Typing only man crontab will bring up a description of the crontab program, rather than the crontab file format, a distinction which can be confusing to new users.)

Assuming we want to run the above program (which I have called cron-today.pl) at one minute after midnight, we could add the following entry to our crontab:

1 0 * * * /usr/local/bin/cron-today.pl

In other words, we want to run /usr/local/bin/cron-today.pl at one minute after midnight (1 0), every day of the month (*), every month (*), and every day of the week (*).

The output from each cron is e-mailed to the user who owns that job. After installing the above line in my crontab, I receive e-mail from the cron job each day at approximately 12:01 a.m. And each day, anyone visiting our site was shown the correct file for today.html.