Updating Pages Automatically
The home page of my web browser is set to http://www.dilbert.com/, home of the famous and funny Dilbert comic strip. Thanks to the magic of the Internet, I'm able to enjoy Dilbert's tragicomic humor each morning, just before I start my workday.
The Dilbert web site would not be very useful or interesting were it not for the creative talents of Scott Adams, Dilbert's creator. What makes it interesting from a technical perspective is the way in which the comic is updated automatically each day. Every morning, the latest comic is automatically placed on the Dilbert home page, giving millions of fans the chance to see the latest installment.
This month, we will examine several ways in which you can create pages that are automatically updated, so that a user can discover new content at the same URL each day. We will look at several different means to the same end, ranging from CGI programs to cron jobs, and will even take a brief look at how to use databases when publishing new content.
For starters, let's assume our web site consists of seven different pages, one for each day of the week (e.g., file-0.html on Sunday, through file-6.html on Saturday). How can we configure the site so that people requesting today.html (or today.pl) will be shown today's file? In other words, a visitor on Wednesday should be shown file-3.html when requesting today.html. Such a system might be appropriate for a school cafeteria, where the food tends to be the same each day of the week.
Perhaps the simplest solution is a CGI program, which we will call today.pl. If we write the program in Perl, we can easily determine the day of the week using the localtime function, which returns a list of elements describing the current date and time. Using the seventh element of that list (index 6, since Perl lists are numbered from 0), which indicates the current day of the week, we can create the correct URL for that day. Finally, we can use the HTTP “Location” header to redirect the user's browser to the correct location.
A simple implementation of this program is shown in Listing 1. The program should seem familiar to anyone who has written CGI programs. It enables all of Perl's warning systems: -w for optional warnings, -T for extra security, strict for extra compile-time checking and diagnostics for more complete documentation if something fails.
By using CGI.pm, the standard Perl module for writing CGI programs, we gain easy access to any input passed by the server, as well as the various output methods a CGI program might use. Most CGI programs use the output methods meant for returning HTML to a user's browser, which includes sending a MIME “Content-type” header indicating the type of content about to be sent. In our case, we return a “Location” header instead, which removes the need for a “Content-type” header.
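The approach described above can be sketched in a few lines. This is not the original Listing 1, only a minimal illustration: the host name www.example.com and the helper url_for_wday are assumptions made for the example, the header is printed by hand rather than through CGI.pm so that the sketch has no dependencies, and the -T and diagnostics switches are left out for brevity.

```perl
#!/usr/bin/perl
# Sketch of today.pl: redirect the browser to the page for today's weekday.
use strict;
use warnings;

# Build the URL for a weekday number (0 = Sunday ... 6 = Saturday).
# The host name is a placeholder; substitute your own site.
sub url_for_wday {
    my ($wday) = @_;
    return "http://www.example.com/file-$wday.html";
}

# localtime returns (sec, min, hour, mday, mon, year, wday, yday, isdst);
# index 6 is the day of the week.
my $wday = (localtime)[6];
my $url  = url_for_wday($wday);

# A Location header followed by a blank line makes up the whole CGI response.
print "Location: $url\n\n";
```

A program that already loads CGI.pm could send a comparable redirect with print $query->redirect($url), where $query is the CGI object.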
If the above program is installed as /cgi-bin/today.pl on our server, visitors will always be greeted with the appropriate file for the current day of the week.
The above program, simple as it is, has several flaws. Most significantly, CGI is slow and inefficient; using it to redirect the user's browser to another file will slow down the user's experience, as well as increase the load on your server. Each time a CGI program is invoked, the server must create a new process. If the program is written in Perl, this means the Perl binary must be started, which can take some time.
One solution might be to use mod_perl, which inserts a fully working version of Perl into the Apache web server. Using mod_perl means Apache no longer needs to create a new process, execute the Perl binary or compile the Perl program, which will cut down on server resource use. However, this still means that each time a user requests the home page, the server must execute a program. If the page is requested 1,000 times in a given day, then the program will run 1,000 times. This might not sound like much, but imagine what happens when your site grows in popularity, getting 1,000,000 hits each day.
Even this solution doesn't address the fact that not all users run browsers which handle redirection. If a browser does not follow the redirect, the user will be unable to see today's file. This problem is increasingly rare, but keep it in mind if you want the maximum possible audience for your web site.
Let's now examine a strategy in which the program runs only once per day, regardless of how many people ask to see today's page. This method reduces the load on the server and allows people with old browsers to visit our site without any trouble. The easiest strategy is to use Linux's cron utility, which allows us to run programs automatically at scheduled times. Using cron, we can run our program once per day, copying the appropriate file to today.html. On Sundays, file-0.html will be copied to today.html, while on Thursdays, file-4.html will be copied to today.html.
Listing 2 is an example of such a program. If this program were run once a day, then today.html would always contain the file for the appropriate day. Moreover, the server would be able to respond to the document request without having to create a new CGI process or use Perl.
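A rough sketch of such a copying program follows. It is not the original Listing 2: the document root /usr/local/apache/htdocs and the helper update_today are assumptions chosen for illustration, and the copy goes through a temporary file so that a visitor never sees a half-written today.html.

```perl
#!/usr/bin/perl
# Sketch of cron-today.pl: copy today's file to today.html once per day.
use strict;
use warnings;
use File::Copy qw(copy);

# Copy file-N.html to today.html inside $dir; returns true on success.
sub update_today {
    my ($dir, $wday) = @_;
    my $source = "$dir/file-$wday.html";
    my $target = "$dir/today.html";
    my $temp   = "$target.tmp";

    # Copy to a temporary name first, then rename it into place.
    copy($source, $temp) or return 0;
    return rename($temp, $target);
}

my $docroot = '/usr/local/apache/htdocs';   # hypothetical document root
my $wday    = (localtime)[6];               # 0 = Sunday ... 6 = Saturday

# Warn instead of dying, so that cron mails a readable message on failure.
update_today($docroot, $wday)
    or warn "Could not update $docroot/today.html: $!\n";
```

Because the program prints nothing on success, a run that works produces no output at all; only a failure generates a message.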
The above program is not run through CGI, but rather through cron. In order to run a program through cron, you must add an entry to your crontab, a specially formatted text file that describes when a program should be run. Each user has a separate crontab file; that is, each user can arrange for different cron jobs to run at different dates and times.
You can edit the crontab file using the crontab program, which is typically in /usr/bin/crontab. To modify your crontab file, use crontab -e, which brings up the editor defined in the EDITOR environment variable. The format of crontab is too involved for me to explain here; typing man 5 crontab on the Linux command line will bring up the manual page describing the format. (Typing only man crontab will bring up a description of the crontab program, rather than the crontab file format, a distinction which can be confusing to new users.)
Assuming we want to run the above program (which I have called cron-today.pl) at one minute after midnight, we could add the following entry to our crontab:
1 0 * * * /usr/local/bin/cron-today.pl
In other words, we want to run /usr/local/bin/cron-today.pl at one minute after midnight (1 0), every day of the month (*), every month (*), and every day of the week (*).
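To make the field order concrete, here is a small Perl sketch that splits the entry into its five time fields and the command. The function name parse_cron_entry and the field labels are invented for this illustration; cron itself does not use them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a crontab entry into its five time fields and the command.
sub parse_cron_entry {
    my ($entry) = @_;
    my %field;
    # A limit of 6 keeps any spaces inside the command intact.
    @field{qw(minute hour day_of_month month day_of_week command)}
        = split ' ', $entry, 6;
    return \%field;
}

my $entry = parse_cron_entry('1 0 * * * /usr/local/bin/cron-today.pl');

# Print each label next to the value it received.
printf "%-12s %s\n", $_, $entry->{$_}
    for qw(minute hour day_of_month month day_of_week command);
```

Changing the first two fields to 30 6, for example, would run the same program at 6:30 a.m. instead.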
Any output from a cron job is e-mailed to the user who owns that job. After installing the above line in my crontab, I receive e-mail from the cron job each day at approximately 12:01 a.m., and each day, anyone visiting our site is shown the correct file at today.html.