Updating Pages Automatically
The home page of my web browser is set to http://www.dilbert.com/, home of the famous and funny Dilbert comic strip. Thanks to the magic of the Internet, I'm able to enjoy Dilbert's tragicomic humor each morning, just before I start my workday.
The Dilbert web site would not be very useful or interesting were it not for the creative talents of Scott Adams, Dilbert's creator. What makes it interesting from a technical perspective is the way in which the comic is updated automatically each day. Every morning, the latest comic is automatically placed on the Dilbert home page, giving millions of fans the chance to see the latest installment.
This month, we will examine several ways in which you can create pages that are automatically updated, so that a user can discover new content at the same URL each day. We will look at several different means to the same end, ranging from CGI programs to cron jobs, and will even take a brief look at how to use databases when publishing new content.
For starters, let's assume our web site consists of seven different pages, one for each day of the week (e.g., file-0.html on Sunday, through file-6.html on Saturday). How can we configure the site so that people requesting today.html (or today.pl) will be shown today's file? In other words, a visitor on Wednesday should be shown file-3.html when requesting today.html. Such a system might be appropriate for a school cafeteria, where the menu for a given day of the week tends to be the same from one week to the next.
Perhaps the simplest solution is a CGI program, which we will call today.pl. If we write the program in Perl, we can easily determine the day of the week using the localtime function, which returns a list of elements describing the current date and time. The element at index 6 of that list (the seventh element, since Perl counts from zero) gives the current day of the week as a number from 0 (Sunday) to 6 (Saturday). Using that number, we can create the correct URL for that day. Finally, we can use the HTTP “Location” header to redirect the user's browser to the correct location.
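The day-to-file mapping described above can be sketched in a few lines. This is a Python illustration rather than the Perl of Listing 1, and the function names day_index and file_for are hypothetical; the only thing taken from the article is the file-0.html (Sunday) through file-6.html (Saturday) naming scheme.

```python
import datetime


def day_index(date):
    """Return 0 for Sunday through 6 for Saturday."""
    # isoweekday() numbers Monday as 1 and Sunday as 7,
    # so taking it modulo 7 moves Sunday to 0 and leaves the rest alone.
    return date.isoweekday() % 7


def file_for(date):
    """Return the name of the page that belongs to the given date."""
    return "file-%d.html" % day_index(date)


if __name__ == "__main__":
    # Show which file would be served today.
    print(file_for(datetime.date.today()))
```

Keeping the mapping in its own function makes it easy to check against dates whose weekday we already know, without waiting for the calendar to roll over.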
A simple implementation of this program is shown in Listing 1. The program should seem familiar to anyone who has written CGI programs. It turns on Perl's main safety nets: -w for optional warnings, -T for taint checking (extra security), strict for extra compile-time checking and diagnostics for more detailed explanations if something fails.
By using CGI.pm, the standard Perl module for writing CGI programs, we gain easy access to any input passed by the server, as well as the various output methods a CGI program might use. Most CGI programs use the output methods meant for returning HTML to a user's browser, which includes sending a MIME “Content-type” header indicating the type of content about to be sent. In our case, we return a “Location” header instead. Because a redirect carries no document of its own, no “Content-type” header is needed.
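To make the header discussion concrete, here is a minimal Python sketch of what a redirecting CGI program writes to standard output. It is not the CGI.pm code from Listing 1; the function name redirect_headers and the host www.example.com are assumptions made for illustration.

```python
import sys


def redirect_headers(url):
    """Build the header block a CGI program prints to redirect a browser."""
    # Refuse line breaks so a bad value cannot smuggle in extra headers.
    if "\r" in url or "\n" in url:
        raise ValueError("URL must not contain line breaks")
    # A Location header followed by a blank line; no Content-type is sent
    # because no document body follows.
    return "Location: %s\r\n\r\n" % url


if __name__ == "__main__":
    # Hypothetical absolute URL; a real program would compute the day's file.
    sys.stdout.write(redirect_headers("http://www.example.com/file-3.html"))
```

The blank line at the end matters: it tells the server that the headers are finished, just as it would before an HTML body.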
If the above program is installed as /cgi-bin/today.pl on our server, visitors will always be greeted with the appropriate file for the current day of the week.
The above program, simple as it is, has several flaws. Most significantly, CGI is slow and inefficient; using it to redirect the user's browser to another file will slow down the user's experience, as well as increase the load on your server. Each time a CGI program is invoked, the server must create a new process. If the program is written in Perl, this means the Perl binary must be started, which can take some time.
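The cost of starting an interpreter for every request is easy to measure for yourself. The following Python sketch does the same small job twice, once inside the running process and once by launching a fresh interpreter each time, which is roughly what a web server does for each CGI request. The function names are hypothetical, and the timings will differ from machine to machine, so treat the output as a rough indication only.

```python
import datetime
import subprocess
import sys
import time

# The same one-line job, written as a string so a new interpreter can run it.
SNIPPET = (
    "import datetime; "
    "print('file-%d.html' % (datetime.date.today().isoweekday() % 7))"
)


def run_in_process():
    """Compute today's file name without starting anything new."""
    return "file-%d.html" % (datetime.date.today().isoweekday() % 7)


def run_in_new_process():
    """Start a fresh interpreter to do the same job, as CGI would."""
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def average_seconds(func, repeats=5):
    """Return the average wall-clock time of one call to func."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    print("in process:  %.6f s" % average_seconds(run_in_process))
    print("new process: %.6f s" % average_seconds(run_in_new_process))
```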
One solution might be to use mod_perl, which embeds a full Perl interpreter in the Apache web server. Using mod_perl means Apache no longer needs to create a new process, start the Perl binary or compile the Perl program for each request, which will cut down on server resource use. However, this still means that each time a user requests the home page, the server must execute a program. If the page is requested 1,000 times in a given day, then the program will run 1,000 times. This might not sound like much, but imagine what happens when your site grows in popularity, getting 1,000,000 hits each day.
Nor does mod_perl address the fact that not all users run browsers which handle redirection. If a browser does not follow the redirect, the user will be unable to see today's file. This problem is increasingly rare, but keep it in mind if you want the maximum possible audience for your web site.
Let's now examine a strategy in which the program runs only once per day, regardless of how many people ask to see today's page. This method reduces the load on the server and allows people with old browsers to visit our site without any trouble. The easiest strategy is to use Linux's cron utility, which allows us to run programs automatically at scheduled times. Using cron, we can run our program once per day, copying the appropriate file to today.html. On Sundays, file-0.html will be copied to today.html, while on Thursdays, file-4.html will be copied to today.html.
Listing 2 is an example of such a program. If this program were run once a day, then today.html would always contain the file for the appropriate day. Moreover, the server would be able to respond to the document request without having to create a new CGI process or use Perl.
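The copy step can be sketched as follows. This is a Python illustration, not the Perl of Listing 2. The function name publish_today is hypothetical, and copying to a temporary file before renaming it is my own addition, so that a visitor never sees a half-written today.html. The demonstration at the bottom works in a scratch directory rather than a real document root.

```python
import datetime
import os
import shutil
import tempfile


def publish_today(directory, date=None):
    """Copy the day's file-N.html over today.html in the given directory."""
    date = date or datetime.date.today()
    index = date.isoweekday() % 7          # 0 = Sunday ... 6 = Saturday
    source = os.path.join(directory, "file-%d.html" % index)
    target = os.path.join(directory, "today.html")
    temporary = target + ".tmp"
    shutil.copyfile(source, temporary)     # write the new copy alongside
    os.replace(temporary, target)          # then swap it in with one rename
    return source


if __name__ == "__main__":
    # Build seven placeholder pages in a scratch directory and publish one.
    scratch = tempfile.mkdtemp()
    for n in range(7):
        with open(os.path.join(scratch, "file-%d.html" % n), "w") as page:
            page.write("<p>Page for day %d</p>\n" % n)
    print("Published", publish_today(scratch))
```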
The above program is not run through CGI, but rather through cron. In order to run a program through cron, you must add an entry to your crontab, a specially formatted text file that describes when a program should be run. Each user has a separate crontab file; that is, each user can arrange for different cron jobs to run at different dates and times.
You can edit the crontab file using the crontab program, which is typically in /usr/bin/crontab. To modify your crontab file, use crontab -e, which brings up the editor defined in the EDITOR environment variable. The format of crontab is too involved for me to explain here; typing man 5 crontab on the Linux command line will bring up the manual page describing the format. (Typing only man crontab will bring up a description of the crontab program, rather than the crontab file format, a distinction which can be confusing to new users.)
Assuming we want to run the above program (which I have called cron-today.pl) at one minute after midnight, we could add the following entry to our crontab:
1 0 * * * /usr/local/bin/cron-today.pl
In other words, we want to run /usr/local/bin/cron-today.pl at one minute after midnight (1 0), every day of the month (*), every month (*), and every day of the week (*).
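To see how the five time fields and the command line up, here is a small Python sketch that splits a crontab entry into named parts. It handles only the basic layout used in the entry above: five time fields followed by a command. It does not interpret ranges, lists or environment settings. The names FIELD_NAMES and parse_crontab_entry are hypothetical.

```python
# The five time fields, in the order they appear in a crontab entry.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_crontab_entry(line):
    """Split a basic crontab entry into its time fields and command."""
    # Split on whitespace at most five times, so the command keeps its spaces.
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    entry = dict(zip(FIELD_NAMES, parts[:5]))
    entry["command"] = parts[5].strip()
    return entry


if __name__ == "__main__":
    entry = parse_crontab_entry("1 0 * * * /usr/local/bin/cron-today.pl")
    for name in FIELD_NAMES + ("command",):
        print("%-13s %s" % (name, entry[name]))
```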
Any output from a cron job is e-mailed to the user who owns that job. After installing the above line in my crontab, I receive e-mail from the cron job each day at approximately 12:01 a.m. And each day, anyone visiting our site is shown the correct file at today.html.