Schedule, a Cron Adjunct


by Jim Scott

on March 17, 2003

Cron is the standard UNIX scheduling program. Its configuration file, crontab, determines when programs are run. Cron is remarkably versatile, but it has a deficiency: if the computer is down when a command is scheduled to run, cron skips that run and does not run the command until its next scheduled time. This may be inappropriate for commands that must run on a particular day, even if they run later than usual. (Think of payroll, whose absence will be noticed, rather than deleting empty logs, whose absence might not.)

This shortcoming can be partially overcome by running programs at or after the scheduled time on a particular day. This scheduler, Schedule, runs a program as soon as possible after the scheduled time. The scheduled time and the status of the run are stored in a database table. Results are stored in a logfile.

A Solution

I have written a C program, inspired by a program that handled scheduling at a former employer's, that runs a program any time after its scheduled time, on the day it was scheduled. It resets the database at midnight, reflecting that it's a new day and nothing has yet run on this new day.

The input is a configuration table and the result of a check program. The check program can test anything you need to verify before running the program. For example, you can check that an input file is present and is no longer growing, as you might need in a networked environment with chancy communication. Or perhaps you need 10 input files, and all must be present before processing. Many circumstances will suggest themselves.

Implementation

The Schedule program (Listing 1) is written for a PostgreSQL configuration and state table. It could be changed to use two text files or a text file and a database with little effort. In the former employer's implementation, it used a text table and a database.

Listing 1. Schedule

The program starts by reading its configuration table, installing its signal handlers and using the at program to schedule the reset of the state values in the database. This done, it goes into the schedule loop.

The scheduler, which runs every sixty seconds, determines the current time and traverses a list of programs to be run. If a program hasn't been run and its scheduled time has passed, its check program is run; if the check succeeds, the program is run, and the run is logged and recorded in the database. Once a program has been run, it no longer appears in the refilled list.

The list is refilled from the configuration table at each iteration of the scheduler, so the latency of a change to the configuration table is the same as the latency of the scheduler. The at program and its dæmon, atd, run programs when requested. You can look at the queue using atq and remove entries during testing using atrm. Type man at for details.

If you change a scheduled time, the change is reflected as soon as the configuration is refreshed, usually within a minute. This would be useful if you knew the data would be late and wanted to avoid unnecessary notification of failure in the logs. Or, perhaps, if the data had already arrived early, and your bonus depended upon prompt processing. Of course, if the data arrives very late, it is run whether you're there or not.

The program currently uses a predefined array of structures to hold the configuration table. This could be malloced and freed each time through the read_config procedure; however, it is small enough that I didn't think it necessary to add the overhead of allocating and freeing the memory once a minute. The size of the structure is 218 bytes, so an array of 100 programs requires only 21,800 bytes of storage.
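The 218-byte figure is consistent with one structure field per database column, each sized to the column width plus a terminating NUL byte. The following is a minimal sketch under that assumption; the structure name, field names and MAX_PROGRAMS are hypothetical and are not taken from Listing 1.

```c
#include <stddef.h>

/* Hypothetical limit on the number of scheduled programs. */
#define MAX_PROGRAMS 100

/* Hypothetical layout: one schedule row, each field sized to its
 * database column plus one byte for the terminating NUL. */
struct sched_entry {
    char days[8];        /* char(7), e.g. "NYNYYNN"     */
    char time_hhmm[6];   /* char(5), e.g. "14:20"       */
    char program[101];   /* varchar(100)                */
    char checkprog[101]; /* varchar(100)                */
    char didrun[2];      /* char(1): "Y", "N" or "P"    */
};

/* Fixed array, overwritten on every configuration read, so nothing
 * is allocated or freed inside the scheduling loop. */
struct sched_entry schedule_table[MAX_PROGRAMS];

/* Total number of bytes reserved for the whole table. */
size_t schedule_table_bytes(void)
{
    return sizeof schedule_table;
}
```

Because every field is a char array, the compiler adds no padding: 8 + 6 + 101 + 101 + 2 gives 218 bytes per entry and 21,800 bytes for 100 entries.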

The signal callback function re-installs itself each time a signal is received. Under System V signal semantics, the handler is reset to its default behavior when a signal is delivered, so you must explicitly reset the handler unless the default is what you want. BSD Unix keeps the handler installed. On Linux, which behavior signal() gives you depends on the C library and compile options, so re-installing the handler is the safe choice; it is harmless where it isn't needed.
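One way to avoid depending on handler-reset behavior at all is to install the handler with POSIX sigaction() instead of signal(). The sketch below is not the code from Listing 1; the names quit_requested, on_usr1 and install_quit_handler are hypothetical. Without the SA_RESETHAND flag, the handler stays installed after each delivery.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler and polled by the main loop (hypothetical name). */
volatile sig_atomic_t quit_requested = 0;

/* Handler: only records that the signal arrived. */
void on_usr1(int signo)
{
    (void)signo;
    quit_requested = 1;
}

/* Installs on_usr1 for SIGUSR1. Returns 0 on success, -1 on error. */
int install_quit_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESETHAND, so the handler is not reset */
    return sigaction(SIGUSR1, &sa, NULL);
}
```

The main loop would call install_quit_handler() once at startup and test quit_requested on each pass.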

The scheduler depends upon an array of days and times. The days start on Sunday (day 0) and end on Saturday (day 6). The string representing the days some program is to be run looks like NYNYYNN for a program to be run on Monday, Wednesday and Thursday. The scheduled time of a run is entered in local time, in 24-hour format. For example, 2:20 PM is entered as 14:20.
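The day string and 24-hour time can be tested with a few lines of C. This is a sketch of the idea rather than the code in Listing 1; the function names are hypothetical. The weekday argument uses the same numbering as tm_wday in struct tm (0 for Sunday through 6 for Saturday).

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the 7-character day string (Sunday first) has 'Y'
 * for weekday wday (0 = Sunday ... 6 = Saturday), otherwise 0. */
int runs_on_day(const char *days, int wday)
{
    if (days == NULL || strlen(days) != 7 || wday < 0 || wday > 6)
        return 0;
    return days[wday] == 'Y';
}

/* Converts "HH:MM" to minutes since midnight, or -1 if malformed. */
int minutes_of_day(const char *hhmm)
{
    int h, m;
    char extra;

    /* Exactly two fields must match; trailing characters are rejected. */
    if (hhmm == NULL || sscanf(hhmm, "%d:%d%c", &h, &m, &extra) != 2)
        return -1;
    if (h < 0 || h > 23 || m < 0 || m > 59)
        return -1;
    return h * 60 + m;
}

/* Returns 1 if the entry is scheduled for today and its time has
 * arrived; now_minutes is the current local time in minutes since
 * midnight. */
int is_due(const char *days, const char *hhmm, int wday, int now_minutes)
{
    int sched = minutes_of_day(hhmm);

    return sched >= 0 && runs_on_day(days, wday) && now_minutes >= sched;
}
```

With days set to NYNYYNN and a time of 14:20, is_due() is true on a Wednesday (day 3) at any minute from 860 onward, and false on a Tuesday at any time.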

The scheduler runs as a dæmon, in the background. The code to turn it into a dæmon is taken directly from Stevens' Unix Network Programming, Vol. 1. Once it becomes a dæmon, the program has no standard input, standard output or standard error. It talks to you only through the log, and you can communicate with it only through signals. As designed, it responds to kill -10 (user signal 1, SIGUSR1, which is signal 10 on x86 Linux) by quitting; kill -USR1 is the portable spelling. Of course, it may be killed by kill -9, but this should be a last resort with any program. With kill -9, the program gets no time to tidy up or end processing in an orderly manner; it is simply stopped.

Discussion

The scheduler first checks if the prerequisites for running a program have been satisfied, then it runs the program. If the machine is down at the scheduled time, the program is run as soon as possible, allowing for the granularity of the program (currently 60 seconds). This should be adequate for all but critical processing. If ten programs are scheduled to run at one particular time, the last to run does so about ten minutes later. If there is any time between scheduled runs, everything runs on time.
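The ten-minute figure follows if each pass of the scheduler starts at most one program. That is an inference from the text, not something shown from Listing 1, and the sketch below simply models it. The struct and function names are hypothetical, and times are minutes since midnight.

```c
#include <stddef.h>

/* Hypothetical in-memory view of one schedule row. */
struct job {
    int sched_minutes; /* scheduled time, minutes since midnight */
    char didrun;       /* 'N' = not yet run today, 'Y' = already run */
};

/* One scheduler pass that starts at most one job (assumed behavior).
 * Marks the first job that is due and not yet run, and returns its
 * index, or -1 if nothing is due. */
int run_one_pass(struct job *jobs, size_t n, int now_minutes)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (jobs[i].didrun == 'N' && now_minutes >= jobs[i].sched_minutes) {
            jobs[i].didrun = 'Y';
            return (int)i;
        }
    }
    return -1;
}

/* Simulates passes every `period` minutes beginning at start_minutes.
 * Returns the minute at which the last job starts, or -1 if some job
 * never becomes due before midnight. */
int last_start_minute(struct job *jobs, size_t n, int start_minutes, int period)
{
    int now, last = -1;
    size_t started = 0;

    if (period <= 0)
        return -1;
    for (now = start_minutes; now < 24 * 60 && started < n; now += period) {
        if (run_one_pass(jobs, n, now) >= 0) {
            last = now;
            started++;
        }
    }
    return started == n ? last : -1;
}
```

Under this model, ten jobs all scheduled for 10:00 (minute 600) with a one-minute period start at minutes 600 through 609, so the last one starts nine passes after the first.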

The granularity can be adjusted by changing the sleep period in the scheduling subroutine. As it is, the scheduler responds within about a minute. Actually, five minutes is a more reasonable period; for test purposes, however, the time was set to one minute.

Here is a sample check file:

#!/usr/bin/perl
#Checks for the existence of the glloadfile. Returns 0 if it is present and has
#a non-zero size.
if (-s "/usr/tmp/glloadfile"){
    exit(0);
} else {
    exit(1);
}

If you wanted to know if the file was still growing, you might check its size twice in a loop. If it's still the same size after a minute, it's probably all here, although that judgment must reflect the realities of your processing. For example, some systems create the output file, then consume many hours processing, sometimes with waits between bursts of output. In general, anyone who has dealt with this sort of processing knows what is required for a particular program to run.
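The size-twice test could look like the following sketch. It is written in C rather than Perl, and the function names are hypothetical. It follows the check file's convention of returning 0 when the input is ready and 1 when it is not; the wait time is a parameter so that it can be tuned to your processing.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the size of path in bytes, or -1 if it cannot be examined. */
long long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Returns 0 (ready) if the file exists, is non-empty and has the same
 * size after wait_seconds; returns 1 (not ready) otherwise. */
int check_stable(const char *path, unsigned int wait_seconds)
{
    long long before = file_size(path);
    long long after;

    if (before <= 0)
        return 1;            /* missing or empty */
    sleep(wait_seconds);     /* give a writer time to append more data */
    after = file_size(path);
    return (after == before) ? 0 : 1;
}
```

A check program built on this would call check_stable("/usr/tmp/glloadfile", 60) and pass the result to exit(). An unchanged size is only evidence that the file is complete, not proof.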

The program to be run can be anything. However, all input must come from the program, its files, the environment or the database. The easiest way to allow differing arguments or multiple runs each day is to name a script, rather than the program itself, as the program to run in the database. The script merely invokes the program with any required arguments. With seven scripts and appropriate entries in the configuration table, it would be possible to run a program at a different time each day, using different arguments each time it is run.

The other problem is in the database. If you want to run the program more than once each day, the second and subsequent runs all must have different names. As far as the database is concerned, if it has run, it has run, at least until tomorrow. By using scripts to invoke a program, you can run it many times, each with a different configuration line and database entry. Simply use a different script name. Here is the format of the database table:

Table "schedule"

 Column    | Type                   | Modifiers
-----------+------------------------+-----------
 days      | character(7)           |
 time      | character(5)           |
 program   | character varying(100) |
 checkprog | character varying(100) |
 didrun    | character(1)           |

The SQL to create the database table is:

create table schedule (days char(7),time char(5),program varchar(100),checkprog varchar(100),didrun char(1));

The entry for program is the name to be invoked. This could be the program name or a script name. The entry for didrun has three possible values, Y for yes, N for no and P for pending. If the program has run, the scheduler changes it from N to Y. The midnight reset changes states for all programs to N, reflecting the new day's reality.
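The two transitions described here, N to Y after a run and everything back to N at midnight, can be modeled in memory as follows. This is only a sketch with hypothetical names; the real program keeps this state in the PostgreSQL table. The P value is left untouched by mark_ran() because the text does not say when it is set.

```c
#include <stddef.h>

/* The three didrun values named in the text. */
enum { RUN_NO = 'N', RUN_YES = 'Y', RUN_PENDING = 'P' };

/* Records a completed run: an entry still at N becomes Y.
 * Returns 1 if the flag changed, 0 if it was already Y or was P. */
int mark_ran(char *didrun)
{
    if (*didrun == RUN_NO) {
        *didrun = RUN_YES;
        return 1;
    }
    return 0;
}

/* Midnight reset: every entry goes back to N for the new day. */
void midnight_reset(char *flags, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        flags[i] = RUN_NO;
}
```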

Managing the Database

The database sometimes requires changes. I have written a Tk procedure to handle such changes. It is enclosed as Listing 2. You can add, delete, query or change database entries using this procedure.

Listing 2. Tk Procedure for Database Changes

Conclusion

Schedule is a viable alternative to cron or at for many types of processing. It's easy to use and can run unattended. It can significantly ease the burden of scheduling on the programmer responsible for both the operation of the machine itself and its production. After all, when the machine's down, you're trying to get it back up and stable. You probably don't have time to worry about what needs to run today. So, within limits, you can let the scheduler run things. The principal limitation is that the scheduler won't delay something for a day. It thinks something should run today or not at all. But it does have the advantage of using all of today, rather than only a dedicated small slice of today.

email: James.L.Scott@att.net
