Linux Job Scheduling
Today, in our ongoing series on learning to live with Linux's “inner dæmons”, we are going to look at two dæmons that schedule job execution on Linux. These dæmons are more or less exactly like those found on virtually every UNIX out there. (Linux has separate dæmons for at and cron. Old versions of Linux used a program called “atrun”, which was run from root's crontab once a minute to execute at requests. Some other UNIX operating systems build atd functionality directly into crond. This qualifier brought to you by the bureau of auctorial honesty. This article covers atd and crond as they are distributed with most currently sold distributions, including Debian 2.1, Red Hat, SuSE and Corel, among others.) My test cases were all carried out on a Red Hat 6.1 installation using version 3.1.7 of at. The Debian and SuSE installations I currently have use version 3.1.8.
As for cron, most Linux distributions use “Vixie cron”, which was originally written, as you might guess, by Paul Vixie. The distributions have each released their own fixes for a security hole discovered in August 1999. Check your distribution's update page for the most recent version of cron, and make sure you have it installed.
What you think about at and cron will largely depend on what your background is. If you are familiar with only the DOS and Windows world, you should be fairly impressed with what atd and crond offer, even if you have made use of the System Agent, which has certain similarities to crond. If you are an old hand from the world of MIS where you had JCL and various batch environment control systems, you will probably find atd and crond lacking in some essential features. Even so, I hope you will come away from this introduction with a healthy appreciation for what these tools do offer, and perhaps a few ideas about how, even with their limitations, they significantly enhance Linux's capabilities.
People with a mainframe background are very familiar with the concept of job scheduling. They usually use this term interchangeably with batch processing. Alas, job scheduling is not batch processing. Batch processing, to my mind at least, includes the concepts of job dependencies, batch process monitoring, checkpoint/restart and recoverability. Neither atd nor crond provides these facilities. If you come from the world of big iron, you may be feeling some disappointment. Don't. As you will see, atd and crond fit in well with the overall UNIX philosophy of simple tools that do one thing well.
If you are coming from a Windows/DOS perspective, you should be pleased by the multi-user nature of atd and crond. Unlike with System Agent, you do not have to be logged in for your jobs to be carried out.
If you have a UNIX background, well, you are amongst old friends here.
For those totally unfamiliar with these concepts, what we are talking about is running programs. So what, you say? I log in and type commands and click on little icons. I run programs all day. What's the big deal?
What about having programs run at a certain time of the day, whether you are there or not? What about compiling the latest version of WINE on a busy Linux server when it won't slow down the branch office Intranet? What about that annoying log file the on-line order application spits out that is about to eat up all the free disk space on /usr/prod/orders?
This is where job scheduling comes into play.
There are two kinds of scheduled jobs. You can think of them as “one shot” and “repeating”. One-shot jobs are single executions of programs you want to have take place at some future time, whether or not you are logged in. Repeating jobs are programs you want to have run at certain times or dates, over and over again.
The command you use to schedule one-shot jobs is called “at”. The way to schedule repeating jobs is through a “crontab” (a portmanteau word made from CRON TABle, similar to inittab, from INITialization TABle, and other *nix-y portmanteau words). Oddly enough, the command used to view, edit and store crontabs is called “crontab”.
Unlike some of the other dæmons we have covered in this series, these two have interactive user programs that control them. Because of this, we will cover the basics of using these two dæmons as a non-privileged user (I hope you aren't logging in to your Linux system as root!), then we will go over the dæmons and how they work, then we will cover some fine points of “non-user” or system-scheduled jobs, and finally some of the little “gotchas” that sometimes cause commands to behave differently than you expect when you run them through a scheduler.
The at command is used to schedule one or more programs for a single execution at some later time. There are actually four client commands:
at: Runs commands at a specified time
atq: Lists pending commands
atrm: Cancels pending jobs
batch: Runs commands when system load permits
The Linux at command accepts a number of time specifications, considerably extending the POSIX.2 standard. These include:
HH:MM: Run at this hour and minute. If this time has already passed today, the next day is assumed. A 24-hour clock is assumed unless you suffix the time with “am” or “pm”.
now, noon, midnight, teatime: You read that right. You can type “at teatime”, and Linux's at is civilized enough to know that this is 4 p.m. local time. The “noon” and “midnight” keywords have their normal meaning. The “now” keyword means what it says. It might seem like a dumb thing to have, since if you wanted to run something now, you would type it without the at command, but it has an application in “relative time” invocations. We'll see those after the date modifiers described below.
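To make the roll-over rule concrete, here is a minimal Python sketch that resolves a bare time of day to its next occurrence. It assumes a deliberately simplified grammar (HH:MM with an optional am/pm suffix, plus the four keywords), and next_run is a hypothetical name; this is not how at itself parses times.

```python
from datetime import datetime, timedelta

# Keywords from the article; "teatime" is 4 p.m. local time.
NAMED_TIMES = {"noon": (12, 0), "midnight": (0, 0), "teatime": (16, 0)}


def next_run(spec: str, now: datetime) -> datetime:
    """Hypothetical helper: next occurrence of a bare time of day (sketch)."""
    spec = spec.strip().lower()
    if spec == "now":
        return now
    if spec in NAMED_TIMES:
        hour, minute = NAMED_TIMES[spec]
    else:
        suffix = ""
        if spec.endswith(("am", "pm")):
            spec, suffix = spec[:-2], spec[-2:]
        hour_text, _, minute_text = spec.partition(":")
        hour, minute = int(hour_text), int(minute_text or "0")
        # Without a suffix the hour is read on a 24-hour clock.
        if suffix == "pm" and hour != 12:
            hour += 12
        elif suffix == "am" and hour == 12:
            hour = 0
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # A time that has already passed today means the same time tomorrow.
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2000, 4, 17, 15, 0)   # 3 p.m.
print(next_run("teatime", now))      # 2000-04-17 16:00:00
print(next_run("noon", now))         # 2000-04-18 12:00:00
```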
These time specifications may be optionally followed by a date specification. Date specifications come in a number of forms, including:
today, tomorrow: These mean what you would expect. “at teatime tomorrow” will run the commands at 4 p.m. the following day. Note that if you specify a time that has already passed (as in “at noon today” when it is 3 p.m.), the job will be run at once. You do not get an error. At first you might think this a bad thing, but look at it this way: what if the system had been down since 10 a.m. and was only being restarted now at 3 p.m.? Would you want a critical job skipped, or would you want it to run as soon as possible? The at system takes the conservative view and assumes you will want the job run.
<month_name> <day> [<year>]: month_name is “jan”, “feb” and so on, and day is a day number. The year is optional and should, of course, be a four-digit year.
MM/DD/YYYY, YYYY-MM-DD: Don't listen to what the “man at” page tells you! At least in Red Hat 6.1, it is wrong! I suspect it is wrong in certain other releases as well, and I'm willing to bet this is because the documentation has not caught up with Y2K fixes to this subsystem. The at shipped with Red Hat 6.1 handles dates in the two formats above. It appears to handle two-digit years correctly, turning values less than 50 into 20xx and those greater than 50 into 19xx. I did not test to find the exact pivot point, and I do not recommend that you bother to, either. If you use two-digit years at this point, be prepared to pay a price! Depending on your version of at to treat two-digit years a certain way is foolish. Use four-digit years. Haven't we learned our lesson? (If you worked with computers from 1995 to 1999, you felt the pain as work came to an almost complete halt while we pored over every system with microscopes, looking for date flaws in the designs of our systems. Don't make a Y2.1K problem! PLEASE!!!)
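If you script around at, one way to follow that advice is to refuse two-digit years before a date ever reaches at. The Python sketch below (parse_at_date is a hypothetical helper name) accepts only the two numeric layouts described above; in Python's strptime, the %Y directive requires four digits, so a date such as 04/17/00 is rejected rather than guessed at.

```python
from datetime import date, datetime

# The two numeric date layouts the article found at 3.1.7 accepting.
AT_DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def parse_at_date(text: str) -> date:
    """Hypothetical guard: parse a date, insisting on a four-digit year (sketch)."""
    for fmt in AT_DATE_FORMATS:
        try:
            # In strptime, %Y matches exactly four digits.
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"use MM/DD/YYYY or YYYY-MM-DD, not {text!r}")
```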
Another way you can modify a time specification is to apply a relative time to it. The format of a relative time specification is + <count> <time units>, where “count” is simply a number and “time units” is one of “minutes”, “hours”, “days” or “weeks”.
So, you can say:
at 7pm + 2 weeks
and the programs will be scheduled for two weeks from today at 7 p.m. local time.
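The arithmetic is plain date addition. As a rough Python sketch (apply_offset is a hypothetical name, and only the four units above are recognized, singular or plural), the offset is simply added to the already-resolved base time:

```python
import re
from datetime import datetime, timedelta

# "+ <count> <time units>", where units are minutes, hours, days or weeks.
OFFSET = re.compile(r"\+\s*(\d+)\s*(minute|hour|day|week)s?$")


def apply_offset(base: datetime, offset: str) -> datetime:
    """Hypothetical helper: add an at-style relative offset (sketch)."""
    match = OFFSET.match(offset.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized offset: {offset!r}")
    count, unit = int(match.group(1)), match.group(2)
    # timedelta accepts minutes=, hours=, days= and weeks= keywords.
    return base + timedelta(**{unit + "s": count})


# "at 7pm + 2 weeks", submitted on April 17, 2000:
print(apply_offset(datetime(2000, 4, 17, 19, 0), "+ 2 weeks"))  # 2000-05-01 19:00:00
```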
One of the most common forms is this:
at now + x units
to specify a program or programs to be run so many units from now. I often use this to shut down my home machine's dial-up connection from work. I dial in before I leave for work, and then I kill the connection before my wife gets home (I'm too cheap to buy a second line). I use ssh to log in from work, and I like to close all my windows cleanly, so I frequently do something like this:
# ps fax | grep wvdial
  599 ?        S      0:00 \_ wvdial
  875 pts/2    S      0:00 \_ grep wvdial
# at now + 10 minutes
at> kill 599
at>
warning: commands will be executed using /bin/sh
job 9 at 2000-04-17 16:30
# exit
$ exit

I then have ten minutes to disconnect cleanly from my home system before my phone connection gets dropped.
Note that the plain old Bourne shell is used for all commands run by at. (Also note: I had to type Ctrl-D, the *nix EOF character, to close the interactive at session. More on this in the section on the at command line.)

The shell is just one factor affecting the behavior of at-scheduled commands. Here are some other facts to bear in mind. The present working directory, the environment variables (with three exceptions, see below), the current user ID and the umask that were in effect when the at command was issued are retained and used when the commands are executed. The three environment variable exceptions are TERM, DISPLAY and “_” (which usually contains the last command executed in the shell). The output of the commands is mailed to the user who issued the at command. If the at command is issued in an su shell (meaning you “became” another user), the output mail is sent to the login user, but the programs run as the su user.
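The context at captures can be sketched in a few lines of Python. This illustrates the rules just listed; it is not at's own code. snapshot_job_context is a hypothetical name, and the umask is read the only way POSIX offers, by setting it and immediately restoring it.

```python
import os

# Per the article, at keeps the submitting shell's environment except these.
NOT_PRESERVED = {"TERM", "DISPLAY", "_"}


def snapshot_job_context(environ=None) -> dict:
    """Hypothetical helper: what an at job inherits (cwd, umask, uid, environment)."""
    environ = dict(os.environ if environ is None else environ)
    # os.umask() sets a new mask and returns the old one, so set it and put it
    # back. This briefly changes process-wide state and is not thread-safe.
    mask = os.umask(0)
    os.umask(mask)
    return {
        "cwd": os.getcwd(),
        "umask": mask,
        "uid": os.getuid(),
        "env": {k: v for k, v in environ.items() if k not in NOT_PRESERVED},
    }
```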
The ability to use at is controlled by two files: /etc/at.deny and /etc/at.allow.
The /etc/at.allow file is checked first. If it exists, only user names in this file are allowed to run at. If the /etc/at.allow file does not exist, then the /etc/at.deny file is checked. All user names not mentioned in that file may run at.
If neither file exists, only the superuser may run at.
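The decision order is easy to get backwards, so here it is as a small Python sketch. The function names are hypothetical, treating the superuser as the user name “root” is an assumption, and a missing file is represented as None, so an existing but empty at.deny (which lets everyone in) is distinct from no at.deny at all.

```python
from pathlib import Path
from typing import Optional, Set


def read_user_list(path: Path) -> Optional[Set[str]]:
    """Hypothetical helper: names from path, one per line; None if absent."""
    if not path.exists():
        return None
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def may_use_at(user: str, allow: Optional[Set[str]], deny: Optional[Set[str]]) -> bool:
    """Hypothetical helper: the at.allow / at.deny rules (sketch)."""
    if allow is not None:    # at.allow exists: only the names it lists
        return user in allow
    if deny is not None:     # otherwise at.deny: everyone it does not list
        return user not in deny
    return user == "root"    # neither file: superuser only (assumed to be "root")


# On a real system:
# may_use_at("alice", read_user_list(Path("/etc/at.allow")),
#            read_user_list(Path("/etc/at.deny")))
```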
The at command runs either the commands passed on standard input (passed in through a pipe, or typed at the “at>” prompts as in the example above), or it runs the commands specified in the file named by the -f parameter.
The general form of the at command line is:
at [-V] [-q <queue>] [-f <file>] [-mld] <TIME>
where “queue” is a queue name. Queue names are letters, a-z or A-Z. See the section called “Queues” for more details.
“file” is the name of a file containing commands to run.
“TIME” is a time specification as discussed in detail above.
The remaining switches are -V (print the version number); -m (send mail to the user when the job is complete, even if no output was produced); -l (an alias for atq; see the atq section below); and -d (an alias for atrm; see the atrm section below).
The atq command lists jobs queued by the current user (unless run as superuser, in which case pending jobs for all users are listed).
Here's a sample:
mars:20:~$ atq
5   2000-06-20 15:00 a
6   2000-07-04 15:00 a
10  2000-04-24 14:33 f
mars:21:~$
The first column is the job number, followed by the scheduled run time, followed by the queue. In this case, two jobs are in queue “a” and one in queue “f”. See the section on queues for more information.
You can use the -q switch to look at jobs only in a particular queue.
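If you want to post-process atq output in a script, a minimal Python parser for the layout shown above might look like this. It assumes exactly the four-column format printed by at 3.1.7 (other versions may add columns or print dates differently, and such lines are simply skipped here); AtJob and parse_atq are hypothetical names, and the optional queue argument mimics -q.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class AtJob(NamedTuple):
    number: int
    runs_at: datetime
    queue: str


def parse_atq(output: str, queue: Optional[str] = None) -> List[AtJob]:
    """Hypothetical parser for 'job date time queue' lines (sketch)."""
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # not in the four-column layout this sketch expects
        number, day, clock, job_queue = fields
        runs_at = datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M")
        if queue is None or job_queue == queue:  # queue names are case sensitive
            jobs.append(AtJob(int(number), runs_at, job_queue))
    return jobs


sample = "5\t2000-06-20 15:00 a\n6\t2000-07-04 15:00 a\n10\t2000-04-24 14:33 f\n"
print([job.number for job in parse_atq(sample, queue="a")])  # [5, 6]
```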
The atrm command deletes jobs from the at queue. For example, consider the queue shown in the atq example above. The following session illustrates the use of atrm:
mars:21:~$ atrm 6
mars:22:~$ atq
5   2000-06-20 15:00 a
10  2000-04-24 14:33 f
mars:23:~$
You may list any number of job numbers on the command line.
The batch command is a variation of at that, rather than scheduling a job for a time in the future, submits a job now; that job will not start, however, until the system's load average falls below 0.8. What is load average? The simplest way to think of it is as the number of processes that are running or waiting to run. Most of the time, programs are idle, waiting for hardware or for input, or waiting for the kernel to complete a request. When a program actually has something to do, it is in a runnable state. If the system is not busy, the kernel generally gives control to such a program right away. When some other program is in the middle of running, the program that has just become runnable must wait.

The instantaneous system load is the number of runnable processes, counting both the one that is running and those waiting for their turn. The load average is an average of this instantaneous load over a short period of time. Thus, on a single-processor system, a load average below 1.0 means there is some idle time. A system that hovers at about 1.0 is fully busy, at its theoretical maximum capacity. A system that is over 1.0 has no idle time, and processes are waiting for a chance to run. This does not necessarily mean the system becomes perceptibly slower to users, but it does mean the maximum capacity of the system has been reached and programs are running slower than they might on a less busy system.
The batch command schedules a job for “right now”, but will delay the start of the job until there is idle time (load average less than 0.8) on the system. Note that this test is for starting the job. Once it is started, it will run to completion, no matter how busy the system becomes during the run.
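The start test itself is tiny. In this Python sketch, batch_may_start is a hypothetical name, the 0.8 threshold is the one given above, and using the 1-minute figure from os.getloadavg() is an assumption of the sketch; os.getloadavg() is available on UNIX-like systems only.

```python
import os
from typing import Optional

BATCH_LOAD_LIMIT = 0.8  # threshold the article gives for starting batch jobs


def batch_may_start(load_average: Optional[float] = None) -> bool:
    """Hypothetical check: may a waiting batch job start now? (sketch)"""
    if load_average is None:
        # os.getloadavg() returns the 1-, 5- and 15-minute averages; this
        # sketch assumes the 1-minute value is the relevant one.
        load_average = os.getloadavg()[0]
    return load_average < BATCH_LOAD_LIMIT
```

Because the check only gates the start, nothing in it ever stops a job that is already running, which matches the behavior described above.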
Note that this section is quite Linux-specific. Other UNIX operating systems I have used have queues, but they are different from those documented here. Always consult local documentation. AIX doesn't work this way, for example.
Queues are a way of grouping jobs together in separate lists. They are named from a-z and A-Z. The at command by default puts jobs on queue “a”, whereas batch puts jobs on queue “b” by default.
Queue names with “greater” values run at higher “niceness”. Nice values are how Linux (like other UNIX systems) sets job priorities. The default nice level of a job is “0”, which means “normal”. Jobs can have nice values from -20 (highest possible priority) to +19 (lowest possible priority). Only the superuser can give jobs a negative nice value. We won't say any more about nice here, as a discussion of the kernel scheduler is well beyond our scope. Just know that jobs in the “z” queue run at a lower priority (and thus slower and with less impact on other running jobs) than do jobs in the “a” queue.
Jobs that are running will be in the “=” queue, which is reserved for running jobs.
Queue names are case sensitive! Remember, there are a-z queues and A-Z queues. The A-Z queues are special. If you use at to put a job on a queue with a capital letter, then the job is treated as if it were submitted to the batch command at the run time instead of the at command.
In other words, putting a job on an uppercase queue is like combining at and batch. When the job runs, it runs immediately if the load average is below 0.8, otherwise it waits until the load average falls below that point. In no case will the job start before its scheduled time.
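Combining the two rules, the start decision for a queued job can be sketched in Python as follows. The function name and parameters are hypothetical, the 0.8 limit is the figure given above, and via_batch stands in for a job submitted with the batch command; atd's real bookkeeping is not modeled.

```python
from datetime import datetime


def job_may_start(queue: str, scheduled: datetime, now: datetime,
                  load_average: float, via_batch: bool = False,
                  limit: float = 0.8) -> bool:
    """Hypothetical rule: may a queued job start at time `now`? (sketch)"""
    if now < scheduled:
        return False                     # never before the scheduled time
    if via_batch or queue.isupper():     # uppercase queues get batch treatment
        return load_average < limit
    return True                          # ordinary at job: start on time
```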
Phew! All of that and we still haven't looked at the dæmon that takes care of all this! I hope you are beginning to see that “at”, while not a complete batch processing system, certainly provides a great deal of capability.
The at and batch commands put jobs into the at queue. What is the at queue? Well, there is a directory, /var/spool/at, which is accessible only to the dæmon user and the superuser (everything is available to the superuser). For each job, there is a file in the directory. The file is a shell script that sets up the environment and umask, cd's to the working directory and then runs the programs specified to at/batch in succession.
The commands go into the shell script exactly as they were typed or piped to at, and each is run in turn. If you used & to background a command, && to make one command depend on the success of another, or ; to run commands one after another, these will be honored.
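Here is a Python sketch that writes a script in the same spirit. The layout is illustrative only (at's real spool files contain more bookkeeping), and render_job_script is a hypothetical name. The point is that the commands are copied verbatim after the environment, umask and directory set-up, so /bin/sh gives &, && and ; their usual meanings.

```python
import re
import shlex

_SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_job_script(cwd: str, umask: int, env: dict, commands: list) -> str:
    """Hypothetical helper: a /bin/sh script recreating the submitter's context."""
    lines = ["#!/bin/sh"]
    for name, value in sorted(env.items()):
        if _SHELL_NAME.fullmatch(name):  # /bin/sh cannot export other names
            lines.append(f"{name}={shlex.quote(value)}; export {name}")
    lines.append(f"umask {umask:03o}")
    lines.append(f"cd {shlex.quote(cwd)} || exit 1")
    lines.extend(commands)  # copied exactly as typed or piped to at
    return "\n".join(lines) + "\n"


print(render_job_script("/home/mike", 0o022, {"LANG": "C"}, ["make && make install"]))
```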
Important note! The shell /bin/sh is used to run these jobs. If you normally use some other shell, such as tcsh, be aware that you can't use the semantics of that shell because /bin/sh will be used instead.
At this point, documenting the dæmon is rather anticlimactic. The atd dæmon examines the /var/spool/at directory. The names of the files actually encode their runtimes, queues and batch vs. at status. These files are shell scripts that set up the environment and run the job as described above. Output from the jobs is temporarily stored in /var/spool/at/spool until the jobs are completed, upon which the output is mailed to the invoking user.
Potentially every user on the system has a crontab, and, as mentioned earlier, the command to create, examine and modify crontabs is also called crontab.
There are four ways to invoke crontab.
crontab <file>
crontab -l
crontab -r
crontab -e
Generally, crontab works on your own crontab. All four forms accept the -u option followed by a user name. In most cases, you will be able to view and edit other users' crontabs only if you are the superuser. If you are able to edit another user's crontab without being the superuser, you might want to check your system security. You probably have some problems!
The first form stores the named file as the crontab, replacing any current crontab. The second form dumps the current crontab to stdout. The third form removes the current crontab. The fourth form opens the current crontab in the editor specified by the VISUAL or EDITOR environment variable.
If you want to experiment with your crontab, it's a good idea to do a
crontab -l > working-crontab
to save your current crontab, if any, then use
crontab -e
to modify your crontab in your favorite editor. You can always use
crontab working-crontab
to put everything back the way it was.
At this point, you may be wondering what a crontab looks like and what it does.
A crontab is a list of program command lines along with a specification of when to run that command line. It is a whitespace-delimited file with a newline between commands. Blank lines and lines beginning with a pound character (#) are ignored.
The fields are:
minute (0-59)
hour (0-23)
day of month (1-31)
month (1-12)
day of week (0-7; 0 and 7 are both Sunday)
command
Any of the time fields may be an asterisk (*), which means “every”. Thus, an entry of:
* * * * * fetchmail
will run fetchmail once a minute, every minute of every hour, every day.
Ranges of numbers are allowed. So:
* 8-17 * * 1-5 fetchmail
will run fetchmail once a minute from 8:00 a.m. through 5:59 p.m., Monday through Friday (0 or 7 represents Sunday).
Lists are allowed. Thus:
0,20,40 * * * 1-5 fetchmail
will run fetchmail on the hour, at 20 past and again at 40 past the hour, every hour of the day, Monday through Friday.
Step values are allowed after asterisks and ranges. They are of the form <range>/<step>. So,
*/5 8-17/2 * * * cp /var/log/* /log/backup
will run that cp command (just in case you had started thinking you could run only fetchmail) every five minutes in the 8 a.m., 10 a.m., noon, 2 p.m. and 4 p.m. hours of every day.
Finally, names may be used for months (jan-dec, case insensitive) and days of the week (sun-sat, case insensitive). The Red Hat man pages claim that you can't use names in ranges, but I gave it a try myself and it appeared to work correctly.
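To pin down the rules above (asterisks, ranges, lists, steps and names), here is a Python sketch that expands each time field into the set of values it matches, then checks a schedule against a given moment. It is an illustration, not Vixie cron's parser: expand_field and matches are hypothetical names, 7 is folded into 0 for Sunday, and the rule that a job runs when either day field matches (when both are restricted) follows the crontab(5) manual page for Vixie cron.

```python
from datetime import datetime

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]

# field name -> (lowest value, highest value, optional names)
FIELDS = {
    "minute": (0, 59, None),
    "hour": (0, 23, None),
    "day_of_month": (1, 31, None),
    "month": (1, 12, MONTHS),
    "day_of_week": (0, 7, DAYS),  # 0 and 7 both mean Sunday
}


def _value(token: str, low: int, names) -> int:
    token = token.lower()
    if names and token in names:
        return names.index(token) + low  # jan -> 1, sun -> 0
    return int(token)


def expand_field(spec: str, field: str) -> set:
    """Hypothetical helper: expand one crontab time field into its values."""
    low, high, names = FIELDS[field]
    values = set()
    for part in spec.split(","):  # lists: a,b,c
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:  # ranges: a-b
            first, last = base.split("-", 1)
            start, end = _value(first, low, names), _value(last, low, names)
        elif step_text:
            raise ValueError(f"steps need '*' or a range: {part!r}")
        else:
            start = end = _value(base, low, names)
        if step < 1 or not low <= start <= end <= high:
            raise ValueError(f"bad {field} field: {part!r}")
        values.update(range(start, end + 1, step))
    if field == "day_of_week" and 7 in values:
        values.discard(7)
        values.add(0)
    return values


def matches(schedule: str, when: datetime) -> bool:
    """Hypothetical helper: does the five-field schedule fire at `when`?"""
    minute, hour, dom, month, dow = schedule.split()
    if (when.minute not in expand_field(minute, "minute")
            or when.hour not in expand_field(hour, "hour")
            or when.month not in expand_field(month, "month")):
        return False
    dom_ok = when.day in expand_field(dom, "day_of_month")
    dow_ok = (when.weekday() + 1) % 7 in expand_field(dow, "day_of_week")  # Sunday -> 0
    if dom.startswith("*") or dow.startswith("*"):
        return dom_ok and dow_ok
    return dom_ok or dow_ok  # both day fields restricted: either may match


# Monday, April 17, 2000 at 5:59 p.m. is inside "8-17" on a weekday.
print(matches("* 8-17 * * 1-5", datetime(2000, 4, 17, 17, 59)))  # True
```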
This is the area that confuses users of cron the most. They specify commands they run every day from their interactive shells, and then they put them in their crontab and they don't work or they behave differently than they expected.
For example, if you write a program called “fardels”, put it in $HOME/bin and add $HOME/bin to your PATH, cron might send you mail like this:
/bin/sh: fardels: command not found
The PATH cron uses is not necessarily the same as the one your interactive shell uses.
It is important to understand that cron jobs do not run in the environment you work in every day.
First of all, your normal environment variables are not initialized the way they are in your login shell. The following environment variables are set up by the cron dæmon:
SHELL=/bin/sh
LOGNAME: set from the /etc/passwd entry for the crontab's UID.
HOME: set from the /etc/passwd entry for the crontab's UID.
We've been holding out on you. There's another kind of entry allowed in your crontab file. Lines of the form name=value are allowed to set environment variables that will be set when jobs are run out of the crontab. You may set any environment variable except LOGNAME.
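Here is a rough Python model of the environment a cron job starts with, and of how crontab assignments are layered on top. The function names are hypothetical, and only the three variables listed above are modeled as defaults; putting your own PATH in the crontab this way is what cures the “fardels: command not found” mail.

```python
import os
import pwd
from typing import Optional


def cron_base_environment(uid: Optional[int] = None) -> dict:
    """Hypothetical helper: the variables cron sets, from the passwd entry."""
    entry = pwd.getpwuid(os.getuid() if uid is None else uid)
    return {"SHELL": "/bin/sh", "LOGNAME": entry.pw_name, "HOME": entry.pw_dir}


def apply_crontab_assignments(env: dict, assignments: dict) -> dict:
    """Hypothetical helper: layer name=value crontab lines over the base."""
    result = dict(env)
    for name, value in assignments.items():
        if name == "LOGNAME":
            continue  # LOGNAME cannot be overridden; this sketch just ignores it
        result[name] = value
    return result
```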
An important one to note is MAILTO. If MAILTO is undefined, the output of jobs will be mailed to the user who owns the crontab. If MAILTO is defined but empty, mailed output is suppressed. Otherwise, you may specify an e-mail address to which to send the output of cron jobs.
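The three MAILTO cases map onto a short Python function; the name is hypothetical and None stands for “send no mail”.

```python
from typing import Mapping, Optional


def cron_mail_recipient(owner: str, crontab_env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: where cron mails a job's output (sketch)."""
    if "MAILTO" not in crontab_env:
        return owner                      # undefined: the crontab's owner
    return crontab_env["MAILTO"] or None  # empty: no mail; otherwise that address
```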
Finally, any percent sign in the command portion of a job entry is treated as a newline. Any data which follows the first percent sign is passed to the job as standard input, so you can use this to invoke an interactive program on a scheduled basis.
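A Python sketch of that rule follows. split_cron_command and run_like_cron are hypothetical names, and the sketch deliberately ignores the backslash escape (\%) that Vixie cron also accepts for a literal percent sign.

```python
import subprocess
from typing import Optional, Tuple


def split_cron_command(field: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split at the first '%'; later '%'s become newlines."""
    command, sep, rest = field.partition("%")
    if not sep:
        return command, None  # no '%': the job gets no input
    return command, rest.replace("%", "\n")


def run_like_cron(field: str) -> subprocess.CompletedProcess:
    """Hypothetical helper: run under /bin/sh with the '%' data as stdin."""
    command, stdin_text = split_cron_command(field)
    return subprocess.run(["/bin/sh", "-c", command], input=stdin_text or "",
                          capture_output=True, text=True)


print(split_cron_command("mail -s reminder mike%Feed the cat.%Water the plants."))
```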
The ability to have and use a crontab is controlled in a manner very similar to the at subsystem. Two files, /etc/cron.allow and /etc/cron.deny, determine who can use crontab. Just as in the case of at, the cron.allow is checked first. If it exists, only the users listed there may have cron jobs. If it does not exist, the cron.deny file is read. All users except those listed there may have cron jobs.
If neither file exists (and this is quite unlike “at”), all users may have crontabs.
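As a Python sketch, with None standing for a missing file and a hypothetical function name, the logic mirrors at's except for the final fallback:

```python
from typing import Optional, Set


def may_have_crontab(user: str, allow: Optional[Set[str]], deny: Optional[Set[str]]) -> bool:
    """Hypothetical helper: cron.allow / cron.deny check (sketch)."""
    if allow is not None:
        return user in allow
    if deny is not None:
        return user not in deny
    return True  # neither file: unlike at, everyone is allowed
```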
There is hardly anything to document here. The cron dæmon (which is called either cron or crond) takes no arguments and does not respond to any signals in a special way. It examines the /var/spool/cron directory at start-up for files with names matching user names in /etc/passwd. These files are read into memory. Once per minute, cron wakes up and walks through its list of jobs, executing any that are scheduled for that minute.
Each minute, it also checks to see if the /var/spool/cron directory has changed since it was last read, and it rereads any modifications, thus updating the schedule automatically.
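The once-a-minute change check can be sketched in Python by remembering the spool directory's modification time. SpoolWatcher is a hypothetical name and this is not cron's code; note that a directory's mtime changes when entries are created, removed or renamed inside it, not when an existing file is rewritten in place.

```python
import os
from typing import Optional


class SpoolWatcher:
    """Hypothetical helper: has a directory changed since the last check?"""

    def __init__(self, spool_dir: str) -> None:
        self.spool_dir = spool_dir
        self.last_mtime: Optional[float] = None

    def changed(self) -> bool:
        mtime = os.stat(self.spool_dir).st_mtime
        if mtime != self.last_mtime:  # also True on the very first call
            self.last_mtime = mtime
            return True
        return False


# A cron-like loop would call watcher.changed() once a minute and, when it
# returns True, re-read every crontab in the directory before running jobs.
```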
I've led you through a merry dance so far. I've got you thinking that only users have crontabs, and that all scheduled jobs run as the crontab's owning user. That's almost true. Cron also has a way to specify crontabs at a “system” level. In addition to checking /var/spool/cron, the cron dæmon also looks for an /etc/crontab and an /etc/cron.d directory.
The /etc/crontab file and the files in /etc/cron.d are “system crontabs”. These have a slightly different format from that discussed so far.
The key difference is the insertion of a field between the “day of week” field and the command field: the “run as user” field. Thus:
02 4 * * * root run-parts /etc/cron.daily
will run “run-parts /etc/cron.daily” as root at 2 minutes past 4 a.m. every single day.
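Reading a system crontab line therefore means splitting off six fields instead of five. In this Python sketch (the names are hypothetical), the first five fields are the schedule, the sixth is the user, and the remainder, spaces included, is the command; comments, blank lines and name=value assignments are skipped.

```python
import re
from typing import NamedTuple, Optional

_ASSIGNMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\s*=")


class SystemCronEntry(NamedTuple):
    schedule: str  # the five time fields
    user: str      # the extra "run as user" field
    command: str


def parse_system_crontab_line(line: str) -> Optional[SystemCronEntry]:
    """Hypothetical parser for one /etc/crontab or /etc/cron.d line (sketch)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or _ASSIGNMENT.match(stripped):
        return None
    fields = stripped.split(None, 6)  # at most 7 pieces: 5 times, user, command
    if len(fields) < 7:
        raise ValueError(f"expected 5 time fields, a user and a command: {line!r}")
    return SystemCronEntry(" ".join(fields[:5]), fields[5], fields[6])


print(parse_system_crontab_line("02 4 * * * root run-parts /etc/cron.daily"))
```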
There you have it. While Linux does not ship with a mature and complete batch process management tool, the combination of at and cron still permits considerable flexibility and power.
Bear in mind that we have covered the Linux versions of these tools as shipped with most current distributions. While just about every UNIX system on the market has these tools, some things vary.
Expect at queues to be different. Not all crons support names or ranges. Most do not support lists of ranges or the increment feature. No other cron with which I am familiar supports setting environment variables in the crontab. I don't think any other at supports “teatime” as a time specification.
This boils down to a basic piece of advice. Always check the local documentation. If in doubt, experiment.
Michael Schwarz is a consultant with Interim Technology Consulting in Minneapolis, Minnesota. He has 15 years of experience writing UNIX software and heads up the open-source SASi project. He has been using Linux since he downloaded the TAMU release in 1994, and keeps the SASi project at http://alienmystery.planetmercury.net/.