How To Read 950 E-mail Messages Before Lunch

A discussion of the use of e-mail filters on Unix computers that use Sendmail-like mail systems.
Filter Installation

If you are not so lucky, you may need to install an e-mail filter. These filter programs have a varying degree of ease for installation. They all can be installed without root privileges, but save yourself some trouble, and ask your system administrator to install it for general use. If you are handy with a compiler, offer to help out. If you have to do it yourself, you'll need to get the sources, compile them, and create any needed configuration files. See the sidebar “Where to get a mail filter” for more information on getting and installing the common e-mail filters.

Spend some time figuring out your needs for mail file locking. Locking is used to guarantee that sendmail does not try to write to a file at the same time you do. The documentation for each of these packages mentions this subject, so pay attention. I found out the hard way that my mail reader was using one type of lock while my filter was using another (a moment of silence for all my lost mail, please).

After you either install a mail filter (or locate it, if it's already installed) you will need to start pushing your mail through it. As in the example above, most e-mail filters can be invoked using the .forward file. If your system does not support this feature, you can still use e-mail filters, but you may need to periodically invoke the filter using a script. You can run your script every time you login, or once each time you read mail, or use cron to run it on a regular basis.

How a Filter Works

Now that we have a way to get e-mail pumped into our filter, let's talk about what happens next. What does the filter do with your mail? To start with, “E-mail filters” might be more appropriately labeled “e-mail sorters”. We all have “filters” in our kitchen: flour sifters are used to filter based on size, strainers let liquids through but not the pasta, and coffee filters let the good stuff through leaving behind the brown sludge.

Most people are not satisfied with filtering e-mail based on size, color, or fluid state. They would prefer to sort their mail based on: Where it's from, Who it's from, and What it's about. In e-mail terms, this might be done by looking at the From: or Subject: lines. Instead of “filter” we might do better to talk about an “agent”, “secretary”, “processor”, or even “e-mail dog”. We can give instructions to “an agent”, but not a coffee pot.

With that in mind, we can create a set of rules for our “filter” to follow. If an e-mail message matches on a rule, an action is taken. Procmail and Slocal can have only one action per rule, although there are a few ways to get around that limit. Unless otherwise directed, filtering stops after a match, and the action completes, making the order of rules important. Keep the most specific rules at the top (first), and the default case (what happens if no rules match) at the bottom. Let's look at a few examples:

Using the procmail program, we can create a file in our home directory called .procmailrc. Procmail calls its filtering directions “recipes”. Simple recipes look like this:

* ^From.*
* ^From.*cory
* ^Subject.*Elvis

The first recipe tells procmail to look for mail that contains a line starting with the the word From, and contains the string If procmail finds a match for this description, it stores the e-mail message in the file forkmail. The second recipe matches on two criteria; Email from cory, with a subject that includes the word Elvis, is deleted. The matches are based on regular expression syntax of egrep.

The equivalent using Elm's filter ($HOME/.elm/filter-rules) would look like:

if (from contains "")  ?
        save "~/mail/forkmail"
if (from contains "cory" and subject = "Elvis") then

Elm's filter calls these stanzas “rules”. The matches are “egrep like”, but not as fully-featured as the matching you can get using procmail.

In slocal's .maildelivery file our rules would look like:

# header pattern action result string
# lines starting with a " are ignored,
# as are blank lines
>From ^ "/pkgs/mh/lib/rcvstore +inbox"
>From cory         file R /dev/null
Subject Elvis     destroy N -
General Strategies for Filters

When constructing filter rules, err on the side of caution. Try constructing a few simple rules first. Test the rules by sending mail to yourself, and further test by leaving the filter in place for a few days. If it all works, you can start to add more complex rules. Make sure you have the “default” behavior of your filter set. Learn how to turn on debugging, and then to turn it off. Try out the logging functions. Using procmail's “mailstat”, you can get a summary of what actions procmail has taken. By investing a bit of time in figuring out how to make your filter work, you'll reap manifold time savings later.

While you are learning how to use your filter, you may want to keep a backup mail file. Instead of using one line in your .forward file which invokes your filter, you might want two lines:

"| /path/to/filter"

where username is your login. This way, your mail will all be delivered to your standard system mailbox as well as through your mail filter. If you have misconfigured your mail filter, you will have a backup to retrieve your mail from. There are several other ways to do this, as well.

I find that people tend towards two schemes when building mail filters. The first technique is to sort out mail based primarily on who it is from. The second is to filter based on content or function. Mail from my manager could go in a folder called boss, but I get a lot of mail from him that's not particularly important, not because it's from him, but because of what it's about. For instance, mail from my manager with a subject of “Hardware budget for next year”, would better go into my “budgets” folder than my “boss” folder.

If you are not particularly worried about disk space, why not try both? Filter all the e-mail based on who it's from and on other factors like key words in the Subject line or other headers. In my case, my filter could save a copy of all mail I get from my boss in the folder called “boss”, and store all e-mail with a subject line that contains the word “budget” in the folder called “budgets”.

A side effect of using filters is the excellent tracking you'll get of your e-mail. By using the logging and reporting that comes with Mailagent, Elm-filter, and Procmail, you will learn a lot about your e-mail. How much mail do you get from the “Elvis-lifestyles” mailing list? What percent of the mail is it? How does it compare to the amount of mail you get from your brother? I use the mailstat program to gauge how much time I spend supporting different departments' computers.