Busting Spam with Bogofilter, Procmail and Mutt
Editor's Note: Please see Nick's March 8, 2004, update article for a new configuration that deals with bogofilter's reversed command-line switches for marking spam.
Eric S. Raymond's bogofilter is a fast Bayesian spam filter that implements the algorithm described in Paul Graham's A Plan For Spam. To make it easy for all mutt users on my server to use it, I put the following macros into the system-wide mutt configuration file, /etc/Muttrc:
s (save) is bound to run bogofilter -N before saving
r, g, and l (individual reply, group reply, and list reply) are bound to run bogofilter -n before replying
X is bound to run bogofilter -S before deleting
macro index s "<enter-command>unset wait_key\n<pipe-entry>bogofilter -N\n<enter-command>set wait_key\n<save-entry>"
macro pager s "<enter-command>unset wait_key\n<pipe-entry>bogofilter -N\n<enter-command>set wait_key\n<save-entry>"
macro index r "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<reply>"
macro pager r "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<reply>"
macro index g "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<group-reply>"
macro pager g "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<group-reply>"
macro index l "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<list-reply>"
macro pager l "<enter-command>unset wait_key\n<pipe-entry>bogofilter -n\n<enter-command>set wait_key\n<list-reply>"
macro index X "<enter-command>unset wait_key\n<pipe-entry>bogofilter -S\n<enter-command>set wait_key\n<delete-message>"
macro pager X "<enter-command>unset wait_key\n<pipe-entry>bogofilter -S\n<enter-command>set wait_key\n<delete-message>"
You also can place these macros in your personal .muttrc file. The logic for this setup goes like this: if you're saving a message, that means it's worthwhile to you. Thus, we run bogofilter -N, which adds the words in the message to the good list and subtracts them from the bad.
If you're replying to a message in any way, it is also not spam. You obviously wouldn't be replying to spam, because that only begets more spam! So we simply add it to the good list.
Then comes the new key, X. Note that this is shift-X, and not lowercase x. It is a special “delete as spam” key. I use bogofilter -S, which adds words to the spam list and subtracts them from the good list, because the assumption is you're marking spams that bogofilter missed.
Here's how I use these keys. First of all, I put the following three stanzas into my .procmailrc file, to run bogofilter on all incoming mail:
:0fw
| bogofilter -u -e -p
:0e
{ EXITCODE=75 HOST }
# file the mail to inboxes/zztrash if it's spam.
:0:
* ^X-Bogosity: Yes, tests=bogofilter
inboxes/zztrash
This means that all mail gets filtered through bogofilter, and it reinforces itself. All spams get added to the spam list, and all good messages get added to the good list, so if spam evolves this will catch it as time goes on.
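For a sense of why moving words between the two lists changes later verdicts, here is a minimal Python sketch of the per-word estimate and the combining step from Paul Graham's A Plan for Spam. It is an illustration only and not bogofilter's actual code. The function names are made up for this example, the word counts are hypothetical, and the constants (doubling good counts, skipping words seen fewer than five times, clamping to 0.01 and 0.99) are the ones given in Graham's essay. Graham combines only the most interesting words in a message; this sketch simply combines whatever list it is given.

```python
def word_probability(good_count, bad_count, ngood, nbad):
    """Estimate how strongly one word suggests spam (Graham's formula).

    ngood and nbad are the numbers of good and spam messages seen so far;
    both are assumed to be greater than zero.
    """
    g = 2 * good_count          # good counts are doubled to bias against false positives
    b = bad_count
    if g + b < 5:
        return None             # too rare to say anything about
    good_ratio = min(1.0, g / ngood)
    bad_ratio = min(1.0, b / nbad)
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))


def combined_probability(probs):
    """Combine per-word estimates into one score for the whole message."""
    spammy = 1.0
    hammy = 1.0
    for p in probs:
        spammy *= p
        hammy *= 1.0 - p
    return spammy / (spammy + hammy)


# Hypothetical counts for one word, with 100 messages in each list.
before = word_probability(good_count=5, bad_count=10, ngood=100, nbad=100)
# The same word after one correction that subtracts it from the good list
# and adds it to the spam list (message totals left unchanged for simplicity).
after = word_probability(good_count=4, bad_count=11, ngood=100, nbad=100)
print(before, after)
```

With these made-up counts the word starts out neutral and leans toward spam after a single correction, which is the effect the feedback loop relies on.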
Now I have put all caught spams into inboxes/zztrash, which is the last mailbox I read. I read my normal inboxes, deleting uninteresting but legitimate mail with the regular d key but zapping spam with X. Remember, if something is in a normal mailbox, bogofilter must have marked it as good, hence the -S to subtract from the good list and add to the spam list.
Every mail I reply to receives extra reinforcement on the good list. It was added once because it wasn't caught as spam, but it'll get added again because it caught my attention enough to warrant a response.
Once I hit the zztrash folder, I check for any mail misclassified as spam. I simply save them to the folders where they were supposed to go! This runs them through bogofilter -N, which removes them from the spam list and places them on the good list.
I have found that after only a couple of days of mail, the system really seems to be catching on to patterns in spam. I find myself correcting it less and less, as the self-reinforcement keeps making it more accurate.
The setup comes with the caveat that the registration performed by the macros is done in addition to whatever bogofilter did when invoked from .procmailrc. For example, saving recognized non-spam means that three things have happened:
All words in the mail were added to the non-spam list when it was processed.
These words are then deleted from the spam word list, even though the mail was never added there.
The mail is again added to the non-spam list.
This is actually a desired, or at least acceptable, result in my eyes. If I save a mail, it is something that is really worth my while. The belt-and-suspenders approach to marking it as non-spam, then, is fine with me.
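The accounting described above can be traced with a toy Python model. This is only a sketch under stated assumptions: the two Counter objects stand in for bogofilter's word lists, the helper names are hypothetical, every word occurrence counts once, and counts are assumed never to drop below zero. How bogofilter itself stores and adjusts its counts may differ.

```python
from collections import Counter

good = Counter()  # stand-in for the non-spam word list
spam = Counter()  # stand-in for the spam word list


def register(wordlist, words):
    """Add every word of a message to the given list."""
    for word in words:
        wordlist[word] += 1


def unregister(wordlist, words):
    """Remove every word of a message from the given list.

    Assumption: a count that is already zero stays at zero.
    """
    for word in words:
        wordlist[word] = max(0, wordlist[word] - 1)


words = "lunch meeting moved to noon".split()

register(good, words)    # 1. procmail's bogofilter -u files the mail as non-spam
unregister(spam, words)  # 2. the s macro's -N deletes the words from the spam list
register(good, words)    # 3. -N then adds the words to the non-spam list again

print(good["lunch"], spam["lunch"])  # prints "2 0"
```

In this model a saved, correctly classified message ends up counted twice on the good list, while the spam list is left untouched because there was nothing to subtract.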
Of course, you can always change .procmailrc to run bogofilter without -u to remove the feedback loop effects. That makes the mutt keybindings the only source of registration. In that case, the -N and -S switches should be changed to -n and -s, respectively, because there is no earlier registration to undo.
See the bogofilter man page for a complete list of bogofilter options. I encourage you all to play with bogofilter!
email: nick@zork.net
Comments
Nuance -- breaks Mutt tag-prefix behavior
One drawback of aliasing "s" is that it breaks tag-saving, which I use a lot. Apparently tag-prefix works with commands but not macros (according to Sven Guckes on the Mutt list). Gotta pick another letter.
Hope that helps.
Peter
Correct BOGOFILTER website
The best website for bogofilter is the sourceforge page, currently at version 0.8.0. It is the version discussed in the article.
http://sourceforge.net/projects/bogofilter/
Re: Busting Spam with Bogofilter, Procmail and Mutt
A few things:
First off, I am not sure what version of bogofilter that Nick uses - but the
-u and -e are no longer supported in bogofilter 0.7 which is linked to from this article.
Further, this 0.7 version has changed the -n, -N (non-spam) to now be -h, -H (for ham, cute non-spam name).
Also, the capital letter versions are dangerous (-S , -H) as I believe a bug can cause incorrect filtering if used on a message that has not been previously bogofiltered and stored in the database keys. Just avoid them (aka in the macros) until you experiment with what I mentioned - or you know the message has been previously added.
Besides this bug, and the mentioned corrections - it appears to be working great so far!
Re: Busting Spam with Bogofilter, Procmail and Mutt
...
Further, this 0.7 version has changed the -n, -N (non-spam) to now be -h, -H (for ham, cute non-spam name).
...
???
By looking at the latest CVS version of bogoconfig.c you can clearly see that
-n = register-ham
-N = unregister-nonspam
-h = help
-H = no-header-tags
I am not sure what version of bogofilter *you* were using...
Re: Busting Spam with Bogofilter, Procmail and Mutt
OK, there seems to be a newer version 0.8. Just don't use the article's links - use the sourceforge version.
http://sourceforge.net/projects/bogofilter/
sorry about any confusion.
Re: Busting Spam with Bogofilter, Procmail and Mutt
I don't really want to run an MTA, so how would I implement bogofilter with KDE's Kmail (preferred) or some other mail client?
Re: Busting Spam with Bogofilter, Procmail and Mutt
see Andre's "Mini how-to Kmail and Bogofilter":
www.andrefelipemachado.hpg.ig.com.br/linux/mini-how-to-Kmail_and_Bogofilter.html
Roger
Re: Busting Spam with Bogofilter, Procmail and Mutt
None of this depends on an MTA. You can configure fetchmail to hand the mail over to procmail instead of connecting to port 25.