An Introduction to the Spambayes Project
Three classifier programs are in the Spambayes software: a procmail filter, a POP3 proxy and a plugin for Microsoft Outlook 2000. I cover the procmail filter and the POP3 proxy in this article. A web interface (covered below) and various command-line utilities, test harnesses and so on are also part of Spambayes; see the documentation that comes with the software for full details.
If you use a procmail-based e-mail system, this is how the Spambayes procmail system works:
All your existing mail has a new X-Spambayes-Trained header. The software uses this to keep track of which messages it has already learned about.
The software looks at all your incoming mail. Messages it thinks are spam are put in a “spam” mail folder. Everything else is delivered normally.
Every morning, it goes through your mail folders and trains itself on any new messages. It also picks up mail that's been refiled—something it thought was ham but was actually spam and vice versa. Be sure to keep spam in your spam folder for at least a day or two before deleting it. We suggest keeping a few hundred messages, in case you need to retrain the software.
You'll need a working crond to set up the daily training job. Optionally, you can have a mailbox of spam and a mailbox of ham to do some initial training.
To set up Spambayes on your procmail system, begin by installing the software. I'll assume you've put it in $HOME/src/spambayes. Then, create a new database:
$HOME/src/spambayes/hammiefilter.py -n
If you exercise the option to train Spambayes on your existing mail, type:
$HOME/src/spambayes/mboxtrain.py \ -d $HOME/.hammiedb -g $HOME/Mail/inbox \ -s $HOME/Mail/spamYou can add additional folder names if you like, using -g for good mail folders and -s for spam folders. Next, you need to add the following two recipes to the top of your .procmailrc file:
:0fw | $HOME/src/spambayes/hammiefilter.py :0 * ^X-Spambayes-Classification: spam $HOME/Maildir/.spam/The previous recipe is for the Maildir message format. If you need mbox (the default on many systems) or MH, the second recipe should look something like this:
:0: * ^X-Spambayes-Classification: spam $HOME/Mail/spamIf you're not sure what format you should use, ask your system administrator. If you are the system administrator, check the documentation of your mail program. Most modern mail programs can handle both Maildir and mbox.
Using crontab -e, add the following cron job to train Spambayes on new or refiled messages every morning at 2:21 AM:
21 2 * * * $HOME/src/spambayes/mboxtrain.py -d $HOME/.hammiedb -g $HOME/Mail/inbox -s $HOME/Mail/spam
You also can add additional folder names here. It's important to do this if you regularly file mail in different folders; otherwise Spambayes never learns anything about those messages.
Spambayes should now be filtering all your mail and training itself on your mailboxes. But occasionally a message is misfiled. Simply move that message to the correct folder, and Spambayes learns from its mistake the next morning.
Many thanks to Neale Pickett for the information in this section.
If you don't use Procmail or don't want to mess with it, or if you want to set up the software on a non-UNIX machine, you can use the POP3 proxy. This is a middleman that sits between your POP3 server and your e-mail program, and it adds an X-Spambayes-Classification header to e-mails as you retrieve them. You also can use the POP3 proxy with Fetchmail; simply reconfigure Fetchmail to talk to the POP proxy rather than your real POP3 server.
The web interface lets you pretrain the system, classify messages and train on messages received via the POP3 proxy, all through your web browser. The software is configured through a file called bayescustomize.ini. This is true of the Procmail filter as well. There's no need to change any of the defaults to use it out-of-the-box, but the POP3 proxy needs to be set up with the details of your POP3 server. All the available options and their defaults live in a file called Options.py, but you need to look at that only if you're terminally curious or want to do advanced tuning. The minimum you need to do is create a bayescustomize.ini file like this:
[pop3proxy]
pop3proxy_servers: pop3.example.com
where pop3.example.com is wherever you currently have your e-mail client configured to collect mail. The proxy runs on port 110 by default. This is fine on non-UNIX platforms, but on UNIX you'll want to use a different one by adding this line:
pop3proxy_ports: 1110to the [pop3proxy] section of bayescustomize.ini. If you collect mail from more than one POP3 server, you can provide a list of comma-separated addresses in pop3proxy_servers and a corresponding list of comma-separated port numbers in pop3proxy_ports. Each port proxies to the corresponding POP3 server.
You can now run pop3proxy.py. This prints some status messages, which should include something like:
Listener on port 1110 is
proxying pop3.example.com:110
User interface url is http://localhost:8880
This means the proxy is ready for your e-mail client to connect to it on port 1110, and the web interface is ready for you to point your browser at the given URL. To access the web interface from a different machine, replace localhost with the name of the machine running pop3proxy.py.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- RSS Feeds
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- New Products
- Paranoid Penguin - Building a Secure Squid Web Proxy, Part IV
- Trying to Tame the Tablet
- Developer Poll
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.




1 hour 59 min ago
6 hours 12 min ago
8 hours 45 min ago
13 hours 24 min ago
15 hours 46 min ago
1 day 8 hours ago
1 day 11 hours ago
1 day 12 hours ago
1 day 12 hours ago
1 day 13 hours ago