Beating Spam and Viruses with amavisd-new and Maia Mailguard
With spam and e-mail worms on the rise, it's boom-time for the makers of antispam and antivirus solutions. New anti-spam laws in Europe and the US have done little to solve the problem, and this situation has sent many people shopping for technological solutions: spam and virus filters.
Scanning and filtering content at every desktop is expensive and impractical, however. Ideally, the spam and virus problem should be tackled as close to the source as possible, to shield everyone downstream. This strategy lets an organization focus its resources on one place, typically the mail gateway.
Server-based solutions rarely come cheap, however. Most of these products are licensed on a per-mailbox basis, whether as add-on software for mail servers or as standalone content-filtering appliances. These solutions can cost thousands of dollars and often require annual subscription fees for access to updated virus signatures and spam patterns.
In this article, we take a look at an open-source content-filtering solution, amavisd-new, and a powerful extension of this project called Maia Mailguard.
Conceptually, amavisd-new is a mail filter—it receives mail from your mail gateway, scans the mail for viruses and spam, quarantines, rejects or discards offending items, and relays the rest to another mail server downstream for delivery. In practice, amavisd-new often is sandwiched between two mail servers running on the same host, particularly at smaller sites where hosting the mail server and content filters on a single machine is practical. Larger sites may choose to install amavisd-new, SpamAssassin and virus scanners together on a separate content-filtering machine. Massive sites may want a load-balanced array of such machines.
amavisd-new was written in Perl, with security and reliability in mind, and works well on virtually all UNIX platforms. It is an RFC-compliant mail handler, designed never to lose any mail. To that end, amavisd-new does not accept responsibility for a mail item until the downstream mail server has done so. This means any errors that occur while filtering the mail do not cause the mail to be lost; it remains in the upstream mail server's queue. amavisd-new offers four types of filtering: virus/malware scanning, spam filtering, banning dangerous attachment types and invalid mail headers.
amavisd-new is not a virus scanner; rather it's a framework that calls one or more virus scanners. More than 30 popular virus scanners currently are supported, including proprietary products from such vendors as Sophos, Symantec and Network Associates, as well as the open-source Clam Antivirus.
Both command-line and dæmonized virus scanners are supported, though dæmonized scanners are much more efficient than their command-line cousins. If your mail server processes a lot of mail, you don't want to have to load a command-line scanner into memory for each mail item and unload it afterward. A virus scanner that runs as a dæmon gets loaded once and then stays in memory, making the process much faster.
If you have multiple virus scanners installed, you can arrange them in primary and secondary groups. The secondary group is consulted if none of the primary scanners is operational.
Spam filtering is handled by amavisd-new by integrating it with SpamAssassin. amavisd-new calls SpamAssassin once per mail item, no matter how many recipients there are, so mailing-list postings don't consume any more resources than does mail addressed to a single recipient.
SpamAssassin provides a broad-spectrum approach to spam filtering, including feature recognition, DNSBL and SPF lookups, collaborative reporting networks and Bayesian learning mechanisms. All of these tests contribute a numeric score to a total for each mail item, and each user can specify a threshold score for deciding whether an item is spam or ham. This is an effective combination, as the strengths of one method make up for the weaknesses of another.
Feature recognizers check the headers or the body of the e-mail looking for patterns that human beings have identified as markers of spam or ham (non-spam mail). The fact that the Date: header contains a time 12 hours in the future or that the mail contains an image but no text might qualify as spam symptoms, whereas a message containing more than a thousand words is more likely to be ham.
SpamAssassin also can check the IP address of the connecting mail server or client against a number of DNS-based block lists (DNSBLs) to determine whether that address is a known spam source. Unlike the traditional use of DNSBLs, however, SpamAssassin does not consider a listing to be damning by itself; it simply adds a value to the mail's total score. This is a much more flexible approach, one that lets you adjust the scores assigned to each DNSBL according to how much you trust that list and the policies of its maintainers. The upcoming SpamAssassin 3.0 also adds support for Sender Policy Framework (SPF) lookups, which try to verify that the connecting host has the authority to send mail for its domain.
Collaborative reporting networks, such as Vipul's Razor, Pyzor and the Distributed Checksum Clearinghouse (DCC) offer another kind of resource for SpamAssassin to consult. The idea is that because spam is broadcast to millions of recipients, by the time you receive your copy, a lot of other people have received more or less identical copies. If a lot of those people already have reported that particular mail as spam, your own spam filter should be able to use that fact in its own decision-making process.
Last, but certainly not least, SpamAssassin offers a Bayesian learning mechanism, which essentially is an automated feature recognizer. Although the manually designed feature recognizers listed above rely on human beings to point out features that indicate spam or ham, the Bayesian approach tries to pick out these features automatically, based on an analysis of the spam and ham you've received already.
|Updates from LinuxCon and ContainerCon, Toronto, August 2016||Aug 23, 2016|
|NVMe over Fabrics Support Coming to the Linux 4.8 Kernel||Aug 22, 2016|
|What I Wish I’d Known When I Was an Embedded Linux Newbie||Aug 18, 2016|
|Pandas||Aug 17, 2016|
|Juniper Systems' Geode||Aug 16, 2016|
|Analyzing Data||Aug 15, 2016|
- Updates from LinuxCon and ContainerCon, Toronto, August 2016
- NVMe over Fabrics Support Coming to the Linux 4.8 Kernel
- What I Wish I’d Known When I Was an Embedded Linux Newbie
- New Version of GParted
- All about printf
- Analyzing Data
- Tor 0.2.8.6 Is Released
- Blender for Visual Effects
- Juniper Systems' Geode
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide