HEC Montréal: Follow-up on the Large-Scale Mail Installation
To produce useful statistics, two tools were used: Spamity and pflogsumm. The former is a complete solution for extracting information from the log files of a mail infrastructure based on Postfix and AMaViS. Spamity extracts all the relevant information and stores it in a database. A Web front end lets users log in to see the mail rejected by the filtering policies. This makes Spamity a valuable tool for examining spam and virus trends, so the infrastructure can be tuned over time to limit the delivery of unsolicited bulk e-mail (UBE) and viruses. Spamity gathers the information related to rejected messages and classifies it according to the following policies:
RBL: Message rejected by a real-time blackhole list.
RHSBL Client: Message rejected by a right-hand side block list.
Header Date: Message has a date from the distant past or future.
Header Subject: Message rejected because of a suspicious subject.
Header X-Mailer: Message rejected because of a suspicious mail user agent.
Header Content-Disposition: Message rejected because of a suspicious attachment.
Header Content-Type: Message rejected because of a suspicious attached file. The filter method specifies the file extension.
Body: Message rejected because of suspicious body content.
Access Username: Message rejected by access username.
Virus: Message rejected by AMaViS together with the anti-virus solution used.
Spam: Message rejected by AMaViS together with SpamAssassin.
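As a rough illustration of this kind of classification, the sketch below tallies Postfix reject lines by policy. It is not Spamity's implementation: the patterns, the sample log lines and the names `POLICY_PATTERNS`, `classify` and `tally` are assumptions made for the example. Real log lines vary with the Postfix version and configuration, and this simplified pattern does not distinguish RBL from RHSBL rejections.

```python
import re
from collections import Counter

# Hypothetical, simplified patterns (assumption): they are modelled on typical
# Postfix reject log lines, not taken from Spamity itself.
POLICY_PATTERNS = [
    ("RBL or RHSBL", re.compile(r"reject: RCPT from .* blocked using ")),
    ("Header Date", re.compile(r"reject: header Date:")),
    ("Header Subject", re.compile(r"reject: header Subject:")),
    ("Header X-Mailer", re.compile(r"reject: header X-Mailer:")),
    ("Header Content-Disposition", re.compile(r"reject: header Content-Disposition:")),
    ("Header Content-Type", re.compile(r"reject: header Content-Type:")),
    ("Body", re.compile(r"reject: body ")),
]


def classify(line):
    """Return the first policy whose pattern matches the line, or None."""
    for policy, pattern in POLICY_PATTERNS:
        if pattern.search(line):
            return policy
    return None


def tally(lines):
    """Count rejected messages per policy, ignoring unrelated log lines."""
    counts = Counter()
    for line in lines:
        policy = classify(line)
        if policy is not None:
            counts[policy] += 1
    return counts


# Illustrative log lines (made up for this example).
SAMPLE_LOG = [
    "Mar 21 10:02:11 mx1 postfix/smtpd[2101]: NOQUEUE: reject: RCPT from "
    "unknown[192.0.2.10]: 554 Service unavailable; Client host [192.0.2.10] "
    "blocked using rbl.example.org; from=<a@example.net> to=<b@example.com>",
    "Mar 21 10:05:40 mx1 postfix/cleanup[2110]: 4F2A1B0C: reject: header "
    "Subject: Cheap meds from unknown[192.0.2.11]; from=<c@example.net> "
    "to=<b@example.com>: Suspicious subject",
    "Mar 21 11:15:02 mx1 postfix/cleanup[2150]: 7C9D3E1F: reject: header "
    "Content-Disposition: attachment; filename=\"doc.pif\" from "
    "unknown[192.0.2.12]; from=<d@example.net> to=<b@example.com>",
    "Mar 21 11:20:00 mx1 postfix/qmgr[900]: 1A2B3C4D: removed",
]

if __name__ == "__main__":
    for policy, count in tally(SAMPLE_LOG).most_common():
        print(f"{policy}: {count}")
```

A tool like Spamity would also store each match in a database so users can browse their own rejected mail. The sketch only keeps counters.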
pflogsumm, on the other hand, is a useful tool for providing a quick overview of Postfix activity. It allows an administrator to identify potential problems in a Postfix installation rapidly. Among the information reported by pflogsumm, we have:
Total number of received, delivered, forwarded, deferred, bounced and rejected messages
Per-day and per-hour message traffic and connection summaries
Various other summaries (warnings, fatal errors, panics) and more.
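To give an idea of what such a summary involves, here is a minimal sketch that counts delivery outcomes in total and per hour from syslog-style Postfix lines. It is not pflogsumm's code: the regular expressions, the sample lines and the `summarize` function are assumptions for illustration, and it covers only a small part of what pflogsumm reports.

```python
import re
from collections import Counter, defaultdict

# Syslog-style timestamp prefix, e.g. "Mar 21 10:02:11 " (assumed log format).
TIMESTAMP_RE = re.compile(r"^\w{3}\s+\d{1,2} (?P<hour>\d{2}):\d{2}:\d{2} ")
# Delivery outcome recorded by Postfix delivery agents.
STATUS_RE = re.compile(r"status=(sent|deferred|bounced)")


def summarize(lines):
    """Return (totals, per_hour) counters for sent, deferred, bounced, rejected."""
    totals = Counter()
    per_hour = defaultdict(Counter)
    for line in lines:
        stamp = TIMESTAMP_RE.match(line)
        if stamp is None:
            continue  # not a syslog-style line
        status = STATUS_RE.search(line)
        if status:
            event = status.group(1)
        elif ": reject: " in line:
            event = "rejected"
        else:
            continue  # line does not describe a delivery outcome
        totals[event] += 1
        per_hour[int(stamp.group("hour"))][event] += 1
    return totals, per_hour


# Illustrative log lines (made up for this example).
SAMPLE_LOG = [
    "Mar 21 10:02:11 mx1 postfix/smtpd[2101]: NOQUEUE: reject: RCPT from "
    "unknown[192.0.2.10]: 554 Service unavailable; blocked using rbl.example.org",
    "Mar 21 10:14:37 mx1 postfix/lmtp[2120]: 5D6E7F80: to=<b@example.com>, "
    "relay=mailstore.example.com[192.0.2.20], delay=1, status=sent (250 2.1.5 Ok)",
    "Mar 21 11:03:09 mx1 postfix/smtp[2130]: 6E7F8091: to=<d@example.org>, "
    "relay=none, delay=30, status=deferred (Connection timed out)",
    "Mar 21 11:40:55 mx1 postfix/smtp[2140]: 7F8091A2: to=<e@example.org>, "
    "relay=mx.example.org[192.0.2.31], delay=2, status=bounced (550 User unknown)",
]

if __name__ == "__main__":
    totals, per_hour = summarize(SAMPLE_LOG)
    print("Totals:", dict(totals))
    for hour in sorted(per_hour):
        print(f"{hour:02d}:00", dict(per_hour[hour]))
```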
Using those two tools and some custom Perl scripts, we produced the different figures found in this article.
Figure 2 shows the weekly total of messages considered to be UBE or to contain viruses that have been blocked since the beginning of 2004. The efficiency of each rule also is shown in this figure.
As shown in Figure 2, the RBL policy is definitely the most effective one, followed by content analysis using SpamAssassin and message Subject header analysis. You also can note that the virus policy numbers are not as high as expected. This is easily understandable, as the detection of viruses often is moved from AMaViS to Postfix's header checks (Content-Disposition, for example). This requires considerably fewer system resources, because for each received message we avoid both a detailed analysis in SpamAssassin and a process fork for virus scanning with NAI VirusScan. The network analysts made such modifications after the week of 01-25 to deal with the MyDoom e-mail worm.
Furthermore, Figure 3 shows the usage of services offered by the mailstore, during the busiest week of the first three months (March 21-27).
As shown in Figure 3, POP3 is the most solicited service, followed by IMAP and the Web mail system, which also uses IMAP but was separated in the figure. During this week, peaks of 52 POP3 and 338 IMAP concurrent connections were observed, coming from a total of 11,000 different users. The mailstore also is responsible for delivering messages to the users' mailboxes using the Local Mail Transfer Protocol (LMTP). Peaks of 75 concurrent delivery processes often were seen.
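The article does not say how these concurrency peaks were measured. One common approach, sketched here as an assumption, is to sweep over connect and disconnect events in time order. The `peak_concurrency` function and the session data are hypothetical and assume each session is available as a (connect, disconnect) pair of timestamps in seconds.

```python
def peak_concurrency(sessions):
    """Return the maximum number of sessions open at the same time.

    `sessions` is an iterable of (connect_time, disconnect_time) pairs.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a connection opens
        events.append((end, -1))    # a connection closes
    # Tuples sort by time first; at equal times the -1 (close) sorts before
    # the +1 (open), so back-to-back sessions are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Made-up sessions: two long-lived connections and two short polls.
SAMPLE_SESSIONS = [(0, 600), (60, 65), (120, 125), (300, 900)]

if __name__ == "__main__":
    print(peak_concurrency(SAMPLE_SESSIONS))
```

Counting this way separates the number of connections logged from the number open at once, which can differ widely between short POP3 polls and long-lived IMAP sessions.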
Finally, Figure 4 shows the amount of mail exchanged by the four SMTP servers for the entire month of March 2004.
As shown in Figure 4, 40 to 60% (55,000 messages per day, on average) of all received mail was rejected by various UBE and virus filtering techniques. This number actually is down from 80% in December 2003. At that time, HEC Montréal was receiving more than 125,000 spam messages per day. Currently, the average number of messages sent per day is 57,000, while the average number of messages received (from external servers) and delivered per day is 35,000.
As you have seen from the different figures, the mail infrastructure certainly is a key component at HEC Montréal, as it is highly solicited. Overall, the mail infrastructure has been very fast and stable since it was deployed. Minor updates were performed by network analysts, mainly to keep up with the new e-mail worms.
Comments
nice follow up by the way.i
nice follow up by the way.i think that is good help for single mothers
HEC Montréal: Follow-up on the Large-Scale Mail Installation
Amazing stuff and I very much like the Table 2. Cost Worksheet.
Very detailed and neatly written. Thanks for sharing such a great article.
User Friendly
Nowadays e-mail is so user friendly and easy to set up. You used to need an I.T. guy anytime you needed to integrate e-mail or set up a brand-new e-mail system. Now pretty much everything is integrated, making most Web sites or HTML e-mail accessible. WordPress, for example, comes with automatic e-mail installation. Just goes to show how far we've come with technology.
IMAP usage
If their services are anything like ours, their graph of service utilization doesn't clearly represent the real use of their IMAP services. A POP3 user will typically poll the server regularly, causing a connection to be logged each and every time, whereas it's not uncommon for IMAP users to remain connected throughout the day, showing only one connection initiated.
For IMAP utilization, it is usually more interesting to look at the number of concurrent POP and IMAP connections, as this often more closely reflects the real load on the server.
Of course, I could also be entirely wrong about how they're graphing their results.
Re: IMAP usage
I haven't looked closely at their stats. But they are using Squirrelmail to provide their webmail services. We do as well and have noticed that it logs in a lot to do its job.
IMAPproxy
They're running IMAPproxy (which makes the webmail connections persistent) to counter that.