HEC Montréal: Deployment of a Large-Scale Mail Installation
Over the past few years, e-mail has grown into one of the most important communication mediums. Naturally, e-mail infrastructures must be fast, secure and reliable. Ideally, they also should be able to integrate easily and effectively with anti-unsolicited bulk e-mail (UBE) solutions.
HEC Montréal is Canada's first management school, founded in 1907. More than 11,000 students and 220 professors use HEC's e-mail system every year, and alumni keep their e-mail accounts after graduation. Unfortunately, the proprietary e-mail system did not evolve and as the load started to increase, the infrastructure could no longer keep up with requirements.
The previous mail infrastructure at HEC Montréal was based on four IBM AIX servers running Netscape Messaging Server 4.15. Each of those servers offered all services (IMAP, POP3, SMTP and Webmail access) for a subset of users. The system simply did not scale to current mail requirements. According to Eddy Béliveau, Senior Network Analyst at HEC Montréal:
We found ourselves with mail server software that had not been upgraded in the last two years because the AIX platform was no longer supported by Sun/iPlanet/Netscape, which owned the mail server software. We had a regular increase of our e-mail traffic during the last 12 months due to the presence of UBE and viruses trying to replicate themselves. We got peaks of over 100 concurrent SMTP connections, which was too much for our servers; the typical load average was over 50 on all servers. We could not, on our old 133MHz servers, execute any anti-virus or anti-UBE applications, not even a simple RBL filtering policy. Thus, we had to re-examine the hardware and software architecture of our e-mail system but [could] not find time to install alternatives. We were like a dog running after his tail trying to stabilize the situation.
HEC Montréal contacted us at Inverse, Inc., to help them replace the mail infrastructure and deploy a better alternative.
The proposed solution was driven by the following factors:
Cost: HEC Montréal could not afford a per-user license fee for 35,500 users.
Ease of maintenance: the infrastructure had to be easy to manage. Accounts creation and destruction should be automated, updates should be easy to apply and the infrastructure should let HEC Montréal leverage the expertise they have.
Security: the components of the solution should have a proven security track record.
Robustness: the components should be mature and should have been used in production environments for months. Furthermore, the development should be active to accelerate bug fixes, feature enhancements and security updates.
Scalability: the solution must meet its purpose for many months, because the number of users grows by 2,000–3,000 every year. Its architecture also should allow adding extra servers to distribute the load or offer more redundancy.
When we were first approached, HEC Montréal was leaning toward a Linux-based solution running Novell NetMail 3.1. Having great experience with free alternatives, we decided to compare the solution we had in mind with Novell's offerings.
That said, we built two identical test environments using Red Hat Linux 9 and installed NetMail 3.1 on one and our proposed solution on the other. Next, we performed a series of stress tests in order to measure the stability and the performance of the two solutions. The tests were performed with two benchmarking utilities, postal and tm. The results showed that while NetMail was the fastest for POP3 operations, it proved to be the slowest in the IMAP and SMTP tests. It also had a lot of stability issues when overloading the server with IMAP requests.
Combined with our experience, we proposed a solution based almost entirely on open-source components. We started with a standard Red Hat Linux 9 distribution using Silicon Graphics, Inc.'s XFS kernel packages. We included Cyrus IMAP and Cyrus SASL, which included IMAP, LMTP and POP3 dæmons as well as authentication libraries and redirection/vacation scripts support using Sieve. Next, Postfix, AMaViS, SpamAssassin, Vipul's Razor and NAI VirusScan were added to build a complete SMTP server solution with enhanced tools to limit the delivery of UBE and viruses. Apache, PHP4, IMAP Proxy and SquirrelMail provided a complete Webmail solution. OpenLDAP was added to store all information regarding users' accounts (e-mail address and aliases, SquirrelMail preferences and so on), as well as other specific attributes of HEC Montréal. Finally, we installed Linux HA Heartbeat, software used to monitor the health of some nodes on the network.
The new infrastructure is running on 11 IBM eServer xSeries x305 and x335 servers. The two x335s are connected to an IBM FAST 700 Storage Array Network (SAN) using Fibre Channel, where the mailstore resides. The XFS filesystem is used for the mailstore in order to maximize file access operations. Figure 2 depicts the architecture.
Four STMP servers running Postfix are used: two of them are mail exchangers (MXes) for the HEC Montréal domains and the other two serve internal mailing needs. These servers also use AMaViS, SpamAssassin, Vipul's Razor and Network Associates' VirusScan to limit the delivery of UBE and viruses. Furthermore, two Cyrus IMAP servers are connected using serial and Ethernet cables for high availability. Only one Cyrus IMAP server is active at any moment; it serves all POP3 and IMAP connections, stores mails on the SAN (received using the LMTP protocol from the four Postfix servers) and processes Sieve scripts.
Two Webmail servers run Apache, PHP4, SquirrelMail and IMAP Proxy. The latter is used to cache IMAP connections between SquirrelMail and the Cyrus IMAP server in order to minimize the load (authentication and process forks) on the mailstore. Finally, one other server is used only for testing purposes. That is, any modifications to the infrastructure must go through this server, which is configured to run every component, before being applied to the environment in production.
With regard to the UBE filtering, we check mail at many levels to ensure we block as many as we can. Our checks include carefully chosen real-time blackhole lists (RBLs); header and MIME header checks using up-to-date maps from SecuritySage, Inc.; and content filtering initiated from AMaViS using SpamAssassin, Vipul's Razor for UBEs analysis and VirusScan for viruses.
This solution has proven to be greatly effective and produces few false positives. The system also was built with load balancing and failover in mind. The SMTP and the Webmail servers are used in a round-robin fashion, efficiently distributing the load among all of them.
The main Cyrus server has an identical backup server in case of failure. The latter is connected to the main Cyrus server and uses Heartbeat to monitor the availability of the server. In case of a failure (hardware problem, operating system crash and so on), the secondary Cyrus server takes over all services. Heartbeat automatically mounts the mailstore (located on the SAN), activates the network alias and starts all Cyrus services. This offers a warm switch-over that minimizes the outage time; sometimes it's not even noticeable.
Finally, the LDAP system offers a master node together with a slave that replicates the former using slurpd. All services are configured to failover automatically to the slave in case of a failure on the master node. Some services also are configured to use the slave as the master node in order to distribute the LDAP load among both servers; they failover to the master node.
- Readers' Choice Awards 2013
- Advanced Hard Drive Caching Techniques
- Linux Kernel News - November 2013
- Mars Needs Women
- Raspberry Pi: the Perfect Home Server
- Sublime Text: One Editor to Rule Them All?
- December 2013 Issue of Linux Journal: Readers' Choice
- Tech Tip: Really Simple HTTP Server with Python
- RSS Feeds
- Web Administration Scripts
- Evangelist/Advocate - 5th place - Dedoimedo
53 min 15 sec ago
- Reply to comment | Linux Journal
4 hours 4 min ago
- Reply to comment | Linux Journal
7 hours 25 min ago
- animal pajamas
11 hours 16 min ago
- thanks for you post.
11 hours 23 min ago
- thanks for share, great
1 day 4 hours ago
- There are factors which are
1 day 9 hours ago
- Gnome 3 ?
1 day 10 hours ago
- Reply to comment | Linux Journal
1 day 14 hours ago
- "Redis RethinkDB 4.5%" on Best NoSQL Databases
2 days 34 min ago