Mailman, the GNU Mailing List Manager
I've spent a lot of time improving the common path a mail message takes through the system. The biggest change has been to design a message pipeline, where each component in the pipeline does a little piece of the work necessary to deliver a message. For example, there are separate components to scan the message for potential spam, calculate the recipients of the message, archive it, gate it to Usenet, and deliver the message to an SMTP (simple mail transfer protocol) daemon.
Each component in the pipeline is really a Python module conforming to a specific API: the module must contain a function called “process” which takes a message object and a mailing list object. When a message is received by Mailman, it runs through a list of these modules, handing the message object off for each to process. If the module raises a Python exception, processing is stopped. This is used when messages must be held for the list administrator's approval (e.g., a posting to a moderated list).
This message pipeline means that Mailman is easily configurable and extensible in the way it handles incoming and outgoing messages. For example, there is a project contributor who has implemented a MIME attachment scanner module which can be dropped into the pipeline. This module can strip attachments from the message, post the attachments to an external archive (either the file system or a WebDAV server) and then rewrite the outgoing message to include a URL to the attachment instead of the attachment text. This module could also be used simply to discard messages with certain types of attachments (e.g., if you hate HTML mail as much as I do, you could just bounce or discard any message that contains a text/html MIME type), strip certain attachment types (e.g., binary attachments just get discarded) or scan attachments for potential viruses.
Currently, there is only one system-wide message pipeline for all Mailman lists at a site, but the plan is eventually to give individual list administrators the opportunity to configure their lists with optional modules. One application of this would be to run a “patches” mailing list which would have an optional module to scan a message for a context or unified diff, and if found, inject the diff into an issue-tracking system.
This streamlining of the message-delivery path has vastly improved the performance of Mailman. We're running the latest CVS snapshot on python.org and easily handling about 30,000 individual recipient deliveries per day, with an average of about 0.01 second per message through the system (from Mailman receipt to SMTP daemon hand-off). The lesson here is that for the best performance, you want to choose your MTA wisely, since it will have the biggest impact on throughput.
A similar pipeline architecture has been designed for bounce detection. Believe it or not, there's actually a standard for bounced messages, called Delivery Status Notification (DSN), described in RFC 1894. The problem is, of course, that it's complex, and many MTA authors disagree with or ignore this standard. This makes bounce detection (like spam detection) a black art. Mailman 1.1 comes with a hairy mess of regular expressions used to scan bounced messages, which get delivered to a different address than regular postings. If Mailman actually detects a bounce, and can extract the offending e-mail address from the bounced message, it increments a counter for that address. Enough bounces, and the address is automatically disabled or removed.
The problem was that updating the regular expressions was nearly impossible, so for Mailman 1.2 we now have a pipeline, similar in architecture to the delivery pipeline, that attempts to recognize just one style of bounce. We currently recognize RFC1894/DSN bounces, Postfix, Qmail, Yahoo! and a few other weirdos. Of course, we still recognize all the old bounce formats Mailman 1.1 recognized, and it's fairly easy to add new matchers—assuming the bounced message can actually be scanned intelligently. I recently added an Smail bounce detector in about five minutes and 20 lines of Python code.
Two other major improvements planned for the 1.2 release are internationalization and user databases.
We've had a large number of requests for making Mailman multi-lingual. Two contributors from Spain, Juan Carlos Rey Anaya and Victoriano Giralt, with help from Mads Kiilerich from Denmark, have sent me patches to accomplish this. The technical approach centers around gettext, where strings to be translated are marked in a special way. The developers then run a tool over the source tree and create template files which can be handed over to translators. Once their language-specific translation files are placed in the proper directory, the application can use these to look up the text string in the specified language.
For Mailman, a site administrator can install any language file they want to make available to their list administrators. It would be up to the list administrators to enable various languages for their lists and to choose a default language. When individual users are interacting with Mailman, they can choose their preferred language from those available to the list. In this way, mailing lists can support multiple languages through both their web and e-mail interfaces. Of course, messages posted to the list aren't translated (although a pipeline module could be implemented to feed the text through Babelfish if you were so inclined).
GNU gettext provides all the necessary tools to create multilingual C programs, but we had to adapt them a bit to work with Python. As with C, we mark Python strings to be translated with a wrapper function call. For example, if you wanted to make this line of code translatable,
subject = "You have been subscribed"
you would modify the line to look like this:
subject = _("You have been subscribed")Most of the work of making an application like Mailman multilingual involves marking translatable text.
Python has a further complication: there are actually eight ways to define a “string”:
'This is a Python string'
"This is a Python string"
'''This is a Python triple-quoted string'''
"""This is a Python triple-quoted string"""
r' This is a Python raw string'
r"This is a Python raw string"
r'''This is a Python triple quoted raw string'''
r"""This is a Python triple quoted raw string"""
Briefly, the '' style and "" style strings are interchangeable, and useful when you don't want to escape one delimiter or the other. The first two string styles are limited to a single line. Triple-quoted strings allow you to embed newlines in the string, serving roughly the same purpose in Python as Perl's HERE documents. Raw strings have different rules for embedded backslashes and are used primarily for regular expressions.
GNU gettext comes with a tool called xgettext which scans your C files for translatable strings. Unfortunately, it doesn't understand Python's various string spellings, and while a few different approaches have been put forward, I favor allowing _() marking of any valid Python string. To accomplish this, I wrote a tool called pygettext.py which scans Python source code, looking for _() wrappers around any type of Python string. The output of pygettext.py is a standard gettext .pot file, so from that point on, the GNU tools can be used. pygettext.py will be a standard part of Python 1.6 and is available via the Python CVS tree at http://cvs.python.org/.
I expect to begin integrating and testing the internationalization patches to Mailman sometime within the next few weeks. Keep an eye on the Mailman CVS tree for details.
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
|Introduction to MapReduce with Hadoop on Linux||Jun 05, 2013|
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Linux Systems Administrator
- Validate an E-Mail Address with PHP, the Right Way
- Introduction to MapReduce with Hadoop on Linux
- RSS Feeds
- Weechat, Irssi's Little Brother
- New Products
- Tech Tip: Really Simple HTTP Server with Python
- Reply to comment | Linux Journal
24 min 51 sec ago
- Didn't read
35 min 11 sec ago
- Reply to comment | Linux Journal
40 min 11 sec ago
- Poul-Henning Kamp: welcome to
2 hours 50 min ago
- This has already been done
2 hours 51 min ago
- Reply to comment | Linux Journal
3 hours 36 min ago
- Welcome to 1998
4 hours 25 min ago
- notifier shortcomings
4 hours 48 min ago
6 hours 25 min ago
- Android User
6 hours 27 min ago
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?