Mailman, the GNU Mailing List Manager
I've spent a lot of time improving the common path a mail message takes through the system. The biggest change has been to design a message pipeline, where each component in the pipeline does a little piece of the work necessary to deliver a message. For example, there are separate components to scan the message for potential spam, calculate the recipients of the message, archive it, gate it to Usenet, and deliver the message to an SMTP (Simple Mail Transfer Protocol) daemon.
Each component in the pipeline is really a Python module conforming to a specific API: the module must contain a function called “process” which takes a message object and a mailing list object. When a message is received by Mailman, it runs through a list of these modules, handing the message object off for each to process. If the module raises a Python exception, processing is stopped. This is used when messages must be held for the list administrator's approval (e.g., a posting to a moderated list).
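The pipeline contract can be sketched in a few lines of modern Python 3. This is not Mailman's code: the `HoldMessage` exception, the three handler functions, and the use of plain dicts in place of real message and mailing list objects are hypothetical stand-ins, and the `(mlist, msg)` argument order is an assumption.

```python
from types import SimpleNamespace


class HoldMessage(Exception):
    """Hypothetical: raised by a handler to stop processing, e.g. for moderation."""


def check_moderation(mlist, msg):
    # Hold every posting to a moderated list for the administrator.
    if mlist["moderated"]:
        raise HoldMessage("posting to a moderated list needs approval")


def calculate_recipients(mlist, msg):
    # Record who should receive this message.
    msg["recipients"] = list(mlist["members"])


def deliver(mlist, msg):
    # Stand-in for the SMTP hand-off: just queue the message on the list.
    mlist["outbox"].append((msg["subject"], tuple(msg["recipients"])))


# Each namespace plays the role of a handler module exposing process().
PIPELINE = [
    SimpleNamespace(process=check_moderation),
    SimpleNamespace(process=calculate_recipients),
    SimpleNamespace(process=deliver),
]


def run_pipeline(mlist, msg, pipeline=PIPELINE):
    """Hand msg to each handler in order; return False if one of them held it."""
    for handler in pipeline:
        try:
            handler.process(mlist, msg)
        except HoldMessage:
            return False  # processing stops; later handlers never see msg
    return True
```

Because a handler is just something with a `process` attribute, adding a stage means inserting one more entry into the list. Only `HoldMessage` is caught here; any other exception also stops the loop, but propagates to the caller.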
This message pipeline makes Mailman easy to configure and extend in the way it handles incoming and outgoing messages. For example, a project contributor has implemented a MIME attachment scanner module that can be dropped into the pipeline. This module can strip attachments from the message, post them to an external archive (either the file system or a WebDAV server), and then rewrite the outgoing message to include a URL to each attachment instead of the attachment itself. The same module could also be used to discard messages with certain types of attachments (e.g., if you hate HTML mail as much as I do, you could bounce or discard any message that contains a text/html part), to strip certain attachment types (e.g., simply discard binary attachments), or to scan attachments for viruses.
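A module of this kind might look like the sketch below, written against the modern Python `email` package rather than Mailman's own message class. The `BANNED_TYPES` set, the `DiscardMessage` exception, and the policy of stripping HTML parts (discarding the message only when nothing acceptable is left) are assumptions for illustration.

```python
from email import message_from_string
from email.message import Message

# Hypothetical policy: MIME types this list refuses to carry.
BANNED_TYPES = {"text/html"}


class DiscardMessage(Exception):
    """Hypothetical: raised to drop a message that has nothing acceptable left."""


def _strip_banned(part: Message) -> None:
    # Recursively remove banned subparts from multipart containers.
    if part.is_multipart():
        kept = [p for p in part.get_payload()
                if p.get_content_type() not in BANNED_TYPES]
        for sub in kept:
            _strip_banned(sub)
        part.set_payload(kept)


def process(mlist, msg: Message) -> None:
    """Strip banned attachment types; discard the message if nothing is left."""
    if msg.get_content_type() in BANNED_TYPES:
        raise DiscardMessage(f"message is {msg.get_content_type()}")
    _strip_banned(msg)
    if msg.is_multipart() and not msg.get_payload():
        raise DiscardMessage("no acceptable parts remain")


if __name__ == "__main__":
    raw = (
        'MIME-Version: 1.0\n'
        'Content-Type: multipart/alternative; boundary="XX"\n'
        '\n'
        '--XX\n'
        'Content-Type: text/plain\n'
        '\n'
        'hello\n'
        '--XX\n'
        'Content-Type: text/html\n'
        '\n'
        '<p>hello</p>\n'
        '--XX--\n'
    )
    msg = message_from_string(raw)
    process(None, msg)
    print([p.get_content_type() for p in msg.walk()])
```

Switching from stripping to bouncing or discarding every HTML-bearing message would only change `process`; the pipeline that calls it stays the same.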
Currently, there is only one system-wide message pipeline for all Mailman lists at a site, but the plan is eventually to give individual list administrators the opportunity to configure their lists with optional modules. One application of this would be to run a “patches” mailing list which would have an optional module to scan a message for a context or unified diff, and if found, inject the diff into an issue-tracking system.
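The diff-spotting half of such a "patches" module is mostly pattern matching on the standard diff headers. A minimal sketch, assuming a hypothetical `tracker_queue` list on the mailing list object as a stand-in for a real issue-tracker interface:

```python
import re

# A unified diff hunk: "--- old" and "+++ new" headers, then an "@@" range line.
UNIFIED_DIFF = re.compile(r"^--- .*\n\+\+\+ .*\n@@ ", re.MULTILINE)
# A context diff hunk: "*** old" and "--- new" headers, then 15 asterisks.
CONTEXT_DIFF = re.compile(r"^\*\*\* .*\n--- .*\n\*{15}", re.MULTILINE)


def find_diff_kind(text):
    """Return "unified", "context", or None depending on what text contains."""
    if UNIFIED_DIFF.search(text):
        return "unified"
    if CONTEXT_DIFF.search(text):
        return "context"
    return None


def process(mlist, msg):
    """Hypothetical "patches" module: queue any posted diff for the issue tracker."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        text = payload.decode(part.get_content_charset() or "utf-8", "replace")
        kind = find_diff_kind(text)
        if kind:
            # Stand-in for a real tracker API: append to a list on the list object.
            mlist["tracker_queue"].append((msg["subject"], kind, text))
```

Since the module only reads the message, it could sit anywhere in a list's pipeline without affecting normal delivery.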
This streamlining of the message-delivery path has vastly improved Mailman's performance. We're running the latest CVS snapshot on python.org and easily handling about 30,000 individual recipient deliveries per day, with an average of about 0.01 second per message through the system (from Mailman receipt to SMTP daemon hand-off). Even if every one of those deliveries were a separate message, Mailman itself would account for only about 300 seconds (30,000 × 0.01 s), or five minutes, of processing per day. The lesson here is that Mailman's own overhead is no longer the bottleneck: the MTA, which does the actual delivery work, has the biggest impact on throughput, so choose it wisely.
A similar pipeline architecture has been designed for bounce detection. Believe it or not, there's actually a standard for bounced messages, called Delivery Status Notification (DSN), described in RFC 1894. The problem is, of course, that it's complex, and many MTA authors disagree with or ignore this standard. This makes bounce detection (like spam detection) a black art. Mailman 1.1 comes with a hairy mess of regular expressions used to scan bounced messages, which get delivered to a different address than regular postings. If Mailman actually detects a bounce, and can extract the offending e-mail address from the bounced message, it increments a counter for that address. Enough bounces, and the address is automatically disabled or removed.
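The bookkeeping side of bounce handling is simple once an address has been extracted. A minimal sketch; the `BounceTracker` class and its `threshold` value are hypothetical, not Mailman's actual settings:

```python
from collections import Counter


class BounceTracker:
    """Hypothetical bounce bookkeeping: count bounces per address, disable at a threshold."""

    def __init__(self, threshold=5):
        self.threshold = threshold   # assumed policy knob, not Mailman's default
        self.counts = Counter()
        self.disabled = set()

    def record_bounce(self, address):
        """Count one detected bounce; return True if the address is now disabled."""
        if address not in self.disabled:
            self.counts[address] += 1
            if self.counts[address] >= self.threshold:
                self.disabled.add(address)
        return address in self.disabled
```

Whether a disabled address is later removed outright is a separate policy decision layered on top of this counter.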
The problem was that updating those regular expressions was nearly impossible, so for Mailman 1.2 we now have a pipeline, similar in architecture to the delivery pipeline, in which each module attempts to recognize just one style of bounce. We currently recognize RFC 1894/DSN bounces, Postfix, Qmail, Yahoo! and a few other weirdos. Of course, we still recognize all the old bounce formats Mailman 1.1 recognized, and it's fairly easy to add new matchers, assuming the bounced message can actually be scanned intelligently. I recently added an Smail bounce detector in about five minutes and 20 lines of Python code.
Two other major improvements planned for the 1.2 release are internationalization and user databases.
We've had a large number of requests for making Mailman multi-lingual. Two contributors from Spain, Juan Carlos Rey Anaya and Victoriano Giralt, with help from Mads Kiilerich from Denmark, have sent me patches to accomplish this. The technical approach centers around gettext, where strings to be translated are marked in a special way. The developers then run a tool over the source tree and create template files which can be handed over to translators. Once their language-specific translation files are placed in the proper directory, the application can use these to look up the text string in the specified language.
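On the Python side, the lookup step maps onto the standard library's `gettext` module. A minimal sketch, assuming a hypothetical message domain named `mailman` and a hypothetical `messages` directory holding the compiled `.mo` files:

```python
import gettext


def make_translator(language, domain="mailman", localedir="messages"):
    """Return a _() function for `language`.

    gettext looks for localedir/<language>/LC_MESSAGES/<domain>.mo; with
    fallback=True a missing catalog yields NullTranslations, whose gettext()
    returns the original (English) string unchanged.
    """
    translation = gettext.translation(domain, localedir=localedir,
                                      languages=[language], fallback=True)
    return translation.gettext
```

With `_ = make_translator("es")`, a call such as `_("You have been subscribed")` returns the Spanish text once a Spanish catalog is installed, and the English original until then, so untranslated strings never break the interface.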
For Mailman, a site administrator can install any language file they want to make available to their list administrators. It would be up to the list administrators to enable various languages for their lists and to choose a default language. When individual users are interacting with Mailman, they can choose their preferred language from those available to the list. In this way, mailing lists can support multiple languages through both their web and e-mail interfaces. Of course, messages posted to the list aren't translated (although a pipeline module could be implemented to feed the text through Babelfish if you were so inclined).
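The three-level policy (site installs, list enables, user chooses) can be modeled in a few lines. Everything here is hypothetical: `SITE_LANGUAGES`, `ListLanguages`, and its methods only illustrate the rules described above.

```python
from dataclasses import dataclass, field

# Hypothetical: language catalogs the site administrator has installed.
SITE_LANGUAGES = {"en", "es", "da"}


@dataclass
class ListLanguages:
    """Hypothetical per-list language settings chosen by the list administrator."""
    default: str = "en"
    enabled: set = field(default_factory=lambda: {"en"})

    def enable(self, language):
        # A list can only offer languages the site has installed.
        if language not in SITE_LANGUAGES:
            raise ValueError(f"{language!r} is not installed on this site")
        self.enabled.add(language)

    def set_default(self, language):
        # The default must be one of the list's enabled languages.
        if language not in self.enabled:
            raise ValueError(f"{language!r} is not enabled for this list")
        self.default = language

    def language_for(self, user_preference=None):
        """Honor the user's choice if the list offers it, else use the list default."""
        if user_preference in self.enabled:
            return user_preference
        return self.default
```

The value returned by `language_for` is what would be handed to the catalog lookup when building a page or an outgoing notice for that user.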
GNU gettext provides all the necessary tools to create multilingual C programs, but we had to adapt them a bit to work with Python. As with C, we mark Python strings to be translated with a wrapper function call. For example, if you wanted to make this line of code translatable,
subject = "You have been subscribed"
you would modify the line to look like this:
subject = _("You have been subscribed")
Most of the work of making an application like Mailman multilingual involves marking translatable text.
Python has a further complication: there are actually eight ways to define a “string”:
'This is a Python string'
"This is a Python string"
'''This is a Python triple-quoted string'''
"""This is a Python triple-quoted string"""
r'This is a Python raw string'
r"This is a Python raw string"
r'''This is a Python triple-quoted raw string'''
r"""This is a Python triple-quoted raw string"""
Briefly, the '' and "" styles are interchangeable; having both lets you embed one kind of quote without escaping it by delimiting the string with the other. These first two styles are limited to a single line. Triple-quoted strings can contain embedded newlines, serving roughly the same purpose in Python as Perl's here documents. Raw strings do not treat backslashes as escape characters, so they are used primarily for regular expressions.
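A quick illustration of those rules in modern Python 3 (the string values are arbitrary examples):

```python
import re

apostrophe = "It's here"      # double quotes: no need to escape the apostrophe
quoted = 'He said "hi"'       # single quotes: no need to escape the double quotes
escaped = 'It\'s here'        # same value as apostrophe, but the escape is required

# Triple-quoted strings may span lines; the newline becomes part of the value.
two_lines = """first line
second line"""

# In a normal string "\\d" is needed to get a literal backslash; a raw string
# keeps backslashes as written, which keeps regular expressions readable.
cooked_pattern = "\\d+"
raw_pattern = r"\d+"

digits = re.findall(raw_pattern, "version 1.6")
```

Both patterns are the same three-character string; the raw spelling simply avoids doubling every backslash.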
GNU gettext comes with a tool called xgettext which scans your C files for translatable strings. Unfortunately, it doesn't understand Python's various string spellings, and while a few different approaches have been put forward, I favor allowing _() marking of any valid Python string. To accomplish this, I wrote a tool called pygettext.py which scans Python source code, looking for _() wrappers around any type of Python string. The output of pygettext.py is a standard gettext .pot file, so from that point on, the GNU tools can be used. pygettext.py will be a standard part of Python 1.6 and is available via the Python CVS tree at http://cvs.python.org/.
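The core idea behind a tool like pygettext.py can be sketched with the standard `tokenize` module: because the tokenizer already understands every string spelling, a `STRING` token sitting between `_(` and `)` is a translatable message whatever its quoting. This is not pygettext.py's source; the function names are hypothetical, and the sketch omits what a real tool must handle, such as merging duplicate msgids, writing the .pot header entry, and calls that concatenate several literals.

```python
import ast
import io
import tokenize

# Layout-only tokens that may sit inside a _( ... ) call.
_SKIP = {tokenize.NL, tokenize.COMMENT}


def extract_messages(source):
    """Return (line number, text) for every _() call wrapping a single string literal."""
    readline = io.StringIO(source).readline
    tokens = [t for t in tokenize.generate_tokens(readline) if t.type not in _SKIP]
    found = []
    for i in range(len(tokens) - 3):
        name, lparen, string, rparen = tokens[i:i + 4]
        if (name.type == tokenize.NAME and name.string == "_"
                and lparen.string == "(" and string.type == tokenize.STRING
                and rparen.string == ")"):
            try:
                # literal_eval applies the rules of whichever spelling was used.
                text = ast.literal_eval(string.string)
            except ValueError:
                continue  # e.g. an f-string on Pythons that tokenize it as STRING
            if isinstance(text, str):
                found.append((string.start[0], text))
    return found


def po_quote(text):
    """Quote a Python string as a PO-file string literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\t", "\\t"))
    return f'"{escaped}"'


def to_pot(messages, filename):
    """Render extracted messages as minimal .pot entries (no header entry)."""
    entries = []
    for lineno, text in messages:
        entries.append(f"#: {filename}:{lineno}\nmsgid {po_quote(text)}\nmsgstr \"\"\n")
    return "\n".join(entries)
```

Once the .pot text exists, the standard GNU tools take over, which is exactly the division of labor described above.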
I expect to begin integrating and testing the internationalization patches to Mailman sometime within the next few weeks. Keep an eye on the Mailman CVS tree for details.