Mailman, the GNU Mailing List Manager
I'm constantly inundated with e-mail from friends, colleagues, users, developers, strangers, band-mates and spammers. Sometimes it's hard to tell them apart. A closer look at the nearly 2500 messages sitting in my inbox tells me most of these messages are from some mailing list or other. All in all, I like these mailing lists, perhaps because through them I communicate with my co-workers and I get feedback from the community on the open-source projects I maintain. The problem is, of course, that I get too much e-mail to keep up with. I'm not alone in this either, and the reasons are clear: e-mail mailing lists are great ways to meet and converse with people who share common interests.
There are probably a half dozen or more list-management systems that folks are using to deposit e-mail messages into my inbox. On python.org, we currently run about 60 mailing lists, most of them technical SIGs (special interest groups) centered around various Python topics. For a long time we used Majordomo, which was the standard in free list-management systems. A few years ago we started switching our lists over to Mailman, a GPL list manager originally written by John Viega, and in 1999, we were finally able to move all our mailing lists over to Mailman. In this article, I'd like to talk about what Mailman is and focus on what the future of Mailman might look like.
Mailman does all the common things you'd expect from a list manager. It has built-in archiving, supports RFC934 and MIME digest delivery, mail/Usenet gateways, e-mail-based administrative commands, integrated bounce handling, spam detection, flexible SMTP delivery, support for virtual domains, list moderation and more. Mailman runs on almost any Linux or UNIX-type operating system, and packages are available for most Linux distributions. You probably don't care, but Mailman currently doesn't run on Windows.
The most striking thing about Mailman is that it's highly integrated with the Web. Every mailing list has its own web page, and almost all interaction with the system for both users and list administrators can be done through the Web. This makes it easy to use for both computer novices and experts. In fact, I run many decidedly non-techy mailing lists, and most members can easily follow the web pages to subscribe, unsubscribe and modify simple parameters of their list subscriptions.
While list administrators are usually much more experienced computer users, they too can have an easy time configuring and managing their lists through the Web. Mailman's web administration interface allows them to approve or deny closed-list subscription requests, dispose of “held” postings (those that look suspiciously like spam or are posted to a moderated list) and configure the various operating parameters of their list.
Mailman's architecture is actually flexible enough to provide several ways into the system. The web interface is augmented by e-mail-based commands in the tradition of Majordomo. For list administrators who have access to the command line on the Mailman system, a large number of scripts are available for doing more complex tasks. About the only thing a list administrator can't do from the Web or e-mail is create and delete mailing lists, and we hope to improve this in future versions.
Mailman is written primarily in Python, with a few C modules for improved security. The core functionality of Mailman is implemented as a set of Python classes in a Python package. This makes it easy to interact with Mailman in two novel ways (for the hacker at heart). You can easily craft new scripts that use the Python classes, and you can interact with a live Mailman installation through the Python interactive interpreter. There's even a convenience script provided by Mailman, called bin/withlist, which does most of the boiler plate of using a mailing list interactively. You can invoke it like this from the command line in the directory where Mailman is installed:
python -i bin/withlist -l mylist
This executes the withlist script, opening and locking (-l) the “mylist” mailing list. Afterwards, you're left at the standard Python prompt with your open list object bound to the name m. You can then interactively inspect (or change) the object from the Python interpreter—a neat and very powerful feature.
As I mentioned, Mailman was originally written by John Viega who released the code under the GPL, so it has been officially adopted by the GNU project. You can find more information on Mailman at www.gnu.org/software/mailman/mailman.html, which is mirrored at http://www.list.org/.
Building Mailman from the source requires GNU make, an ANSI C compiler (gcc is, of course, fine) and Python 1.5 or better, although I recommend at least Python 1.5.2. All of these tools either come by default on your Linux distribution or are readily downloadable and installable, e.g., by RPM. To run Mailman you'll also need a web server, such as Apache, and an SMTP daemon, such as Postfix, Exim, Qmail or Sendmail. Building Mailman from the source in the gzipped tar file should be straightforward for anyone who is familiar with GNU configure. Be sure to read the various READMEs for specific configuration instructions related to the MTA (mail transfer agent), web browser and OS you happen to be using.
Mailman is officially at release 1.1, but the latest snapshot is available via anonymous CVS. See the above URLs for details. There have been some major architectural improvements in the CVS tree (probably to be called version 1.2) and more coming soon. I'd like to spend the rest of this article talking about some of those changes.
I've spent a lot of time improving the common path a mail message takes through the system. The biggest change has been to design a message pipeline, where each component in the pipeline does a little piece of the work necessary to deliver a message. For example, there are separate components to scan the message for potential spam, calculate the recipients of the message, archive it, gate it to Usenet, and deliver the message to an SMTP (simple mail transfer protocol) daemon.
Each component in the pipeline is really a Python module conforming to a specific API: the module must contain a function called “process” which takes a message object and a mailing list object. When a message is received by Mailman, it runs through a list of these modules, handing the message object off for each to process. If the module raises a Python exception, processing is stopped. This is used when messages must be held for the list administrator's approval (e.g., a posting to a moderated list).
This message pipeline means that Mailman is easily configurable and extensible in the way it handles incoming and outgoing messages. For example, there is a project contributor who has implemented a MIME attachment scanner module which can be dropped into the pipeline. This module can strip attachments from the message, post the attachments to an external archive (either the file system or a WebDAV server) and then rewrite the outgoing message to include a URL to the attachment instead of the attachment text. This module could also be used simply to discard messages with certain types of attachments (e.g., if you hate HTML mail as much as I do, you could just bounce or discard any message that contains a text/html MIME type), strip certain attachment types (e.g., binary attachments just get discarded) or scan attachments for potential viruses.
Currently, there is only one system-wide message pipeline for all Mailman lists at a site, but the plan is eventually to give individual list administrators the opportunity to configure their lists with optional modules. One application of this would be to run a “patches” mailing list which would have an optional module to scan a message for a context or unified diff, and if found, inject the diff into an issue-tracking system.
This streamlining of the message-delivery path has vastly improved the performance of Mailman. We're running the latest CVS snapshot on python.org and easily handling about 30,000 individual recipient deliveries per day, with an average of about 0.01 second per message through the system (from Mailman receipt to SMTP daemon hand-off). The lesson here is that for the best performance, you want to choose your MTA wisely, since it will have the biggest impact on throughput.
A similar pipeline architecture has been designed for bounce detection. Believe it or not, there's actually a standard for bounced messages, called Delivery Status Notification (DSN), described in RFC 1894. The problem is, of course, that it's complex, and many MTA authors disagree with or ignore this standard. This makes bounce detection (like spam detection) a black art. Mailman 1.1 comes with a hairy mess of regular expressions used to scan bounced messages, which get delivered to a different address than regular postings. If Mailman actually detects a bounce, and can extract the offending e-mail address from the bounced message, it increments a counter for that address. Enough bounces, and the address is automatically disabled or removed.
The problem was that updating the regular expressions was nearly impossible, so for Mailman 1.2 we now have a pipeline, similar in architecture to the delivery pipeline, that attempts to recognize just one style of bounce. We currently recognize RFC1894/DSN bounces, Postfix, Qmail, Yahoo! and a few other weirdos. Of course, we still recognize all the old bounce formats Mailman 1.1 recognized, and it's fairly easy to add new matchers—assuming the bounced message can actually be scanned intelligently. I recently added an Smail bounce detector in about five minutes and 20 lines of Python code.
Two other major improvements planned for the 1.2 release are internationalization and user databases.
We've had a large number of requests for making Mailman multi-lingual. Two contributors from Spain, Juan Carlos Rey Anaya and Victoriano Giralt, with help from Mads Kiilerich from Denmark, have sent me patches to accomplish this. The technical approach centers around gettext, where strings to be translated are marked in a special way. The developers then run a tool over the source tree and create template files which can be handed over to translators. Once their language-specific translation files are placed in the proper directory, the application can use these to look up the text string in the specified language.
For Mailman, a site administrator can install any language file they want to make available to their list administrators. It would be up to the list administrators to enable various languages for their lists and to choose a default language. When individual users are interacting with Mailman, they can choose their preferred language from those available to the list. In this way, mailing lists can support multiple languages through both their web and e-mail interfaces. Of course, messages posted to the list aren't translated (although a pipeline module could be implemented to feed the text through Babelfish if you were so inclined).
GNU gettext provides all the necessary tools to create multilingual C programs, but we had to adapt them a bit to work with Python. As with C, we mark Python strings to be translated with a wrapper function call. For example, if you wanted to make this line of code translatable,
subject = "You have been subscribed"
you would modify the line to look like this:
subject = _("You have been subscribed")Most of the work of making an application like Mailman multilingual involves marking translatable text.
Python has a further complication: there are actually eight ways to define a “string”:
'This is a Python string'
"This is a Python string"
'''This is a Python triple-quoted string'''
"""This is a Python triple-quoted string"""
r' This is a Python raw string'
r"This is a Python raw string"
r'''This is a Python triple quoted raw string'''
r"""This is a Python triple quoted raw string"""
Briefly, the '' style and "" style strings are interchangeable, and useful when you don't want to escape one delimiter or the other. The first two string styles are limited to a single line. Triple-quoted strings allow you to embed newlines in the string, serving roughly the same purpose in Python as Perl's HERE documents. Raw strings have different rules for embedded backslashes and are used primarily for regular expressions.
GNU gettext comes with a tool called xgettext which scans your C files for translatable strings. Unfortunately, it doesn't understand Python's various string spellings, and while a few different approaches have been put forward, I favor allowing _() marking of any valid Python string. To accomplish this, I wrote a tool called pygettext.py which scans Python source code, looking for _() wrappers around any type of Python string. The output of pygettext.py is a standard gettext .pot file, so from that point on, the GNU tools can be used. pygettext.py will be a standard part of Python 1.6 and is available via the Python CVS tree at http://cvs.python.org/.
I expect to begin integrating and testing the internationalization patches to Mailman sometime within the next few weeks. Keep an eye on the Mailman CVS tree for details.
The other goal for the next release is to include real user databases. A user should be able to have one Mailman login for all lists of which they are members at a site. This login should contain a list of all addresses to which messages can potentially be delivered and should allow the user to select which mailing list delivers to which address. Additionally, each user would need to remember only one password to change their delivery options.
Currently, each Mailman list maintains its own list of member addresses. This makes the data store relatively easy to implement and recipient calculation fast, but it can be a real pain for users who are subscribed to many lists at a particular site. For list administrators who own multiple lists, it's even worse. For this reason, we want to move toward having a real database of users and administrators in the back end, using caching or other techniques to make membership calculation perform acceptably. One of the core Mailman maintainers has an implementation of this that he's currently testing, and it is expected to be integrated some time soon. This change may be significant enough to call the next release of Mailman version 2.0.
Barry Warsaw (email@example.com) is Project Lead for Software Development with the MEMS Exchange at CNRI. He is the current primary maintainer of Mailman and JPython, the 100% Pure Java implementation of Python. Barry has used and contributed to Python since 1994. He has also written and maintained numerous other smaller open-source projects over the last 15 years, including CC Mode for Emacs.