Using Sendmail as a Multi-Platform Mail Router
Is e-mail the wave of the future? No way. It's a mission-critical, gotta-have application today. Survey those business cards you've been collecting from associates lately. Odds are most of them are sporting Internet e-mail addresses. In the April 1996 issue of Microsoft Magazine, Bill Gates said, “[E-mail] is probably the most mission-critical application for Microsoft in terms of running the company. If we had to pick one application that would keep running no matter what, E-mail would absolutely be it.” People want fast, reliable, written communications with people in their own company as well as with others across the Net.
My job is to keep e-mail running smoothly for my company. A user recently complained to me because it took a whole 12 minutes for her e-mail message to be delivered. People have ever-growing expectations about what e-mail should do for them.
Providing e-mail in a homogeneous environment is simple. If a shop consists entirely of Linux workstations, e-mail is practically automatic. The same is true for all other major variants of Unix. No special gateways are required to connect to the Internet since the e-mail protocol spoken, SMTP, is the same across the board. SMTP stands for Simple Mail Transport Protocol and has been the de facto standard for Internet mail transmission for years. Exchanging mail is almost as simple with non-SMTP commercial packages, as long as everyone in the company uses the same one. The situation is more complex when several incompatible systems are in use. Getting mail to flow seamlessly between them and additionally the Internet can pose a lot of problems. In this article I will explain how Caliber Technology solved these problems quickly and inexpensively with Linux and sendmail.
Caliber System, through its operating units RPS, Viking Freight, Caliber Logistics, Roberts Express and Caliber Technology, is a value added provider of transportation, logistics, and information services. Caliber Technology provides integrated information services to customers, Caliber companies and other interested users.
The Caliber companies operate in vastly different computing environments: IBM mainframes, AS/400s, HP/UX midrange systems, Novell and NT LANs, Tandem minicomputers and more. The e-mail platforms in use include Microsoft Mail, Lotus cc:Mail, Tandem PS Mail, IBM OV/400, TAO/Emc2 and Microsoft Exchange. Unfortunately these mail systems are generally not compatible with each other. We have plans to migrate all the Caliber companies to compatible e-mail platforms, but the migration effort will take at least two years. The good news is gateways are available for all these systems. The function of a mail gateway is to translate messages from one format to another. Some gateways translate messages from one proprietary format to another: for example, from Lotus cc:Mail to Microsoft Mail and vice versa. Other gateways translate messages between a proprietary format and a standard format. An example here would be cc:Mail to and from SMTP. The two most common standard formats are SMTP and X.400. X.400 is the CCITT/ISO standard and is supported by many commercial networks. Since Caliber's goal was to have all its internal mail systems conversing with each other and with the Internet, SMTP was the obvious choice as the common language.
Here's where it gets complicated. Like many companies, Caliber Technology uses non-obvious login names for its mainframe and LAN systems. In a typical Unix environment, my login name might be tlowery.
But in Caliber's multi-platform environment my login name is, say, xyz123. This was a legacy decision to provide better security. There are too many IDs and they're too ingrained in our systems to let us change them, even if we wanted to. The problem is, most of the gateways we use do not allow the SMTP e-mail name to differ from the local login ID. There's just no way to specify that in the configuration. In other words, my Internet mail address would have to be firstname.lastname@example.org. We found this distasteful for obvious reasons; Actually, it's even worse. The gateways add their network host name to the address for outbound mail. My address would really be email@example.com. So the first problem to solve is to convert, for outbound messages, the private gateway-based addresses to the public Internet addresses, for example from firstname.lastname@example.org to email@example.com.
Solving that problem creates another one. The calibersys.com e-mail domain is made up of several mail systems. If mail is coming inbound from the Internet to firstname.lastname@example.org, how is the decision made to send it to gateway1 (say, Microsoft Mail) rather than gateway2 (TAO/Emc2)? Some system along the way has to look up tlowery and decide to send the message to gateway1. At Caliber, we accept mail for calibersys.com, logistics.calibersys.com, shiprps.com, vikingfreight.com and roberts.com and perform name lookups for all of them.
Like many companies, our internal TCP/IP network is connected to the Internet via a commercial firewall. We purchased the firewall long before deciding to provide Internet mail service to all the Caliber companies. In a way, it can be thought of as a legacy system with some of its own drawbacks. Although our firewall can solve either the first or second problem, it can't solve both at the same time. According to our firewall vendor, we were their first customer who needed name mapping and gateway hiding for multiple domains, all at the same time. Commercial X.500-based directory systems can be purchased to tie everything together elegantly; the problem is cost. It doesn't make much sense to plunk down $200,000 for a solution that, we hope, won't be needed two years from now. With no quick fix from the firewall vendor in sight and hoards of users beating down my door for inter-company and Internet mail, I sat down and studied sendmail to see if it offered any answers. Luckily it does.
The solution we have in place today can be seen in Figure 1. Any mail message, including those to or from the Internet, that travels from one system to another travels through a central hub. For added reliability, there are actually two hubs. One serves as the primary; the other is a backup in case the primary fails. The sole purpose of these hubs is to hide the e-mail addressing details from users. Let's look at an example.
First, let's say I at email@example.com want to send a message to Jane at firstname.lastname@example.org. Both the sender and recipient addresses exist inside the Caliber firewall, so the message will not be sent to the Internet. My mailbox name on Microsoft Mail is xyz123 and the SMTP gateway for Microsoft Mail is called gateway1. When the message leaves my mail system, it looks like this:
From: email@example.com To: firstname.lastname@example.org
The message is routed to mhub, our primary address mapping hub. First it looks up email@example.com. When it finds a match, that address is changed to firstname.lastname@example.org. It then looks at email@example.com. There are three possible gateways for vikingfreight.com, gateway1, gateway2 and gateway3, corresponding to Tandem PS Mail, Microsoft Exchange and TAO/Emc2, respectively. The hub looks up the address and sees that Jane's mailbox (jdoe) is on PS Mail and is called vft1100. So mhub changes firstname.lastname@example.org to email@example.com. The message header now looks like this:
From: firstname.lastname@example.org To: email@example.com
The hub then hands the message off to gateway1.vikingfreight.com for delivery to Jane's mailbox. Now let's take a look a closer look at the hub.
The primary mail hub is a Compaq Proliant 1500 with a 133 MHz Pentium processor. The backup hub is a Dell OptiPlex with a 75 MHz Pentium. Both are running Linux Slackware version 3.0. The decision to run Linux was an easy one. I wanted a solid, dependable solution at the lowest possible cost—that meant Intel-based hardware and Linux. I've used Linux for various projects over the last three years and have never experienced a kernel crash. I can't say that for all the commercial operating systems I've used. The crucial piece of software for this project is sendmail, available for virtually all Unix variants. I knew if Linux didn't work out, I could swap in a Sun, IBM or HP workstation without having to make software changes. The risk of using Linux as the initial platform was very small. In the four months since the project went live, I'm happy to say that we've had zero problems with it. Now it's time to discuss the nitty-gritty details. Two primary pieces make the hub work: DNS and sendmail. I'll discuss each in turn.
What follows is a brief introduction to DNS. If you're already familiar with it feel free to skip the next few paragraphs.
DNS stands for Domain Name System. Its job is to keep track of each computer's name on the network. Programs that communicate with other computers require the numerical address of that computer. If all the program has is a name, it gives that name to DNS and asks for the corresponding address. For example, the mail hub has to get the address for gateway1.calibersys.com before it can deliver mail to it. The hub asks DNS for the address and is told something like 220.127.116.11. As soon as the hub has that address, it can contact gateway1 to deliver the mail. The DNS configuration files are filled with lines like these:
mhub.calibersys.com. IN A 18.104.22.168 mhub2.calibersys.com. IN A 22.214.171.124 gateway1.vikingfreight.com. IN A 126.96.36.199
The first column is the name the machine goes by. The second column, IN, isn't important for our discussion. The third column, A, indicates that this is an address record. It just means this line maps a name to an address. The fourth column holds the address of the machine named in column one. In addition to looking up names and giving back addresses, DNS can also indicate that one computer accepts mail for another. When computer A accepts mail for computer B, A is called a Mail Exchanger for B. Whenever sendmail tries to deliver mail to a given machine, it first looks for a mail exchanger. If no mail exchanger is found, it then looks for a regular address. Let's look at an example:
calibersys.com. IN MX 10 mhub.calibersys.com. vikingfreight.com. IN MX 10 mhub.calibersys.com. shiprps.com. IN MX 10 mhub.calibersys.com. roberts.com. IN MX 10 mhub.calibersys.com.
These lines tell sendmail that any mail addressed to calibersys.com, vikingfreight.com, shiprps.com or roberts.com should be sent to mhub.calibersys.com. That's how mail addressed to firstname.lastname@example.org is routed to the hub. The first column can be thought of as a machine name. There doesn't have to be an actual computer using this name; think of it as a pseudo-machine for e-mail purposes. It's what you would see to the right of the @ symbol in an e-mail address. Again, we don't care about the IN for this discussion. The MX in the third column tells DNS that this is a mail exchanger record. Next comes the priority. A machine can have several mail exchangers, each with a different priority. I'll discuss that in a moment. Finally, the last column is the name of the machine acting as the mail exchanger. Any machine acting as a mail exchanger must be a real machine and must have a corresponding address record. Now let's talk about multiple exchangers. Remember that this project involves two hubs, a primary and a secondary, The primary machine is mhub.calibersys.com, the secondary is mhub2.calibersys.com. Both are listed in DNS. Remember that the number 10 above referred to priority. The lower the number, the higher the priority. Let's say that we see the following lines in the DNS configuration file in addition to the ones above:
calibersys.com. IN MX 20 mhub2.calibersys.com. vikingfreight.com IN MX 20 mhub2.calibersys.com. shiprps.com. IN MX 20 mhub2.calibersys.com. roberts.com. IN MX 20 mhub2.calibersys.com.
sendmail would first try to send any mail destined for vikingfreight.com to mhub, since 10 represents a higher priority than 20. If that failed, it would then try to send the mail to mhub2, the backup hub. After either hub receives the message, it looks up email@example.com and converts that address to firstname.lastname@example.org. It would then look up gateway1.vikingfreight.com. It does not have a mail exchanger record listed for it, but it does have an address record. This means that gateway1 accepts its own mail. Using DNS this way to route mail for a domain, e.g., mycompany.com, to an actual computer where the mail is stored, e.g., mail.mycompany.com, is commonplace. What's different here is the address mapping. To see how that's done, we need to look at sendmail.cf.
sendmail.cf is the configuration file for sendmail. So what exactly is sendmail? sendmail is the Swiss Army knife of mail systems. Officially it's known as a message transfer agent, or MTA. There are a few different flavors; Linux Slackware 3.0 comes with the one known as Berkeley V8. Users typically don't interact with it directly. (That task is left to a mail user agent or MUA such as elm or pine.) sendmail runs in the background, silently routing mail from one computer to another. It was written in the 1970's and 1980's by Eric Allman at U.C. Berkeley. Because of Eric's flexible design, sendmail is still the most widely used MTA on the Internet. It's standard issue software with just about any Unix-based operating system. All that flexibility comes at the price of complexity. sendmail is probably the most complex of all the Unix utilities. I'll cover some of sendmail's features and how they can be used to solve our address mapping problems, but a complete discussion of sendmail is beyond the scope of this article. For more information see the resource box. The Caliber mail hub makes heavy use of three sendmail mechanisms: macros, classes, and database lookups. All these are specified in the configuration file, sendmail.cf. sendmail macros are similar to C language macros. A primary difference is that the macro name can only be one character long. For example,
defines the macro G as gateway1.vikingfreight.com. Its C language equivalent would be:
#define G gateway1.vikingfreight.com
The macro is invoked later by specifying a dollar sign followed by the macro name, e.g., $G. Anywhere the symbol $G appears, gateway1.vikingfreight.com would be substituted in its place. Classes are very similar to macros. The difference is that they can expand to one of many different values. For example,
CU xyz123 abc789 def444
defines the class U with three values. Alternatively, sendmail can read the values from an external text file:
This ability is handy if you want to define a class with a large number of values. The class is invoked by specifying a dollar sign, then an equal sign, then the class name, e.g., $=U. I'll explain a little later how classes are useful. The third mechanism exploited by the mail hub is the database lookup. sendmail can consult external databases and swap the lookup key with the value found in the database. A few different database formats are supported; we use GNU dbm databases. The records in these databases have only two fields. The lookup key is the first field. Everything following that is considered a value field. A dbm database with four records could look like this:
xyz123 tlowery ft1100 jdoe bc789 asmith def444 bjones
If sendmail consults the database looking for abc789, it will find the value of asmith and substitute that value in place of abc789 in the e-mail address.
That covers the basic mechanisms. Now let's look at the configuration file itself. sendmail.cf if filled with lines like this one:
R$=U@$G $:$(mapdb $1 $)@$K
These lines are known as rules. The rules are grouped together in subroutines, each of which is called by sendmail to perform a certain task. These subroutines are called sets. It's a terse programming language based on regular expression pattern matching. Each rule examines an e-mail address and may alter it. Rules have two parts, the left-hand-side (LHS) and right-hand-side (RHS), separated by one or more tab characters. The LHS is a pattern; sendmail tries to match the current address with this pattern. If the address matches the pattern, sendmail will rewrite the address based on what the RHS says. If there is no match, the RHS is ignored. Now let's look at the LHS of our sample rule, left to right:
The R simply states that this is a rule. All rules start with the letter R. The next three characters, $=U, are a class reference. Given the class definition above, $=U will match xyz123, abc789 or def444 successfully. If the address begins with any other string, the match will fail. The next character is a literal @. That character must appear in the address for a match to happen. The following two characters, $G, are a macro reference. The macro expands to:
The address email@example.com would match the pattern and the RHS would be invoked. firstname.lastname@example.org would not match since tza555 is not a member of the U class. Likewise, email@example.com would not match since gateway2.calibersys.com is not the value of the G macro. Now let's move on to the RHS to see how an address is rewritten. The RHS of our sample rule is this:
$:$(mapdb $1 $)@$K
The first two characters, $: tell sendmail to only invoke this RHS once. By default, sendmail will invoke the RHS repeatedly as long as the result still matches the LHS. Following that is the string,
$(mapdb $1 $)
which performs the database lookup, tells sendmail to look for a database named mapdb and search for the first item from the LHS. $2 would search for the second item and so on. The first match from our LHS was xyz123. sendmail then searches for that string and finds tlowery, so it replaces xyz123 with tlowery in the address. The next character in the RHS, @, is a literal. It's written to the new address following tlowery. Next comes $K, a macro reference. Let's assume the macro K was defined like so:
sendmail will place calibersys.com after @ in the new address, completing the rewriting process.
From this we see how a private address of firstname.lastname@example.org can be converted to a public address of email@example.com. The recipient of the message will never know the original sender address was not the public address. Switching from public to private can be accomplished in a similar manner. This discussion of address mapping has only scratched the surface; sendmail's flexibility can help the e-mail administrator solve virtually any mail routing task.
The mail routing hub has been in operation at Caliber for about three months, and in that time it has routed over 48,000 messages. Given current traffic statistics, I expect it to handle about 600,000 messages during its first year. Fortunately, the few problems we've experienced have been due to configuration problems. It's too easy to leave out a period here or a dollar sign there. Linux and sendmail have performed flawlessly.
In a perfect world this solution wouldn't be needed. If the Caliber companies used only one common e-mail platform, there would be no need to look up mailbox names and route messages to the right gateway. But many large companies have a number of legacy systems that won't be going away any time soon. Those are exactly the types of environments where tools like sendmail work best. Due to network logistics and geography, we will still be using some of our legacy mail systems until 1998. In the meantime, our Linux e-mail hub will continue chugging through messages, routing them between disparate platforms and helping us meet increasing user expectations.
Tom Lowery (firstname.lastname@example.org) is Project Manager of E-Mail and Groupware Development at Caliber Technology.