Reading E-mail Via the Web

How to write your own program to read and send mail to any server on the Internet.

Handling HTML

Displaying e-mail messages in a web browser has advantages and disadvantages. On the one hand, we must be careful to turn special characters, such as < and >, into their HTML entity equivalents so the browser displays them literally. At the same time, we can take advantage of the web browser to make e-mail addresses and URLs clickable.

Since we want these characters to display correctly in the headers as well as in the message body, we modify $contents, the variable that contains the entire message contents, before separating the header and body. We turn < and > into &lt; and &gt;, respectively, ensuring that literal text will not be interpreted as if enclosed in HTML tags:

$contents =~ s/</&lt;/g;
$contents =~ s/>/&gt;/g;
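
The two substitutions above can be wrapped in a small helper. What follows is only a sketch: the name escape_html is hypothetical, and the extra ampersand substitution is my own addition rather than part of the original listing. It is included because a literal & in a message would otherwise be read by the browser as the start of an entity.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the original listing.
# The ampersand is escaped first (an addition to the article's code);
# doing it last would re-escape the "&" in "&lt;" and "&gt;".
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Made-up sample message text containing all three special characters.
my $contents = "From: Some User <user\@example.com>\nSubject: R&D <update>\n";
print escape_html($contents);
```

Because the helper works on a copy of its argument, the caller decides whether to overwrite $contents with the result.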

Making e-mail addresses clickable requires the use of a regular expression to match e-mail addresses. I decided to use the following code:

$contents =~
   s|([\w.-]+\@[\w.-]+\.[a-z]{2,3})|<a href="mailto:$1">$1</a>|gi;

This looks for any combination of alphanumeric characters, underscores, hyphens and periods, followed by an @, followed by the same combination of characters, followed by a two- or three-letter top-level domain. This ensures we will not accidentally turn something like

three pickles @ 20 cents/pickle

into an e-mail address. By turning an actual e-mail address into a “mailto” link, we let users click on the link to send mail to that address.
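
To see the pattern at work, here is a minimal sketch that applies it to two sample strings. The wrapper name link_addresses is hypothetical. The hyphen sits at the end of each character class and the @ is escaped, which are cautious choices on my part; they do not change what the pattern is meant to match.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the e-mail substitution described above.
sub link_addresses {
    my ($text) = @_;
    $text =~ s|([\w.-]+\@[\w.-]+\.[a-z]{2,3})|<a href="mailto:$1">$1</a>|gi;
    return $text;
}

# A real address becomes a mailto link ...
print link_addresses("Write to reuven\@lerner.co.il for details.\n");

# ... while an @ surrounded by spaces is left untouched.
print link_addresses("three pickles \@ 20 cents/pickle\n");
```

One limitation worth knowing: the pattern has no anchor after the top-level domain, so an address in a domain longer than three letters would be linked only up to the third letter.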

Making URLs clickable is somewhat more difficult, since we have to handle more combinations. The code below appears to match a large number of URLs:

$contents =~
   s|(\w+tps?://[^\s&\"\']+[\w/])|<a href="$1">$1</a>|gi;

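Here, we look for any run of word characters ending with “tp”, with an optional “s” on the end. This allows us to match “ftp”, “http” and “https”, all of which are valid protocols. We then allow any combination of characters following the two slashes, excluding white space, quotation marks and ampersands. White space and quotation marks cannot appear unencoded in a URL. Excluding the ampersand keeps the match from running into an escaped &lt; or &gt;, at the cost of cutting short any URL whose query string contains an ampersand.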

Quotation marks and white space can be sent if they are URL-encoded first. Characters are URL-encoded when the hexadecimal value of their ASCII code is preceded by a percent sign. For instance, the space character is ASCII 32 or 0x20; thus, it can be sent in a URL as %20. CGI.pm automatically decodes such characters, so you need not worry about it in most cases.
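
The encoding rule just described can be sketched in a few lines of Perl. The names url_encode and url_decode are hypothetical, and the sketch assumes plain ASCII input. It is meant only to illustrate the percent-sign convention, not to replace the decoding CGI.pm already does.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside a conservative
# safe set (letters, digits, "_", ".", "~", "-") with a percent sign
# followed by the two-digit hexadecimal value of its ASCII code.
# Assumes ASCII input.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Hypothetical helper: turn each %XX sequence back into its character.
sub url_decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

print url_encode('say "hello world"'), "\n";   # say%20%22hello%20world%22
print url_decode('three%20pickles'), "\n";     # three pickles
```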

The final part of our regular expression stipulates that the final character of a URL must be alphanumeric or a slash. This ensures that odd trailing characters, such as periods and commas, will not be accidentally dragged into the URL and highlighted.
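
The effect of that final character class is easiest to see with a short test. This is a sketch under stated assumptions: link_urls is a hypothetical wrapper name, and ftp.example.com is a placeholder host.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the URL substitution described above.
sub link_urls {
    my ($text) = @_;
    $text =~ s|(\w+tps?://[^\s&\"\']+[\w/])|<a href="$1">$1</a>|gi;
    return $text;
}

# The trailing comma and the trailing period both stay outside the
# links, because a URL must end in a word character or a slash.
print link_urls("See http://www.lerner.co.il/atf/, or ftp://ftp.example.com/pub.\n");
```

The regular expression first matches greedily up to the next white space and then backs off one character at a time until the last character is a word character or a slash, which is how the punctuation gets dropped.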

Viewing Selected Messages

The above program works just fine if you want to view all the messages in your mailbox. If you receive many e-mail messages, however, viewing all of them in a single long web document can get frustrating.

The program better-print-mail.pl takes into account the fact that we might want to view only a selected list of messages. For example:

if ($query->param("to_view"))
{
   @message_indices = $query->param("to_view");
}
else
{
   @message_indices = (1 .. $num_messages);
}

An HTML form element can be set multiple times, meaning that the element "to_view" might contain zero, one or more values. All of those values are put into @message_indices unless to_view was not set, in which case all messages are displayed by default.
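
Since the values of to_view arrive from a browser, it may be worth checking them before asking the mail server for those messages. The following is only a sketch of one way to do that: select_messages is a hypothetical helper, it takes the submitted values as a plain list so it can run without CGI.pm, and the range check is my addition rather than part of better-print-mail.pl.

```perl
use strict;
use warnings;

# Hypothetical helper, meant to be called in list context.
# With no requested values it returns every message number;
# otherwise it keeps only whole numbers between 1 and $num_messages.
sub select_messages {
    my ($num_messages, @requested) = @_;
    return (1 .. $num_messages) unless @requested;
    return grep { /^\d+$/ && $_ >= 1 && $_ <= $num_messages } @requested;
}

# Five messages waiting; the form asked for 2 and 4, plus two bad values.
my @message_indices = select_messages(5, 2, 4, 'bogus', 9);
print "@message_indices\n";   # 2 4
```

In the CGI program itself, the list passed in would be the result of $query->param("to_view").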

How can we get a list of current messages? A program called mail-index.pl (see Listing 3 in the archive file) should do the trick. This program can be invoked from the same sort of form we have seen already; simply modify the “action” to point to mail-index.pl, rather than better-print-mail.pl. As with print-mail.pl and better-print-mail.pl, mail-index.pl must receive the user name, password and name of the mail server in order to function. With that information in hand, it logs into the POP server and displays the message headers for mail waiting to be read.

Each message is presented with a check box. By checking the box next to a message, the user indicates he would like to read that particular message. When the user clicks on the “submit” button, better-print-mail.pl is sent not only the user name, password and mail server, but also the list of checked messages. As we have seen, better-print-mail.pl already knows how to handle this list and prints only requested mail messages.
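
The check boxes themselves are ordinary HTML. As a rough illustration, and not a reproduction of Listing 3, each line of the index could be produced along the following lines. The helper name checkbox_row and the sample subjects are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one check box per message. Every box shares the
# name "to_view", so the checked message numbers arrive together as
# multiple values of that one form element.
sub checkbox_row {
    my ($index, $subject) = @_;

    # Escape the subject so it cannot be interpreted as HTML.
    $subject =~ s/&/&amp;/g;
    $subject =~ s/</&lt;/g;
    $subject =~ s/>/&gt;/g;

    return sprintf(qq{<input type="checkbox" name="to_view" value="%d"> %s<br>\n},
                   $index, $subject);
}

# Made-up subject lines standing in for headers fetched from the server.
my @subjects = ('Lunch?', 'Re: <b>urgent</b>');
my $index = 1;
print checkbox_row($index++, $_) for @subjects;
```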

Conclusion

Setting up a web-based mail system is not all that difficult. I would hesitate before adding a delete function, since I would worry about deleting my only copy of a message. (My e-mail program makes automatic backups, so I never have to worry about that on my own computer.) However, adding such functionality would be quite easy, technically speaking.

Next month, I will show you how to build a system that allows you to send mail as well as read it. We will build on the software we examined this month, adding some functionality to it and tying it into our own mail-sending CGI programs. With a bit of software, you too can begin to compete with Hotmail!

Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel, who has been using the web since early 1993. His book Core Perl will be published by Prentice-Hall in the spring. Reuven can be reached at reuven@lerner.co.il. The ATF home page, including archives and discussion forums, is at http://www.lerner.co.il/atf/.

______________________

Comments

extracting the headers with

Anonymous

I am trying to extract the subject line from emails and your article is helping me.

What does the "\w-" do in the regular expression "m/^([\w-]+):/i)" ?
It's just above the heading "Handling HTML".

Is it supposed to match a character class consisting of "\w" and "-" ? I would think something like "\s" would be used to match leading whitespace.

Thanks
