Reading E-mail Via the Web

How to write your own program to read and send mail to any server on the Internet.

Now that we have seen how Net::POP3 allows us to retrieve and read mail from a POP server, let's look at how we can integrate it into a CGI program. First, an HTML form is needed as a way to enter a user name and password. Here is a simple one:

<Title>Read your mail!</Title>
<H1>Read your mail!</H1>
<P>Enter your user name, password, and POP server.</P>
<Form method="POST">
<P>POP server: <input type="text" name="mailserver"></P>
<P>Username: <input type="text" name="username"></P>
<P>Password: <input type="password" name="password"></P>
<P><input type="submit" value="Show me my mail!"></P>
</Form>

The above form sends three parameters to our CGI program: the name of the POP server from which to download the mail, the user name and the password. If you are concerned about the password being sent in the clear, you might want to put the form and CGI program behind a server running SSL, the secure sockets layer. You might also want to investigate POP3's APOP login method, which avoids sending the password itself from the CGI program to the POP server.
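APOP, defined in RFC 1939, relies on a unique timestamp that the server includes in its greeting. The client sends the MD5 digest of that timestamp followed by the password, so the password never crosses the network between the CGI program and the POP server (it still travels from the browser to the CGI program, which is where SSL comes in). The sketch below computes the digest with Perl's core Digest::MD5 module, using the example timestamp and secret from RFC 1939; `apop_digest` is a hypothetical name. A real program would not need it, because Net::POP3's `apop` method performs this step for you.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Compute the APOP digest described in RFC 1939: the MD5 hash (in hex)
# of the server's greeting timestamp, angle brackets included, followed
# by the shared secret. apop_digest is a hypothetical helper name.
sub apop_digest {
    my ($timestamp, $secret) = @_;
    return md5_hex($timestamp . $secret);
}

# Example timestamp and secret taken from RFC 1939.
my $digest = apop_digest('<1896.697170952@dbc.mtview.ca.us>', 'tanstaaf');
print "APOP mrose $digest\n";
```

RFC 1939 gives the resulting digest for these inputs as c4c9334bac560ecc979e58001b3e22fb.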

The program for reading mail is fairly simple; see Listing 1 in the archive file. The code starts by creating an instance of CGI, which provides an object-oriented interface to the CGI protocol. Then an appropriate MIME header is sent to the user's browser, indicating that the response will be HTML-formatted text. Next, the program grabs the three pieces of information needed to retrieve the user's mail: the name of the POP server, the user name and the password.

Once that information is retrieved, we try to connect to the POP server and log in. Normally, invoking die is a bad idea in a CGI program, since it results in a difficult-to-understand message appearing on the user's screen. However, since we imported CGI::Carp and specified fatalsToBrowser, any invocation of die sends a description of the error to the browser as well as to the web server's error log. This can be an invaluable tool when debugging, even if your final production code must hide such error messages from users.
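Here is one way the connect-and-login step might look. It is a sketch, not Listing 1 itself: `open_mailbox` is a hypothetical helper, and the 30-second timeout is an arbitrary choice. It relies on documented Net::POP3 behavior: `new` returns undef if the connection fails, and `login` returns the number of messages, the string "0E0" for an empty mailbox, or undef if authentication fails.

```perl
use strict;
use warnings;
use Net::POP3;

# Connect to the POP server and log in, dying with a readable message
# on failure. With CGI::Carp's fatalsToBrowser in effect, that message
# reaches the browser. open_mailbox is a hypothetical helper; any extra
# options (Port, Timeout, ...) are passed straight to Net::POP3->new.
sub open_mailbox {
    my ($server, $username, $password, %options) = @_;

    my $pop = Net::POP3->new($server, Timeout => 30, %options)
        or die "Cannot connect to POP server '$server'\n";

    # login returns the message count, "0E0" for an empty mailbox
    # (true in boolean context, zero numerically), or undef on failure.
    my $num_messages = $pop->login($username, $password);
    unless (defined $num_messages) {
        $pop->quit;
        die "Login failed for user '$username'\n";
    }

    # Adding 0 turns "0E0" into an ordinary 0.
    return ($pop, $num_messages + 0);
}
```

A CGI program would call it as `my ($pop, $num_messages) = open_mailbox($server, $username, $password);` and let fatalsToBrowser report any failure.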

Once the number of messages waiting on the POP server is known (Net::POP3's login method returns it), we can retrieve them with a simple loop:

foreach my $index (1 .. $num_messages) {
   print "<H2>Message $index</H2>\n";
   my $message_ref = $pop->get($index);
   print "<pre>\n", @$message_ref, "</pre><HR>\n";
}

We enclose the mail within <pre> and </pre> tags, since most e-mail depends on fixed-width fonts and formatting.
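Because each message is written into the page as-is, any `<` or `&` in it (and every tag in an HTML-formatted message) will be interpreted by the browser rather than displayed. Below is a minimal sketch of the same loop with escaping added. `escape_html`, `print_messages` and the `FakePOP` stand-in class are hypothetical names, and CGI.pm's `escapeHTML` function could replace the hand-written escaper. The loop accepts any object with a Net::POP3-style `get` method, so it can be tried without a POP server.

```perl
use strict;
use warnings;

# Replace the characters HTML treats specially with entities.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Print each message, escaped, inside <pre> tags. $pop is anything with
# a get($index) method returning a reference to an array of lines, as
# Net::POP3's get does; messages that cannot be fetched are skipped.
sub print_messages {
    my ($pop, $num_messages, $out) = @_;
    foreach my $index (1 .. $num_messages) {
        my $message_ref = $pop->get($index) or next;
        print {$out} "<H2>Message $index</H2>\n";
        print {$out} "<pre>\n", escape_html(join '', @$message_ref), "</pre><HR>\n";
    }
}

# Hypothetical stand-in for a Net::POP3 object, for offline testing.
package FakePOP;
sub new { my ($class, @messages) = @_; return bless [@messages], $class }
sub get { my ($self, $index) = @_; return $self->[$index - 1] }

package main;

my $fake = FakePOP->new(["Subject: test\n", "\n", "1 < 2 & 3 > 2\n"]);
my $html = '';
open my $fh, '>', \$html or die "Cannot open in-memory handle: $!";
print_messages($fake, 1, $fh);
close $fh;
print $html;
```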

You may be surprised that such a simple program can read your mail, but it works, and it should work on any system with any web browser. Because it only reads messages and leaves them on the server, it can be used to check quickly whether new mail has arrived without affecting your ability to download and read those messages with your usual e-mail program.

Ignoring Uninteresting Headers

As is often the case with new programs, our first stab is functional but lacks some useful features. For instance, most users do not need to see all of the headers that come with a message. Typically, they want to see only the “From”, “To”, “Subject”, “Cc” and “Date” headers.

Perl makes it a snap to remove unwanted headers by using regular expressions. Each header can be thought of as a name-value pair separated by a colon. On the left side of the colon is the header name, which consists of alphanumeric characters and hyphens. On the right side of the colon is the header's value, which can contain almost any character.
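The name-value split can be expressed as a single regular expression. In this sketch (`parse_header` is a hypothetical name), note that `\w` also matches the underscore, so the character class is slightly more permissive than "alphanumeric or hyphen". A line that begins with white space does not match at all, which is what lets continuation lines be told apart from new headers.

```perl
use strict;
use warnings;

# Split one header line into its name and value. The name is one or
# more word characters or hyphens at the start of the line; the value
# is everything after the colon, minus leading white space. Returns an
# empty list for lines that do not start with a header name.
sub parse_header {
    my ($line) = @_;
    return unless $line =~ /^([\w-]+):\s*(.*)$/;
    return ($1, $2);
}

my ($name, $value) = parse_header("X-Mailer: Example 1.0");
print "name=$name value=$value\n";
```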

One consideration is the possibility that a header will be spread across multiple lines. That is, the two lines

Subject: This is a subject header
   that continues onto a second line

should both be considered part of the “Subject” header, since the second line begins with one or more white-space characters.
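The program in this article handles continuation lines as it prints them. Another common approach, sketched here as an alternative rather than as what Listing 2 does, is to unfold the header block first, as RFC 822 and its successor RFC 5322 describe: remove every line break that is immediately followed by a space or tab, so that each header occupies a single line. `unfold_headers` is a hypothetical name.

```perl
use strict;
use warnings;

# Join continuation lines onto the header they belong to by deleting
# every line break (with an optional preceding carriage return) that
# is directly followed by a space or tab. The white space itself stays.
sub unfold_headers {
    my ($headers) = @_;
    $headers =~ s/\r?\n(?=[ \t])//g;
    return $headers;
}

my $folded = "Subject: This is a subject header\n"
           . "   that continues onto a second line\n"
           . "From: someone\@example.com\n";
print unfold_headers($folded);
```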

Deciding which headers to display is handled by creating a hash, %KEEP, whose keys name the headers to keep. For example:

my %KEEP = ("To"      => 1,
            "From"    => 1,
            "Subject" => 1,
            "Cc"      => 1,
            "Date"    => 1);

The code then decides whether to keep a header by checking the value of $KEEP{$header_name}, where $header_name contains the name of the header being checked.
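One caveat: header names are case-insensitive, so a message might contain `FROM:` or `from:`, and a plain `$KEEP{$header_name}` lookup would miss those. Here is a minimal sketch of a case-insensitive version, assuming it is acceptable to store the %KEEP keys in lower case; `keep_header` is a hypothetical helper.

```perl
use strict;
use warnings;

# Store the wanted header names in lower case...
my %KEEP = map { lc($_) => 1 } qw(To From Subject Cc Date);

# ...and lower-case each name before looking it up.
# keep_header is a hypothetical helper name.
sub keep_header {
    my ($header_name) = @_;
    return $KEEP{ lc $header_name } ? 1 : 0;
}

my @results = map { keep_header($_) } qw(FROM subject Received);
print "@results\n";    # prints "1 1 0"
```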

Before anything can be done with the headers, they must be separated from the message body and placed in a scalar of their own. Do that with split:

my ($headers, $body) = split /\n\n/, $contents, 2;

Notice split has three arguments, telling Perl to split $contents into a maximum of two elements. If the 2 were omitted, $body would contain only the first paragraph of the message, rather than the entire text.
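Here is a quick way to see why the limit matters, using a made-up message. The sketch assumes `$contents` already holds the whole message as one string; with Net::POP3, `join '', @$message_ref` on the array reference returned by `get` would produce it.

```perl
use strict;
use warnings;

# Headers, a blank line, then a body that itself contains a blank line
# between two paragraphs.
my $contents = "From: someone\@example.com\nSubject: Hi\n\n"
             . "First paragraph.\n\nSecond paragraph.\n";

# With a limit of 2, everything after the first blank line stays in $body.
my ($headers, $body) = split /\n\n/, $contents, 2;

# Without the limit, split also breaks the body at its blank line, so the
# second variable receives only the first paragraph.
my ($headers_nolimit, $first_paragraph) = split /\n\n/, $contents;

print "limit 2:  [$body]\n";
print "no limit: [$first_paragraph]\n";
```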

Once the message headers are stored in $headers, that string can be split into an array of lines, and the code can then iterate through the array's elements. Each element of @headers is a single header line, which might mark the beginning of a new header or the continuation of an existing one. If the line begins a new header and its name is in %KEEP, the line is written to the user's browser. If the header's name is not in %KEEP, the line is ignored and the program goes on to the next one.

This alone does not handle multi-line headers. They are dealt with by assuming that every line in @headers begins either with a header name (e.g., Received: or X-Mailer:) or with white space. If the beginning of the line matches the pattern for a header name, the program records that name and prints the line if the name is in %KEEP. If the pattern fails to match, the line is assumed to be a continuation that begins with white space, and it is printed only if the header it continues was printed.

Here is some basic code to print the headers:

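my @headers = split "\n", $headers;
my $previous = "";    # name of the most recent header seen
foreach my $line (@headers) {
   # A line starting with "Name:" begins a new header;
   # any other line continues the previous one.
   if ($line =~ m/^([\w-]+):/i) {
      $previous = $1;
   }
   print $line, "\n" if $KEEP{$previous};
}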

This code is contained in Listing 2 in the archive file, an improved version of our original bare-bones program that incorporates this and other changes.
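To try the header-filtering loop on its own, it can be wrapped in a function and fed a sample message. This sketch is not Listing 2; `filter_headers` and the sample message are invented for illustration, and a CGI program would still need to HTML-escape the kept lines before printing them.

```perl
use strict;
use warnings;

my %KEEP = ("To" => 1, "From" => 1, "Subject" => 1, "Cc" => 1, "Date" => 1);

# Return the header lines worth showing. A line that starts with a
# header name updates $previous; any other line (a continuation that
# begins with white space) inherits the previous header's decision.
# filter_headers is a hypothetical wrapper around the article's loop.
sub filter_headers {
    my ($headers, $keep) = @_;
    my @kept;
    my $previous = "";
    foreach my $line (split /\n/, $headers) {
        if ($line =~ m/^([\w-]+):/) {
            $previous = $1;
        }
        push @kept, $line if $keep->{$previous};
    }
    return @kept;
}

my $contents = "Received: from mail.example.com\n"
             . "\tby pop.example.com\n"
             . "Subject: This is a subject header\n"
             . "   that continues onto a second line\n"
             . "X-Mailer: Example 1.0\n"
             . "From: someone\@example.com\n"
             . "\n"
             . "Body text.\n";

my ($headers, $body) = split /\n\n/, $contents, 2;
my @shown = filter_headers($headers, \%KEEP);
print "$_\n" foreach @shown;
```

The Received header and its continuation line are dropped together, while the Subject continuation is kept along with its header.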

