Information Management for the Desktop

A script to save e-mail messages into a searchable database for later retrieval.

It's been tough being a newbie for these past several years. I've managed to stick with it and believe I've set a good example for being a sustainable newbie overall. The nice thing about the open-source world is that there is no stigma attached to being stuck on square one for extended periods of time. Unlike public school, where you are subjected to unbearable ridicule if you fall behind, in the Open Source community, there is no "behind", only "forward", "onward" and "upward". That's a good thing.

I have a PostgreSQL database on my home computer, and I must say that it's a most impressive application. I pair it with pgaccess, a graphical front end for PostgreSQL. Between the command-line interface and pgaccess, I enjoy adding information to the little databases I've created. Sometimes, I'm rewarded with having useful information to look at when I search the database with SQL queries. I've wondered from time to time, however, how I might use the database to centralize all the information on my system. For example, I have some useful information floating around in e-mail archives, never to be read again. I suppose I could learn the search/grep commands that would let me revisit old e-mails, but that doesn't seem productive to me. Now, if the old e-mails with useful information could be transferred to my database, I think that would be nifty. The following, then, is a description of what I came up with, which seems simple and easy and might even be useful.

Picking the Language

On my system, which runs SuSE 7.1 Pro Edition with the Linux 2.4 kernel, I can do a lot. The terminals let me choose which shell I want, the applications let me choose which programming language I want, and I can mix and match to my heart's content. For this little project, though, I decided to use Perl. My e-mails are plain text, and Perl is the undisputed "king of the hill" for text manipulation. On top of that, the Comprehensive Perl Archive Network (CPAN), at www.cpan.org, holds untold numbers of modules, scripts and distributions for untold numbers of tasks. It's overwhelming, to say the least.

The Perl documentation is thorough, readable and, except for that regex thing, within the grasp of a lot of folks. Even better, Perl comes with the CPAN module, which provides an interactive shell for installing modules. I simply bring up a terminal, type in the command $> perl -MCPAN -e shell and up pops the cpan prompt, waiting for me to type in what I need. When I type install followed by a module name, it goes out and finds the latest version of that module, then downloads and installs it. Nice.

So using Perl, I set out to write a simple script that prepares an e-mail message to be added to my PostgreSQL database.

The Script

This project, remember, is not a production tool at this stage. It's designed to let a home user read her e-mail messages and, when the mood strikes, copy and paste the message to a text file. With the e-mail message in a text file, the home user can then run the Perl script, like this:

$> perl e-mailparse.pl

The e-mail message has two parts: the header lines (such as Date, From, To, Cc and Subject) and the message body. On my system, when I cut and paste an e-mail message to a text file, it has five lines for the header, followed by a blank line or two and then the message body. In the sample below, the mail reader's bracketed character-set notice also sits between the header and the body. When I cut and paste to a text file, the e-mail looks like this:

Date: Fri, 10 May 2002 14:40:04 -0700
From: joe@joeisp.com
To: tompoe@renonevada.net
Cc: sam@samisp.com, julie@julieisp.com
Subject: Re: Blue Coat Linux Fixer
    [ The following text is in the "iso-8859-1" character set. ]
    [ Your display is set for the "US-ASCII" character set.  ]
    [ Some characters may be displayed incorrectly. ]
 
Tom,
 
I am the - - - - rest of message. Signed, Joe
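The sample shows the layout the script relies on: header lines of the form Name: value, then a blank line, then the body. If you ever want to fill the who-and-what database columns from the message itself rather than by hand, the header block can be read instead of thrown away. Here is a minimal sketch, assuming the pasted message has already been read into one string; the sub name parse_headers is hypothetical, and it simply skips header-block lines that do not look like Name: value, such as the bracketed notices above.

```perl
use strict;
use warnings;

# Collect the "Name: value" header lines at the top of a pasted message.
# Reading stops at the first line that is empty or holds only spaces or
# tabs, which is where the body begins. Header-block lines that do not
# look like "Name: value" (such as the mail reader's bracketed
# character-set notices) are skipped.
# parse_headers is a hypothetical helper, not part of the original script.
sub parse_headers {
    my ($text) = @_;
    my %header;
    for my $line (split /\n/, $text) {
        last if $line =~ /^[ \t]*$/;             # blank line: body starts
        if ($line =~ /^([A-Za-z-]+):[ \t]*(.*)$/) {
            my ($name, $value) = ($1, $2);
            $header{ lc $name } = $value;        # "Subject" -> "subject"
        }
    }
    return \%header;
}
```

Calling parse_headers on the sample gives a hash reference whose from, to, cc and subject entries hold the values shown in the header, ready to be copied into the matching columns.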

I didn't have any use for keeping the date and time, as that was already set up in my little database application. I didn't need the other lines in the header either, as I had already set up the database columns to reflect who and what and so on. So I needed a simple script to keep the message body and throw away the header information. The script, then, looks like this:

#!/usr/local/bin/perl -w
# Simple program to prepare an e-mail message for entry into a database.
# It discards the header lines at the top of the pasted message and
# appends the rest (the message body) to an output file.
use strict;

my $file    = '../PerlStuff/e-mail.txt';       # the pasted e-mail message
my $outfile = '../PerlStuff/e-mailtotal.txt';  # collects message bodies

open(my $in,  '<',  $file)    or die "Can't open $file: $!\n";
open(my $out, '>>', $outfile) or die "Can't open $outfile: $!\n";

while (<$in>) {
    # True from line 6 ($. == 6) through the last line, so the
    # five header lines are skipped and the body is appended.
    if (6 .. eof) { print $out $_; }
}

close $in;
close $out or die "Can't close $outfile: $!\n";

The first line is called the "shebang" line. When the file is marked executable and run directly, this line tells the system that the file is a Perl script and where to find the Perl interpreter; change the path if perl lives somewhere else on your system (the command which perl will tell you where it is). Even when you start the script with perl e-mailparse.pl, perl still honors the switches on this line, so the -w at the end stays in effect. It stands for "warnings" and is important, because it alerts you to likely problems with the script. The next four lines start with a number sign, marking comments that Perl ignores.

The use strict; line is important, too. We can think about it this way: when you write HTML code for a web page, many browsers are forgiving and try to figure out what you wanted if you make a mistake and don't place the code properly. Perl is forgiving in much the same way. However, we should be as careful as possible, and use strict; makes Perl less forgiving: among other things, it refuses to run the script if a variable is used without first being declared, which catches many typing mistakes. That is why the two file-name variables are declared with my.

The script then opens the text file where we pasted our e-mail, and opens, in append mode, the file that collects message bodies in preparation for inserting them into our little database application. For each line it reads, the test 6 .. eof becomes true when the line number reaches 6 and stays true through the last line of the file. The first five lines (the header) are therefore thrown away, and the rest, which is the part we want, is appended to the e-mailtotal.txt file.
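One detail of that test is easy to trip over. The 6 works because Perl compares a constant operand of the scalar .. operator against $., the current input line number. Put the 6 in a variable and the operand is merely tested for truth, so copying would start at line 1. Here is a minimal sketch of a version that takes the starting line as a parameter by comparing $. directly; the sub name lines_from is hypothetical, and it reads from a string through an in-memory filehandle so it can be tried without any files.

```perl
use strict;
use warnings;

# Return the lines of $text starting at line number $first.
# Compares $. (the current line number) directly, because a variable
# operand of the scalar .. operator is only tested for truth, while a
# constant like the script's 6 is compared with $.
# lines_from is a hypothetical helper, not part of the original script.
sub lines_from {
    my ($text, $first) = @_;
    open(my $in, '<', \$text) or die "Can't open in-memory string: $!\n";
    my $kept = '';
    while (my $line = <$in>) {
        $kept .= $line if $. >= $first;
    }
    close $in;
    return $kept;
}
```

With a message string in hand, lines_from($text, 6) keeps the same lines as the script's 6 .. eof test.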

To use this script, first create a text file to hold the e-mail message you cut and paste. Then open the script in an editor and replace my path to e-mail.txt with the path to your own file.

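Next, choose a destination file that will receive the message bodies, and replace my path for e-mailtotal.txt in the script with the path to your own file. Because the script opens this file in append mode (>>), Perl creates it on the first run if it does not already exist, and each later run adds another message body to the end.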

Finally, count the header lines at the top of your e-mail.txt file, that is, the lines that come before the first blank line. That number plus one goes in place of my 6 in the line of the script that reads if (6 .. eof).
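Counting lines by hand is the fragile part. With the 6 used in the script, for instance, the three bracketed notice lines in the sample message above would be copied along with the body. Here is a minimal sketch of an alternative that needs no count: it starts the body after the first line that is empty or holds only spaces or tabs. The sub name message_body and the file name body.pl are hypothetical.

```perl
use strict;
use warnings;

# Return everything after the first line that is empty or contains only
# spaces or tabs. If the message has no such line, return ''.
# message_body is a hypothetical helper, not part of the original script.
sub message_body {
    my ($text) = @_;
    my (undef, $body) = split /^[ \t]*\n/m, $text, 2;
    return defined $body ? $body : '';
}

# Example (hypothetical file name): perl body.pl e-mail.txt >> e-mailtotal.txt
if (@ARGV) {
    local $/;                        # slurp mode: read a whole file at once
    print message_body(scalar <>);
}
```

Splitting once, at the first blank line, mirrors how the pasted message is laid out, so the same script keeps working when a mail reader adds or drops header lines.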

We haven't talked about installing whatever database you might have on your system, and we haven't talked about installing Perl on your system. If you need help with this, send me an e-mail describing what your system looks like, and we'll try to help you get set up to use this script.
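For the last step, getting a message body into the database, one simple route is to hand psql an INSERT statement. Here is a minimal sketch that builds such a statement from a file holding one message body. The table name messages, the column name body, the database name mydb and the file name make-insert.pl are all hypothetical. It escapes text by doubling single quotes, the standard SQL rule, which is sufficient in PostgreSQL when standard_conforming_strings is on (the default in current releases). With DBI and DBD::Pg installed from CPAN, placeholders would let the driver do the quoting instead.

```perl
use strict;
use warnings;

# Build one SQL INSERT statement that stores $body in $table.$column.
# Single quotes inside the text are doubled, the standard SQL way to
# escape them in a string literal. Table and column names are
# hypothetical; insert_statement is not part of the original script.
sub insert_statement {
    my ($table, $column, $body) = @_;
    (my $quoted = $body) =~ s/'/''/g;
    return "INSERT INTO $table ($column) VALUES ('$quoted');\n";
}

# Example (hypothetical names): perl make-insert.pl body.txt | psql mydb
if (@ARGV) {
    local $/;                        # slurp mode: read a whole file at once
    print insert_statement('messages', 'body', scalar <>);
}
```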

______________________

Comments


Re: Information Management for the Desktop

Anonymous

In your article you wrote "...run the Perl script, like this: $> Perl e-mailparse.pl" and then mention the shebang, but if you run it like that, the shebang is ignored. The shebang is used when the file is marked executable and run like one (if the script is in the current working directory: $ ./e-mailparse.pl), so that the system knows which interpreter to use.

Also, in the shebang and at the prompt, Perl should be lowercase.

Re: Information Management for the Desktop

tompoe

Hi: Thanks. Good points. Making sure the permissions are set ahead of time will be an important step. I appreciate the comments.

Thanks,

Tom Poe

Reno, NV

Make a pseudo printer with kprinter

Anonymous

Using kprinter it is fairly simple to cobble up a few lines to do all kinds of things; this would be a good candidate.

Basically, all your script needs to do is take everything from standard input and run it through ps2ascii (or similar), parse it up and then use psql to do an insert.

Now you can just print from your mailer, choose prt2db (or whatever you named it) from the dropdown list of printers and you're done.

pevans@catholic.org

Re: Make a pseudo printer with kprinter

tompoe

Hi, Paul: That's a beautiful article you wrote about Kprinter, and I encourage everyone to read it:

http://printing.kde.org/documentation/contrib/kprinter/

I'm running KDE, so I will give it a shot and see if I can get it to work. That is good stuff you did. Loved the pictures, as they help loads.

Thanks,

Tom Poe

Reno, NV

Re: Information Management for the Desktop

Anonymous

I wrote an article on freshmeat, Information retrieval from $HOME, which might also be of interest.

Re: Information Management for the Desktop

tompoe

Hi: Well, that's an excellent article, and it explores so many of the points that came up in my experiences so far. I enjoyed it thoroughly. I hope you write another one soon, and I will look for it.

Thanks,

Tom Poe

Reno, NV

http://www.studioforrecording.org/

http://www.ibiblio.org/studioforrecording/

Re: Information Management for the Desktop

Anonymous

There's certainly no shame in being a newbie -- but taking the time to learn grep will definitely pay you back in spades. Especially for searching through email.

Suggest you look also at grepmail and (for newbies) the nice GUI interface of gtkgrepmail.

These solutions will probably work faster than a hand-rolled MySQL database.

Re: Information Management for the Desktop

tompoe

Hi: Thank you for your comments. I'm leading [possibly misleading] up to something that might be of interest. The concept of a file system for locating files, and bringing them up to look at or edit, carries over to e-mail, the Web and the applications on the computer. However, for me, and possibly for others, it would be really convenient if there were an easier way to organize the content within the files. A database that acts as a central repository for information might be one answer. Before the computer, we used to learn and remember by writing things down, and that act of writing was a form of reinforcement. With the computer, we create files and type information on the keyboard, but more often we cut and paste from a variety of sources. It's terribly difficult to remember where we [I] did that. And I don't like searching first through directories, then through e-mails, then through bookmarks, to try to retrieve a piece of information that I vaguely remember, if I remember it at all.

So, I've tried to take the practice of writing and the speed of computerization to put together a way to centralize information, and to aid the learning and remembering thingy, if that makes any sense.

The next article, if it gets past the editorial department, takes a closer look at grep [yes, I do like it, and do use it some] and then begins to make the case for moving information into a database and adding metadata, which might be of interest to the home user. And gtkgrepmail has a nice GUI, by the way. Thanks for the link.

Thanks,

Tom Poe

Reno, NV

http://www.studioforrecording.org/

http://www.ibiblio.org/studioforrecording/
