Information Management for the Desktop
It's been tough being a newbie for these past several years. I've managed to stick with it and believe I've set a good example for being a sustainable newbie overall. The nice thing about the open-source world is that there is no stigma attached to being stuck on square one for extended periods of time. Unlike public school, where you are subjected to unbearable ridicule if you fall behind, in the Open Source community, there is no "behind", only "forward", "onward" and "upward". That's a good thing.
I have a PostgreSQL database on my home computer, and I must say that it's a most impressive application. I managed to combine that with another application, called pgaccess. Between the command-line interface and pgaccess, I enjoy adding information to the little databases I've created. Sometimes, I'm rewarded with having useful information to look at when I search the database with SQL queries. I've wondered from time to time, however, how I might use the database to centralize all the information on my system. For example, I have some useful information floating around in e-mail archives, never to be read again. I suppose I could learn the search/grep commands that would let me revisit old e-mails, but that doesn't seem productive to me. Now, if the old e-mails with useful information could be transferred to my database, I think that would be nifty. The following, then, is a description of what I came up with that seems simple and easy and might even be useful.
On my system, which runs SuSE 7.1 Pro Edition and the Linux 2.4 kernel, I can do a lot. The terminals let me choose which shell I want, the applications let me choose which programming language I want, and I can mix and match to my heart's content. For this little project, though, I decided to use Perl. My e-mails are text-based, and there's a site, www.cpan.org, that contains untold numbers of scripts, modules and distributions for untold numbers of tasks. It's overwhelming, to say the least. Since Perl is widely regarded as the "king of the hill" for text-based file manipulation, it is the language I selected.
The Perl documentation is thorough, readable and, except for that regex thing, within the grasp of a lot of folks. Even better, my system has the CPAN shell. I simply bring up a terminal, type in the command $> perl -MCPAN -e shell and up pops the prompt, waiting for me to type in what I need. When I do that, it goes out and finds the latest version of what I need, then downloads and installs it. Nice.
So using Perl, I set out to write a simple script that prepares an e-mail message to be added to my PostgreSQL database.
This project, remember, is not a production tool at this stage. It's designed to let a home user read her e-mail messages and, when the mood strikes, copy and paste the message to a text file. With the e-mail message in a text file, the home user can then run the Perl script, like this:
$> perl e-mailparse.pl
The e-mail message has two parts: the header lines (Date, From, To, Cc, Subject) and the message body. On my system, I cut and paste an e-mail message to a text file, and it has five lines for the header, followed by two blank lines and then the message body. When I cut and paste to a text file, the e-mail looks like this:
Date: Fri, 10 May 2002 14:40:04 -0700
From: joe@joeisp.com
To: tompoe@renonevada.net
Cc: sam@samisp.com, julie@julieisp.com
Subject: Re: Blue Coat Linux Fixer
[ The following text is in the "iso-8859-1" character set. ]
[ Your display is set for the "US-ASCII" character set. ]
[ Some characters may be displayed incorrectly. ]
Tom,
I am the - - - - rest of message. Signed, Joe
I didn't have any use for keeping the date and time, as that was already set up in my little database application. I didn't need the other lines in the header either, as I already set up the database columns to reflect who and what and so on. So I need a simple script to gather the message body and throw away the header information. The script, then, looks like this:
#!/usr/local/bin/perl -w
# Simple program to prepare an e-mail message for entry into a database.
# This program simply discards the header information and keeps the
# message body. Each line of the body is appended to the output file
# as it is read.

use strict;
my $file = '../PerlStuff/e-mail.txt';
my $outfile = '../PerlStuff/e-mailtotal.txt';

open (IN, "<", $file) or die "Can't open $file: $!\n";
open (OUT, ">>", $outfile) or die "Can't open $outfile: $!\n";
while (<IN>) {
    if (6 .. eof) { print OUT; }    # true from input line 6 through the last line
}
close IN;
close OUT;
The first line is called the "shebang" line. When the script is run as an executable file, it tells the system where to find the Perl interpreter that should run it. The -w at the end of the first line stands for "warnings" and is important because it alerts the user to likely problems in the script. The next four lines start with a number sign, signalling comments that Perl ignores.
The use strict; line that follows is important, as it holds the script to a higher standard of correctness than Perl otherwise demands. We can think about it this way: when you write HTML code for a web page, many browsers are forgiving and try to figure out what you wanted if you make a mistake. Perl is forgiving in the same way. However, we should be as careful as possible, and use strict; has Perl refuse sloppy constructs such as variables that were never declared. That is why the two lines after it declare our variables with my.
The script then opens the text file where we cut and pasted our e-mail and opens, for appending, the file that collects the text in preparation for inserting it into our little database application. The loop skips the first five lines, gathers the rest (which is the part we want) and appends it to the e-mailtotal.txt file.
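The line-skipping test deserves a closer look. In a condition such as 6 .. eof, Perl compares a constant operand with $., the current input line number, so the test becomes true at line 6 and stays true through the last line. The sketch below spells out that comparison in a small helper. The name lines_from and the in-memory filehandle are assumptions made for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $text from line $start onward.
# It reads $text through an in-memory filehandle so that $. counts lines
# just as it would for a real file.
sub lines_from {
    my ($text, $start) = @_;
    open(my $in, '<', \$text) or die "Can't open in-memory text: $!\n";
    my @kept;
    while (my $line = <$in>) {
        # $. holds the number of the line just read
        push @kept, $line if $. >= $start;
    }
    close $in;
    return @kept;
}
```

For a single input file, lines_from($text, 6) should select the same lines as the 6 .. eof test does.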
To use this script, you need to create a text file for your e-mail cut and paste step. When you do that, bring up the script file and replace my path to e-mail.txt with the path to your own file.
Next, you need to create a destination file that will receive the message body. When you do that, bring up the script file again and replace my path for e-mailtotal.txt with the path to your own file.
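If editing the script for every new file gets tedious, the two paths could instead come from the command line. This is a minimal sketch, assuming a hypothetical helper named choose_paths that falls back to the paths used above when an argument is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: take the input and output paths from the argument
# list, falling back to the defaults from the script when one is missing.
sub choose_paths {
    my @args = @_;
    my $file    = @args > 0 ? $args[0] : '../PerlStuff/e-mail.txt';
    my $outfile = @args > 1 ? $args[1] : '../PerlStuff/e-mailtotal.txt';
    return ($file, $outfile);
}

# @ARGV holds whatever was typed after the script name.
my ($file, $outfile) = choose_paths(@ARGV);
```

With this in place, the script could be run as perl e-mailparse.pl mymail.txt mytotal.txt, where both file names are placeholders.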
Count the header lines in your e-mail.txt file, from the top down to (but not including) the first blank line. That number plus one replaces my 6 in the script. You'll find the 6 in the line that reads if (6 .. eof).
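Counting lines by hand can be avoided if the header always ends at the first blank line, which is the usual convention for e-mail messages. The sketch below works under that assumption, so it may not suit every mail program's pasted output. The name message_body is hypothetical. Unlike the script above, it also drops the blank separator line itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the first blank line of
# $text. Assumes the pasted text starts with the header lines.
sub message_body {
    my ($text) = @_;
    my @lines = split /^/, $text;    # split into lines, keeping line endings
    my $in_body = 0;
    my @body;
    for my $line (@lines) {
        if ($in_body) {
            push @body, $line;
            next;
        }
        # The first blank (or whitespace-only) line ends the header.
        $in_body = 1 if $line =~ /^\s*$/;
    }
    return join '', @body;
}
```

If the text contains no blank line at all, the helper returns an empty string, which is a reasonable signal that the paste did not look like an e-mail message.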
We haven't talked about installing whatever database you might have on your system, and we haven't talked about installing Perl on your system. If you need help with this, send me an e-mail describing what your system looks like, and we'll try to help you get set up to use this script.
Comments
Re: Information Management for the Desktop
In your article you wrote "...run the Perl script, like this: $> Perl e-mailparse.pl" and then mention the shebang, but if you run it like that the shebang is ignored. The shebang is used when the file is marked executable and run like one (if the script is in the cwd: $ ./e-mailparse.pl), so that the system knows which interpreter to use.
Also, in the shebang and at the prompt, perl should be lowercase.
Re: Information Management for the Desktop
Hi: Thanks. Good points. Making sure the permissions are set ahead of time will be an important step. I appreciate the comments.
Thanks,
Tom Poe
Reno, NV
Make a pseudo printer with kprinter
Using kprinter it is fairly simple to cobble up a few lines to do all kinds of things; this would be a good candidate.
Basically, all your script needs to do is take everything from standard_in and run it through ps2ascii (or similar), parse it up and then use psql to do an insert.
Now you can just print from your mailer, choose prt2db (or whatever you named it) from the dropdown list of printers and you're done.
pevans@catholic.org
Re: Make a pseudo printer with kprinter
Hi, Paul: That's a beautiful article you wrote about Kprinter, and I encourage everyone to read it:
http://printing.kde.org/documentation/contrib/kprinter/
I'm running KDE, so will give it a shot, and see if I can get it to work. That is good stuff you did. Loved the pictures, as they help loads.
Thanks,
Tom Poe
Reno, NV
Re: Information Management for the Desktop
I wrote an article on freshmeat, Information retrieval from $HOME, which might also be of interest.
Re: Information Management for the Desktop
Hi: Well, that's an excellent article, and explores so many of the points that came up with my experiences so far. I enjoyed it thoroughly. I hope you write another one soon, and I will look for it.
Thanks,
Tom Poe
Reno, NV
http://www.studioforrecording.org/
http://www.ibiblio.org/studioforrecording/
Re: Information Management for the Desktop
There's certainly no shame in being a newbie -- but taking the time to learn grep will definitely pay you back in spades. Especially for searching through email.
Suggest you look also at grepmail and (for newbies) the nice GUI interface of gtkgrepmail.
These solutions will probably work faster than a handrolled MySQL database.
Re: Information Management for the Desktop
Hi: Thank you for your comments. I'm leading [possibly misleading] up to something that might be of interest. The concept of a file system for locating files, and bringing them up to look at or edit, is carried along through email and web and applications on the computer. However, for me, and possibly for others, it would be really convenient if there were an easier way to organize the content within the files. A database that acts as a central repository for information might be one answer. Before the computer, we used to learn and remember by writing things down, and that act of writing was a form of reinforcement. With the computer, we create files and write information with the keyboard, but we more often cut and paste from a variety of resources. It's terribly difficult to remember where we [I] did that. And I don't like searching first through directories, then through emails, then through bookmarks, to try to retrieve a piece of information, vaguely remembered, if remembered at all.
So, I've tried to take the practice of writing and the speed of computerization to put together a way to centralize information, and to aid the learning and remembering thingy, if that makes any sense.
The next article, if it gets past the editorial department, takes a closer look at grep [yes, I do like it, and do use it some], and then begins to make the case for moving information and adding metadata for the home user, which might be of interest to her. And gtkgrepmail has a nice GUI, by the way. Thanks for the link.
Thanks,
Tom Poe
Reno, NV
http://www.studioforrecording.org/
http://www.ibiblio.org/studioforrecording/