Personalizing “New” Labels

How to let the site visitor know which documents he hasn't seen.

Like many people, I spend a great deal of time on the Web. Some of that time is spent working—writing and debugging programs for various clients. I also spend quite a bit of time reading on the Web, keeping track of the latest news from the real world and the computer industry, and even exploring new sites that friends and colleagues have suggested I visit.

A common feature on web sites, one which never fails to annoy me, is the proliferation of graphics indicating which items are new. I don't mind the fact that the site's author is letting me know the most recently changed or added items. Rather, it bothers me to know that these tags indicate whether the document is new, rather than whether the document is new for me.

When I visit a site for the first time, all of the documents should have a “new” indication, since all are new to me. When I return to the site, only those added since my previous visit should have the “new” graphic, perhaps including those modified since my last visit. In other words, the site should keep track of my usage patterns, rather than force me to remember whether I have read a particular file.

This month, we will take a look at this problem. Not only will we see how to create a web site that fails to annoy me in this particular way, but we will also look at some of the trade-offs that often occur when trying to handle site maintenance, service to the end user and program maintainability.

The Simple Way

Now that I have disparaged the practice of putting “new” labels on a web site's links, let me demonstrate it, so we can have a clear starting point. Here is a simple page of HTML with two links on it, one with a “new” graphic next to it:

<Head><Title>Welcome to My Site</Title></Head>
<H1>Welcome to My Site</H1>
<P>Read <a href="resume.html">my
<P>Read <a href="deathvalley.html"><img src="new.gif">
about my recent trip to Death Valley!</a></P>

When the page's author decides enough time has passed, the “new” logo will go down. These labels are updated by modifying the HTML file, inserting or erasing the graphics as necessary.

This technique has a number of advantages, the main one being that the site requires less horsepower to run. Downloading text and graphics does not require as much of the server's processor as a CGI program, which requires additional memory as well as processing time.

However, this technique also has many disadvantages. First of all, the labels change only when the webmaster decides to modify the HTML file, rather than on an automatic basis. Secondly, the labels fail to take users' individual histories into account, meaning first-time users will see the same “new” labels as daily visitors.

Auto-expiring the Labels

How can we approach this problem? Let's begin with a simple solution that does not use personalization, but does provide more accurate labels than the above approach. We can auto-expire the labels, printing “new” during the first week a file is made available and “modified” the second week. Files more than two weeks old will not have a label.

The easiest way to do this is via server-side includes. SSIs execute as if they were CGI programs, but their output is inserted inside an HTML file. SSIs are useful when you want dynamic or otherwise programmable text inside an HTML file, but don't have enough dynamic output to justify burying the HTML inside a CGI program.

In this particular case, we can take advantage of Apache's advanced server-side include functionality, which allows us to execute a CGI program and insert its output into an HTML file. For example, we can slightly modify our file like this:

<Head><Title>Welcome to My Site</Title></Head>
<H1>Welcome to My Site</H1>
<P>Read <a href="resume.html">my resume.</a></P>
<P>Read <a href="deathvalley.html">
about my recent trip to Death Valley!</a></P>

As you can see, the second link includes an SSI. One nice thing about SSIs is they look like HTML comments, so if you accidentally install an SSI-enabled file on a server that does not know how to parse them, the entire SSI will be ignored.

SSIs work thanks to a bit of magic: before the document is returned to the user's browser, it is interpreted by the server (hence the term “server-side includes”). Apache replaces all of the SSI commands with the result of their execution. This could mean printing something as simple as a file's modification date, but might be as complicated as inserting the results of a large database-access client invoked via CGI.

Listing 1.

In the above example, we run the CGI program, the code for which is in Listing 1. While this program is run via SSI rather than a pure CGI call, it works just like a CGI program. We use, the standard Perl module for writing CGI programs, to retrieve the keywords parameter, which is another way of describing a parameter passed via the GET method following the question mark.

Once we have checked to make sure the file exists, we use the -M operator to ask Perl to tell us the number of days which have passed since the file was last modified . If $ctime is equal to less than 7, the file was modified within the last seven days, meaning the file should be considered “new” for our purposes. We use a font tag to tell the user that the file is new.

If we use SSI with each link on our site, the “New!” message will appear for all links less than one week old.

I considered several ways of handling errors within, including using Perl's die function to exit prematurely and print an error message on the user's screen. In the end, I decided the program should exit silently if the file does not exist, or if no file name is specified at all. You may wish to send a message to the error log, which can be accomplished from within a CGI program by printing to STDERR as follows:

print STDERR "No such file \"$filename\"\n";

A major problem with this arrangement is that CGI programs are inherently resource hogs. If we have ten links on a page, using this technique involves running ten CGI programs—which means launching ten new Perl processes each time we view this page. For now, we will ignore the performance implications and focus on how to get things working. I will discuss performance toward the end of this article and in greater depth next month.