Protozilla: Pipes, Protocols and the Web Browser

Protozilla allows client-side CGI to extend Mozilla-based web browsers.

Protozilla Overview

To use Protozilla, first you need to install it. This is a simple process, similar to installing a browser plugin. You visit the download page on the Protozilla web site using Mozilla/Netscape 6, and click on a button to install the Linux version. After installation, you need to restart the browser and open the Protozilla configuration window from the Tasks menu of the browser. This window lists the sample protocols that come bundled with Protozilla.

The configuration window also displays the path to the Protocols directory, i.e., the directory where Protozilla stores all its protocol handlers. On UNIX/Linux systems, this directory has a name such as $HOME/.mozilla/default/<cookie>.slt/protozilla/protocol, where "default" denotes your Mozilla profile name, and <cookie> denotes a random string.

A protocol handler is simply an executable file in the Protocols directory. Defining a new protocol is as simple as creating a file in this directory. You can create this file using your favorite editor, drag-and-drop the file from the desktop into the Protozilla configuration window or use the Create menu option in the configuration window. The protocol name is simply the filename, excluding any file extension. For example, if the Perl script "foo.pl" is present in the Protocols directory, then the protocol "foo:" is automatically registered. When a URI using this protocol needs to be loaded, Protozilla executes the script foo.pl, much like a CGI program, and displays the standard output from the script in the browser window.
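The naming rule can be illustrated with a few lines of shell. This is only a sketch of the rule as described above, not code taken from Protozilla; the function name protocol_name is hypothetical, and the sketch assumes that only the last extension is stripped.

```shell
#!/bin/sh
# Hypothetical helper: derive the protocol name that a handler file would
# register, following the rule "filename minus file extension".
# Assumption: only the final extension is removed (foo.pl -> foo).
protocol_name() {
    base=$(basename "$1")          # drop any leading directories
    printf '%s:\n' "${base%.*}"    # strip the last extension, append a colon
}

protocol_name "foo.pl"      # prints foo:
protocol_name "whois.sh"    # prints whois:
```

A file with no extension at all keeps its full name, so a handler named simply "foo" would also map to "foo:" under this sketch.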

Before displaying the standard output from the protocol handler, Protozilla searches for headers conforming to the Multipurpose Internet Mail Extensions (MIME) format, as defined by the CGI specification. We will not go into the details of MIME headers here. For our purposes, it suffices to say that if the standard output contains an HTML document, the MIME header consists of the line "content-type: text/html" followed by a blank line, and the HTML document follows this header. For plain text output, the header contains the line "content-type: text/plain" instead. If Protozilla does not find a valid MIME header, it assumes the output is plain text and displays it as such in the browser.
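As a minimal sketch of the output format just described, the following shell fragment writes the header line, the blank line and then a small HTML document. The function name emit_html and the sample text are made up for illustration; only the "content-type: text/html" line and the blank line that follows it come from the description above.

```shell
#!/bin/sh
# Sketch of what a handler writes to standard output when it returns HTML.
# emit_html is a hypothetical name used only for this example.
emit_html() {
    # Header line, then the blank line that ends the header section.
    printf 'content-type: text/html\n\n'
    # The HTML document itself follows the header.
    printf '<html><body><h1>%s</h1></body></html>\n' "$1"
}

emit_html "Hello from a protocol handler"
```

Replacing text/html with text/plain in the header line, and the HTML with ordinary text, gives the plain text variant.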

Using Protozilla to Implement a New URI Scheme

To illustrate how to use Protozilla, let us implement a simple URI scheme called "whois", which will allow us to access the internet domain registry database at whois.internic.net. We want the browser to recognize URIs of the form whois:<string>.

Clicking on such a URI in an HTML document (or typing it in the URL box of the browser) should load a page giving details of all domains whose names contain <string>. To implement this URI scheme, we would like to use the standard command named whois available on Linux systems. This command takes a string argument, searches the registry database and prints the search results to the standard output. How do we tell the browser to use this command whenever it encounters a whois: URI?

The simplest way to implement the whois: scheme is to create an executable file named whois.sh in the Protocols directory, containing the following two lines:

#!/bin/sh
whois $URI_DATA

The environment variable URI_DATA is initialized to the data portion of the URI before Protozilla executes whois.sh. You can create this file using the Create menu option in the Protozilla configuration window.

After creating whois.sh, type the URI whois:linuxjournal.com in the browser's URL box. This will cause the registry information for linuxjournal.com to be displayed in the browser window.

If the whois command finds multiple matches in the registry database, then it simply lists each of the matching strings, without providing further information. For example, typing the URI whois:linuxjournal in the browser URL box causes the following matching strings to be listed:

LINUXJOURNAL.ORG
LINUXJOURNAL.NET
LINUXJOURNAL.COM

To obtain more information about one of these matches, you need to type in a new whois URI. This illustrates one of the deficiencies of the simple implementation above: it does not use the hypertext capabilities of the browser at all. Another problem is that the script expands $URI_DATA without quotes, so the shell performs word splitting and filename expansion on data taken directly from the URI. A crafted URI could therefore pass extra arguments or options to the whois command, which opens up possible security holes.
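One way to reduce the shell-expansion risk while staying in shell is to quote the variable and refuse any value that does not look like a plain domain string. The sketch below is not the approach taken in the article's Perl listing; the function names is_safe_query and main, the character whitelist and the rejection message are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch of a more defensive whois.sh. Names and whitelist are assumptions.
LC_ALL=C; export LC_ALL    # keep the character ranges below ASCII-only

# Succeed only for a non-empty string of letters, digits, dots and hyphens
# that does not start with a hyphen (so it cannot be read as an option).
is_safe_query() {
    case "$1" in
        ''|-*)            return 1 ;;   # empty, or looks like an option
        *[!A-Za-z0-9.-]*) return 1 ;;   # contains a character outside the whitelist
        *)                return 0 ;;
    esac
}

main() {
    if is_safe_query "${URI_DATA:-}"; then
        whois "$URI_DATA"    # quoted: no word splitting or filename expansion
    else
        echo "whois: query rejected"
    fi
}

main
```

Quoting alone stops the shell from splitting or expanding the value; the whitelist is an extra check on what is finally handed to the whois command.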

In the case of multiple matches in the whois output, one would like to be able to simply click on one of the displayed matches to select it, rather than having to type in a new URI. This means that the protocol handler should output an HTML document with clickable URIs, rather than just a plain text document. To add this capability, and to address the security issues, we create a more sophisticated protocol handler using Perl called whois.pl (see Listing 1). Before you create this script in the Protocols directory, remember to first delete the old whois.sh script.

Listing 1. whois.pl

If you know something about CGI scripts, this listing will seem very familiar. The only deviation from the CGI standard is the use of the Protozilla-specific environment variable URI_DATA to obtain the data portion of the URI. The -T Perl option enables taint-checking and makes the script more secure.

The script in Listing 1 outputs an HTML document using MIME headers. In the case of multiple matches, the script takes each matching string and converts it to a clickable hyperlink. If you type the URI whois:linuxjournal, the browser will display an HTML document with three hyperlinks, one for each of the matches. To rerun the whois command for a particular matching string, all you need to do is to click on the link. This is an improvement over the command-line use of whois, where you would need to type in a new string explicitly, or cut-and-paste a string from the screen.
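The conversion of matches into links can be sketched in a few lines of shell. This is not a reproduction of Listing 1, which is written in Perl; it is a rough illustration that feeds the three sample matches shown earlier through a hypothetical function, linkify_matches, instead of calling whois. The rule for deciding which lines count as matches is an assumption.

```shell
#!/bin/sh
# Sketch: turn a list of matching names into clickable whois: hyperlinks.
# linkify_matches is a hypothetical name; the filter rule is an assumption.
linkify_matches() {
    while IFS= read -r line; do
        case "$line" in
            ''|*[!A-Za-z0-9.-]*) ;;   # skip empty lines and anything that is not a bare name
            *) printf '<a href="whois:%s">%s</a><br>\n' "$line" "$line" ;;
        esac
    done
}

# MIME header, blank line, then the HTML document with one link per match.
printf 'content-type: text/html\n\n<html><body>\n'
printf 'LINUXJOURNAL.ORG\nLINUXJOURNAL.NET\nLINUXJOURNAL.COM\n' | linkify_matches
printf '</body></html>\n'
```

Because only bare names are turned into links, no unescaped text from the input ends up inside the generated HTML.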
