Protozilla: Pipes, Protocols and the Web Browser

Protozilla allows client-side CGI to extend Mozilla-based web browsers.

The web browser has now become an indispensable piece of software on the desktops of most computer users, including Linux users. However, the interaction between the browser and the rest of the Linux environment remains minimal. For example, in the Netscape 4.x series, this interaction is essentially just one-way: an external program can ask the browser to load a specific uniform resource locator (URL), or the browser can launch an external program to handle protocols such as Telnet. An external program typically cannot both receive input from the browser and provide content for display back to the browser. One can use applets and plugins for two-way interaction with the browser, but these are pieces of software written specifically to work within the browser. Furthermore, applets and plugins typically execute within a "sandbox", for security reasons, and provide only limited functionality.

Web servers, on the other hand, have always provided an interface for extending their functionality by interacting with external programs. This interface, known as the common gateway interface (CGI), enables the web server to send data to an external program (the CGI program), execute the program and capture HTML output from it for transmission to the web browser. CGI is essentially an attempt to standardize the way in which command-line filters are used in UNIX. A filter program reads data from its standard input, processes the data and writes the results to its standard output. Command-line arguments and environment variables are used to convey additional information to control processing of the data. The CGI specification merely lays out in detail how to do this in the context of a web server. In particular, the CGI standard specifies how the web server uses environment variables to convey information about the URL to the CGI program being executed.

What if the web browser were able to use CGI to execute external programs? One may refer to such a feature as client-side CGI because it would allow the web client to display HTML output by a CGI program directly, without the web server having to act as a middleman. Because CGI programs are not very different from command-line programs, this would enable the browser to access the large body of software available on a Linux system immediately. Such a feature would also permit command-line programs to access the increasingly popular web-based user interface with little or no modification. There are some obvious security issues to consider when adding this feature to a browser; these will be addressed later in this article.

This article describes a recent browser plugin called Protozilla that implements client-side CGI. Protozilla works with the open-source Mozilla browser, which is available for the Linux platform. It should also work with other browsers based on the Mozilla source code, such as Netscape version 6.0 or newer. Before getting into the details of how to use Protozilla, we first need to explain how protocols are implemented in browsers.

URLs, URIs and Protocols

Most of us are now familiar with URLs, which typically begin with the prefix http://. In fact, this prefix is so common, that it is often omitted altogether, leaving just the dot-com part of the URL. A URL typically has the following form: <scheme>://<host-name><path-name><query-string>. The scheme portion of a URL usually denotes a particular communication protocol, but it could simply denote a specific action to be taken by the browser. By far the most common scheme for URLs is HTTP. But other schemes, such as FTP, may also be specified. The hostname portion of the URL identifies the server that handles requests using the protocol. The pathname portion of the URL may be an HTML filename or the pathname of a CGI program. The query-string portion contains information that is passed on to the CGI program through the environment variable named QUERY_STRING, as required by the CGI specification.

The URL syntax is a subset of a more general syntax known as the uniform resource identifier (URI), which has the following form: <scheme>:<scheme-specific-data>. The data portion of a URI consists of all the characters to the right of the scheme name and the colon. There really are no restrictions on the structure of the scheme-specific data, although certain special characters may need to be encoded. For example, the string mailto:user@host is an example of a commonly used URI in web documents. The URL is just a special case of a URI where the data portion has the hierarchical structure described above.

One may loosely think of URIs as being analogous to the UNIX command line. The scheme portion is like the command (or action) name, and the data portion of the URI contains the argument to the command. This analogy is useful because it somewhat describes how Protozilla implements new URI schemes. In this analogy, the command http essentially instructs the browser to download a document using the HTTP protocol from the specified host and display it in the browser window.

A browser comes bundled with a small "vocabulary" of handlers for predefined URI schemes like HTTP and FTP. Protozilla allows the user to extend this vocabulary by defining new URI schemes or by overriding some predefined schemes.