Protozilla: Pipes, Protocols and the Web Browser


by R. Saravanan

on December 21, 2001

The web browser has now become an indispensable piece of software on the desktops of most computer users, including Linux users. However, the interaction between the browser and the rest of the Linux environment remains minimal. For example, in the Netscape 4.x series, this interaction is essentially just one-way: an external program can ask the browser to load a specific uniform resource locator (URL), or the browser can launch an external program to handle protocols such as Telnet. An external program typically cannot both receive input from the browser and provide content for display back to the browser. One can use applets and plugins for two-way interaction with the browser, but these are pieces of software written specifically to work within the browser. Furthermore, applets and plugins typically execute within a "sandbox", for security reasons, and provide only limited functionality.

Web servers, on the other hand, have always provided an interface for extending their functionality by interacting with external programs. This interface, known as the common gateway interface (CGI), enables the web server to send data to an external program (the CGI program), execute the program and capture HTML output from it for transmission to the web browser. CGI is essentially an attempt to standardize the way in which command-line filters are used in UNIX. A filter program reads data from its standard input, processes the data and writes the results to its standard output. Command-line arguments and environment variables are used to convey additional information to control processing of the data. The CGI specification merely lays out in detail how to do this in the context of a web server. In particular, the CGI standard specifies how the web server uses environment variables to convey information about the URL to the CGI program being executed.
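The filter model described above can be sketched in a few lines of shell. This is only an illustration: the function name cgi_echo is hypothetical, the sample values are made up, and QUERY_STRING is the only name taken from the CGI specification.

```shell
#!/bin/sh
# Minimal sketch of a CGI-style filter (cgi_echo is a hypothetical name).
# A web server would set QUERY_STRING and supply any request data on stdin.
cgi_echo() {
    # MIME header, followed by the blank line that ends the header block.
    printf 'content-type: text/plain\n\n'
    # Information about the URL arrives through the environment.
    printf 'query: %s\n' "${QUERY_STRING:-}"
    # Copy standard input to standard output, as any filter would.
    cat
}

# Simulate the server side: set the environment, then pipe in some data.
QUERY_STRING='name=tux'
export QUERY_STRING
printf 'hello\n' | cgi_echo
```

Run by hand, the script prints the header, a blank line, the query string and then the piped-in text, which is everything a server needs to build a response.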

What if the web browser were able to use CGI to execute external programs? One may refer to such a feature as client-side CGI because it would allow the web client to display the HTML output of a CGI program directly, without the web server having to act as a middleman. Because CGI programs are not very different from command-line programs, this would immediately give the browser access to the large body of software available on a Linux system. Such a feature would also permit command-line programs to use the increasingly popular web-based user interface with little or no modification. There are some obvious security issues to consider when adding this feature to a browser; these will be addressed later in this article.

This article describes a recent browser plugin called Protozilla that implements client-side CGI. Protozilla works with the open-source Mozilla browser, which is available for the Linux platform. It should also work with other browsers based on the Mozilla source code, such as Netscape version 6.0 or newer. Before getting into the details of how to use Protozilla, we first need to explain how protocols are implemented in browsers.

URLs, URIs and Protocols

Most of us are now familiar with URLs, which typically begin with the prefix http://. In fact, this prefix is so common that it is often omitted altogether, leaving just the dot-com part of the URL. A URL typically has the following form: <scheme>://<host-name><path-name><query-string>. The scheme portion of a URL usually denotes a particular communication protocol, but it could simply denote a specific action to be taken by the browser. By far the most common scheme for URLs is HTTP, but other schemes, such as FTP, may also be specified. The hostname portion of the URL identifies the server that handles requests using the protocol. The pathname portion of the URL may be an HTML filename or the pathname of a CGI program. The query-string portion contains information that is passed on to the CGI program through the environment variable named QUERY_STRING, as required by the CGI specification.

The URL syntax is a subset of a more general syntax known as the uniform resource identifier (URI), which has the following form: <scheme>:<scheme-specific-data>. The data portion of a URI consists of all the characters to the right of the scheme name and the colon. There really are no restrictions on the structure of the scheme-specific data, although certain special characters may need to be encoded. For example, the string mailto:user@host is an example of a commonly used URI in web documents. The URL is just a special case of a URI where the data portion has the hierarchical structure described above.
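The split between scheme and scheme-specific data can be sketched with POSIX shell parameter expansion. The function names uri_scheme and uri_data are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: split a URI at its first colon (helper names are hypothetical).
uri_scheme() {
    # Remove the longest suffix beginning with a colon, leaving the scheme.
    printf '%s\n' "${1%%:*}"
}

uri_data() {
    # Remove the shortest prefix ending with a colon, leaving the data.
    printf '%s\n' "${1#*:}"
}

uri_scheme 'mailto:user@host'   # prints: mailto
uri_data 'mailto:user@host'     # prints: user@host
```

Because only the first colon is treated as the separator, a URL such as http://host:8080/p yields the data portion //host:8080/p with its port left intact.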

One may loosely think of URIs as being analogous to the UNIX command line. The scheme portion is like the command (or action) name, and the data portion of the URI contains the argument to the command. This analogy is useful because it roughly describes how Protozilla implements new URI schemes. In this analogy, the command http essentially instructs the browser to download a document using the HTTP protocol from the specified host and display it in the browser window.

A browser comes bundled with a small "vocabulary" of handlers for predefined URI schemes like HTTP and FTP. Protozilla allows the user to extend this vocabulary by defining new URI schemes or by overriding some predefined schemes.

Protozilla Overview

To use Protozilla, first you need to install it. This is a simple process, similar to installing a browser plugin. You visit the download page on the Protozilla web site using Mozilla/Netscape 6, and click on a button to install the Linux version. After installation, you need to restart the browser and open the Protozilla configuration window from the Tasks menu of the browser. This window lists the sample protocols that come bundled with Protozilla.

The configuration window also displays the path to the Protocols directory, i.e., the directory where Protozilla stores all its protocol handlers. On UNIX/Linux systems, this directory has a name such as $HOME/.mozilla/default/<cookie>.slt/protozilla/protocol, where "default" denotes your Mozilla profile name, and <cookie> denotes a random string.

A protocol handler is simply an executable file in the Protocols directory. Defining a new protocol is as simple as creating a file in this directory. You can create this file using your favorite editor, drag-and-drop the file from the desktop into the Protozilla configuration window or use the Create menu option in the configuration window. The protocol name is simply the filename, excluding any file extension. For example, if the Perl script "foo.pl" is present in the Protocols directory, then the protocol "foo:" is automatically registered. When a URI using this protocol needs to be loaded, Protozilla executes the script foo.pl, much like a CGI program, and displays the standard output from the script in the browser window.
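The naming rule can be illustrated with a short shell function. This is a sketch of the rule as described here, not Protozilla's own code: scheme_for_file is a hypothetical name, and stripping only the last extension is an assumption.

```shell
#!/bin/sh
# Sketch: derive the scheme a handler file would register.
# scheme_for_file is hypothetical; Protozilla's exact rules may differ.
scheme_for_file() {
    name=${1##*/}                 # drop any leading directory components
    printf '%s:\n' "${name%.*}"   # drop the last extension, append a colon
}

scheme_for_file 'protocol/foo.pl'   # prints: foo:
```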

Before displaying the standard output from the protocol handler, Protozilla searches for headers conforming to the multipurpose internet mail extensions (MIME) format, as defined by the CGI specification. We will not get into the details of what exactly MIME headers are. For our purposes, it suffices to say that if the standard output contains an HTML document, the MIME header consists of the line "content-type: text/html" followed by a blank line. The HTML document would follow this MIME header. For plain text output, the header would contain the line "content-type: text/plain". If Protozilla does not find a valid MIME header, it assumes the output to be plain text and displays it as such in the browser.
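The fallback to plain text can be sketched as follows. The function detect_type is hypothetical and much simpler than a real header parser: it inspects only the first line of the handler output and does not check for the blank line that should follow.

```shell
#!/bin/sh
# Sketch: decide how handler output would be displayed.
# detect_type is a hypothetical helper that looks only at the first line.
detect_type() {
    IFS= read -r first || true
    case $first in
        [Cc]ontent-[Tt]ype:*)
            # Print whatever follows the colon, minus leading whitespace.
            printf '%s\n' "$first" | sed 's/^[^:]*:[[:space:]]*//' ;;
        *)
            # No recognizable header: treat the output as plain text.
            printf 'text/plain\n' ;;
    esac
}

printf 'content-type: text/html\n\n<html></html>\n' | detect_type
printf 'no header here\n' | detect_type
```

The first call reports text/html and the second falls back to text/plain.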

Using Protozilla to Implement a New URI Scheme

To illustrate how to use Protozilla, let us implement a simple URI scheme called "whois", which will allow us to access the internet domain registry database at whois.internic.net. We want the browser to recognize URIs of the form whois:<string>.

Clicking on the above URI in an HTML document (or typing it in the URL box of the browser) should load a page giving details of all domains that contain the <string> element. To implement this URI scheme, we would like to use the standard command named whois available on Linux systems. This command takes a string argument, searches the registry database and prints out the search results to the standard output. How do we tell the browser to use this command whenever it encounters a whois: URI?

The simplest way to implement the whois: scheme is to create an executable file named whois.sh in the Protocols directory, containing the following two lines:

#!/bin/sh
whois $URI_DATA

The environment variable URI_DATA is initialized to the data portion of the URI before Protozilla executes whois.sh. You can create this file using the Create menu option in the Protozilla configuration window.

After creating whois.sh, type the URI whois:linuxjournal.com in the browser's URL box. This will cause the registry information for linuxjournal.com to be displayed in the browser window.

If the whois command finds multiple matches in the registry database, then it simply lists each of the matching strings, without providing further information. For example, typing the URI whois:linuxjournal in the browser URL box causes the following matching strings to be listed:

LINUXJOURNAL.ORG
LINUXJOURNAL.NET
LINUXJOURNAL.COM

To obtain more information about one of these matches, you need to type in a new whois URI. This illustrates one of the deficiencies of the simple implementation above: it does not use the hypertext capabilities of the browser at all. Another problem is that the script hands URI_DATA to the shell unquoted, and relying on shell expansion in this way opens up possible security holes.
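One way to reduce the shell-expansion risk while staying in shell is to validate the data before using it and to quote the variable. The sketch below is an assumption-laden illustration: is_safe_query is a hypothetical helper, and the set of allowed characters (letters, digits, dots and hyphens) is a guess at what a domain query needs.

```shell
#!/bin/sh
# Sketch: accept only characters expected in a domain-style query.
# is_safe_query is hypothetical; the allowed character set is an assumption.
is_safe_query() {
    case $1 in
        ''|-*) return 1 ;;              # empty, or would look like an option
        *[!A-Za-z0-9.-]*) return 1 ;;   # contains a character outside the set
        *) return 0 ;;
    esac
}

# Demonstration only; a real handler would go on to run: whois "$URI_DATA"
for q in 'linuxjournal.com' 'x; rm -rf ~'; do
    if is_safe_query "$q"; then
        echo "accepted: $q"
    else
        echo "rejected: $q"
    fi
done
```

Quoting "$URI_DATA" prevents word splitting and wildcard expansion, and the check rejects anything the whois command could mistake for an option.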

In the case of multiple matches in the whois output, one would like to be able to simply click on one of the displayed matches to select it, rather than having to type in a new URI. This means that the protocol handler should output an HTML document with clickable URIs, rather than just a plain text document. To add this capability, and to address the security issues, we create a more sophisticated protocol handler using Perl called whois.pl (see Listing 1). Before you create this script in the Protocols directory, remember to first delete the old whois.sh script.

Listing 1. whois.pl

If you know something about CGI scripts, this listing will seem very familiar. The only deviation from the CGI standard is the use of the Protozilla-specific environment variable URI_DATA to obtain the data portion of the URI. The -T Perl option enables taint-checking and makes the script more secure.

The script in Listing 1 outputs an HTML document using MIME headers. In the case of multiple matches, the script takes each matching string and converts it to a clickable hyperlink. If you type the URI whois:linuxjournal, the browser will display an HTML document with three hyperlinks, one for each of the matches. To rerun the whois command for a particular matching string, all you need to do is to click on the link. This is an improvement over the command-line use of whois, where you would need to type in a new string explicitly, or cut-and-paste a string from the screen.
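The conversion of matches into hyperlinks can be sketched in shell as well. This is not the Perl code of Listing 1: linkify_matches is a hypothetical function, and it assumes each match arrives on its own line and consists only of letters, digits, dots and hyphens.

```shell
#!/bin/sh
# Sketch: turn each matching name on stdin into a whois: hyperlink.
# linkify_matches is hypothetical and is not the article's Listing 1.
linkify_matches() {
    # Lines containing any other character are dropped, so nothing that
    # needs HTML escaping ever reaches the output.
    sed -n 's|^[A-Za-z0-9.-][A-Za-z0-9.-]*$|<a href="whois:&">&</a><br>|p'
}

printf 'content-type: text/html\n\n'
printf 'LINUXJOURNAL.ORG\nLINUXJOURNAL.NET\nLINUXJOURNAL.COM\n' | linkify_matches
```

Each input line becomes an anchor whose target is a new whois: URI, so clicking a match reruns the handler with that string.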

URL Redirection and Helper Applications

Protozilla can also be used to implement new URI schemes using URL redirection. For example, we may define a scheme called moz to access documents in the mozilla.org web site (i.e., the URI moz:docs gets redirected to the URL http://mozilla.org/docs). In effect, we would like the string http://mozilla.org/ to be prefixed to the data portion of any URI using the moz: scheme. This is very simple to implement using Protozilla's URL redirection feature. All you need to do is create a file named moz.url in the Protocols directory containing the following line:

http://mozilla.org/
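The effect of such a redirection can be written out as a small shell function. It only illustrates the prefixing described here; redirect_target is a hypothetical name and says nothing about how Protozilla performs the redirection internally.

```shell
#!/bin/sh
# Sketch: compute the URL that a redirected URI would resolve to.
# redirect_target is hypothetical: $1 is the prefix stored in the .url
# file and $2 is the URI being loaded.
redirect_target() {
    prefix=$1
    uri=$2
    # Strip the scheme and its colon, then prepend the stored prefix.
    printf '%s%s\n' "$prefix" "${uri#*:}"
}

redirect_target 'http://mozilla.org/' 'moz:docs'   # prints: http://mozilla.org/docs
```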

In the previous examples, we described how Protozilla could be used to implement protocols where the output is displayed in a browser window. There are situations where we would like a URI simply to trigger an action, such as launching a "helper" application window without using the browser interface. A common example is the Telnet protocol, where we would like a URI of the form telnet://hostname to fire up the Telnet client. To implement this using Protozilla, you create a file named telnet.cmd containing the following line:

telnet $URI_HOST $URI_PORT

Protocol handlers with the .cmd extension contain the command to be executed when the URI for the protocol is encountered. The environment variables URI_HOST and URI_PORT contain the hostname and port number information extracted by parsing the data portion of the URL.
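How a hostname and port might be pulled out of the data portion can be sketched as follows. The function parse_host_port and the variables parsed_host and parsed_port are hypothetical, and the sketch ignores user names and bracketed IPv6 addresses; Protozilla's own parsing may differ.

```shell
#!/bin/sh
# Sketch: extract host and port from URI data such as //hostname:port/path.
# parse_host_port is hypothetical; it sets parsed_host and parsed_port.
parse_host_port() {
    rest=${1#//}        # drop the leading double slash
    rest=${rest%%/*}    # drop any path that follows the host part
    case $rest in
        *:*) parsed_host=${rest%%:*}; parsed_port=${rest##*:} ;;
        *)   parsed_host=$rest; parsed_port='' ;;
    esac
}

parse_host_port '//example.org:2323'
echo "host=$parsed_host port=$parsed_port"   # prints: host=example.org port=2323
```

When the URI carries no port, parsed_port is left empty, which matches the way telnet falls back to its default port when the second argument is missing.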

Security Issues

There are some important security issues to consider when implementing new URI schemes using Protozilla. HTML documents downloaded from the Web using the browser could potentially contain hyperlinks using the new URI schemes. Loading of these hyperlinks could be triggered either by the user clicking on them or by JavaScript executed in the web page. This means that the protocol handler should screen the data portion of the URI carefully for malicious input before carrying out the action requested by the URI. These security needs are quite similar to those that arise in the context of CGI programs executed by web servers. As illustrated in the example script whois.pl, scripting languages like Perl already provide mechanisms like taint-checking to address these needs.

Protozilla provides an additional security mechanism that allows the user to define restricted URI schemes. All schemes with names ending with a plus (+) character are automatically considered restricted. An example is the cgi+: scheme, bundled with Protozilla, which allows execution of CGI programs. A restricted URI may only be loaded by privileged scripts residing on the user's local disk or by the user typing the URI into a special user interface. Content downloaded from the Web is not allowed to trigger the loading of a restricted URI.
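The naming convention for restricted schemes is easy to express as a check. The function is_restricted_scheme below is a hypothetical illustration of the rule stated above and does not show how Protozilla enforces it.

```shell
#!/bin/sh
# Sketch: test whether a scheme name marks a restricted scheme.
# is_restricted_scheme is hypothetical; it accepts "cgi+" or "cgi+:".
is_restricted_scheme() {
    name=${1%:}         # tolerate a trailing colon
    case $name in
        *+) return 0 ;; # name ends with a plus: restricted
        *)  return 1 ;;
    esac
}

for s in 'cgi+:' 'whois:'; do
    if is_restricted_scheme "$s"; then
        echo "$s restricted"
    else
        echo "$s unrestricted"
    fi
done
```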

Technical Details

Here is a brief technical summary of Protozilla's internals. Mozilla communicates with the outside world using TCP sockets and multithreaded asynchronous I/O. Protozilla executes an external program as a separate process but makes it appear to Mozilla as if it were a TCP socket. Protozilla communicates with the process using anonymous pipes and synchronous I/O but uses asynchronous I/O to communicate with the rest of Mozilla. The core portion of Protozilla, which deals with pipes and inter-process communication, is written in C++. The rest of Protozilla, dealing with protocols, is implemented in JavaScript.

Of course, an out-of-process implementation of a protocol is never going to be as efficient as a multithreaded, in-process implementation within Mozilla. However, there is a trade-off between efficiency and ease of implementation. Protozilla is better suited for experimental protocol implementations than for mature protocols. Out-of-process CGI program execution can be quite slow on a web server serving a large number of clients, but it is not as serious a performance issue on a client that is serving just one user. An out-of-process implementation does have one advantage: if an experimental protocol implementation hangs or crashes, it will not affect the rest of the browser.

Applications of Protozilla

The main goal of the Protozilla project is to make it easy for the browser to interact with existing software, without the need for extensive modifications to conform to any browser-specific interface. The client-side CGI feature of Protozilla essentially allows any command-line program to be invoked within the browser. This feature may also be useful for building "local web applications" that use Mozilla as the user interface, rather than just as an HTTP client. For example, if you have a Perl script that parses your e-mail, you could choose to have it display its output in the browser.

Another goal of Protozilla is to allow seamless access to peer-to-peer (P2P) protocols from the browser. Since the browser is by far the most commonly used software for accessing remote content, implementing new URI schemes for P2P protocols should make them much more accessible to the average user. Not only will the user be able to access the protocols by typing a P2P URI in the browser's URL bar, but the clickable P2P URIs may also be embedded in e-mail messages and other documents and exchanged just like HTTP or FTP URIs. As an example of P2P applications, the Protozilla distribution comes bundled with a Freenet URI scheme. If you have the proxy server software for the P2P Freenet protocol installed on your computer, you will be able to use URIs of the form freenet:<key> to browse documents on Freenet.

Conclusion

Protozilla aims to promote integration between the browser and the operating system. This sort of integration has received a bad name in recent years because of the way in which it was first attempted on the Windows platform, leading to allegations of monopolistic practices. However, this integration is certainly desirable if done correctly, i.e., preserving modularity, using open standards and, even better, using open source. Protozilla is an open-source project whose goal is eventually to become a standard feature of the open-source Mozilla browser and thus make this integration available to every Mozilla user.

The Protozilla web site is located at protozilla.mozdev.org. It is sponsored by mozdev.org, which also hosts several other interesting projects based upon the concept of Mozilla as an application platform.

R. Saravanan has been an avid Linux user ever since Red Hat 3.0.3, which allowed him to replicate his UNIX work environment on his home PC.

email: saravn@mozdev.org
