Server-Side Includes

Don't want to learn CGI but still want dynamic web pages? Mr. Lerner introduces us to server-side includes.

Most web sites contain largely static HTML files jazzed up with some images. More advanced sites use one or more CGI programs, HTTP cookies, database backends and other topics we have discussed in previous months.

In practice, these techniques usually rely on CGI programs, each of which creates at least one new process on the web server every time it is invoked. If we are interested in keeping our server's load as low as possible, we should avoid creating unnecessary processes. I am not suggesting we remove CGI programs, but I am saying there are times when CGI might be overkill.

Sometimes, for instance, you can do everything you need with server-side includes, also known as “SSIs”. SSIs offer a good balance between efficiency and complexity. Even if you have never seen them, server-side includes are pretty easy to understand. Indeed, they are often ideal for giving non-programmers a taste of dynamic page output without confusing them with the problems associated with actual programming. Here is one example of what they look like:

<!--#printenv -->

Yes, this looks like an HTML comment. But unlike an HTML comment, which is passed along to the user's browser unmodified, SSIs are parsed by the server before the file is sent off. It might be easier to think of SSIs as macros that are expanded by the web server.

Each server-side include consists of the “open comment” characters ("<!--"), a hash mark ("#") immediately following the two dashes, the name of the command you wish to evaluate, whitespace, zero or more attribute-value pairs followed by whitespace and finally the “close comment” characters ("-->"). So the SSI command #printenv, which takes no arguments and returns the list of environment variables, could be contained in the following file:

<!--#printenv -->

Before the above document is sent to a user, the server would expand it into something resembling Listing 1. Notice how our simple SSI was replaced by a list of environment variables and their values. While there is no standard for server-side includes, most servers share a common core of SSI commands, and each server defines additional ones of its own.
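To make the macro analogy concrete, here is a rough sketch in Python of what such an expansion step might look like. It is a toy model, not how Apache's mod_include is actually implemented: the function name expand_ssi, the two regular expressions and the sample environment are all my own inventions for illustration. It handles #printenv and, to show an attribute-value pair in action, the #echo command with its var attribute.

```python
import re

# Matches <!--#command attr="value" ... --> as described in the text.
SSI_PATTERN = re.compile(r'<!--#(\w+)((?:\s+\w+="[^"]*")*)\s*-->')
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')


def expand_ssi(text, env):
    """Replace each SSI directive in text with its result (toy model)."""

    def run(match):
        command = match.group(1)
        attrs = dict(ATTR_PATTERN.findall(match.group(2)))
        if command == "printenv":
            # No arguments: list every variable and its value.
            return "\n".join(f"{name}={value}" for name, value in sorted(env.items()))
        if command == "echo":
            # One attribute, var="NAME": insert that variable's value.
            # This sketch prints "(none)" when the variable is not set.
            return env.get(attrs.get("var", ""), "(none)")
        # Commands this sketch does not know about are left untouched.
        return match.group(0)

    return SSI_PATTERN.sub(run, text)


# Hypothetical page and environment, for illustration only.
page = "<P>Environment:</P>\n<PRE>\n<!--#printenv -->\n</PRE>"
env = {"SERVER_NAME": "www.example.com", "REQUEST_METHOD": "GET"}
print(expand_ssi(page, env))
```

An ordinary HTML comment, which has no hash mark after the two dashes, does not match the pattern and passes through unchanged, just as a real server would send it to the browser unmodified.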

This month, we will look into server-side includes—from configuring your server to allow for them, to some different SSI commands you might wish to use on your web site, to a number of ways in which to use SSIs on your site.

Configuring Apache for SSIs

Before you can create pages that use server-side includes, you must configure your web server to allow for them. If you are compiling Apache from scratch, make sure that mod_include, the module that takes care of SSIs, is compiled into the server. (By default, it should be.)

Even if mod_include exists, you must configure several additional items. First, you must tell Apache you wish to allow SSIs by using the Options directive in the server configuration file.

On my Red Hat 4.2 system, the file containing this information is called /etc/httpd/conf/access.conf, and the line looks like:

Options Indexes FollowSymLinks Includes

This indicates that I have decided to activate three of Apache's options—Indexes (producing a directory listing if a user asks for a directory rather than a file), FollowSymLinks (telling Apache to follow symbolic links, rather than ignoring them), and Includes (meaning server-side includes should be active).

If you would prefer to stop users from using #exec, which allows them to run arbitrary external programs, replace Includes with IncludesNOEXEC, as follows:

Options Indexes FollowSymLinks IncludesNOEXEC

If you wish to allow SSIs in only one directory, modify the configuration file so that the Options line appears between <Directory> and </Directory> lines. For example, if we want only files in the /ssi directory to allow for server-side includes, we could write the following:

<Directory /home/httpd/html/ssi/>
Options Indexes FollowSymLinks Includes
</Directory>

The Apache documentation also describes how you can have several <Directory> blocks. If your server hosts several sub-sites, you can use them to define different services for different sub-sites, depending on who is running them, how much they have paid or what policies you wish to have in place.

We indicate which files might contain SSIs by using two additional directives in the srm.conf configuration file. The first, AddType, indicates what sort of content-type header should be sent when the server returns a document with an .shtml suffix. Browsers need to know how to interpret the data being sent to them—it could, after all, be an image in JPEG format, text in HTML format or completely unformatted data. We thus add the following line to our configuration file:

AddType text/html .shtml

However, that's not quite enough; we also want files to be parsed by the server on their way out the door. This is done by instructing Apache to use the “server-parsed” handler on all files with “.shtml” as the suffix. We can do this by adding the following line to the srm.conf file:

AddHandler server-parsed .shtml

and then restarting the server.

You might be wondering why we must use .shtml, rather than .html. Why not add a server-parsed handler for .html, and dispense with the separate extension?

The answer has to do with server efficiency. SSIs have less computational overhead than CGI programs, but less is not none. If we were to tell Apache that all HTML files might include SSIs, Apache would have to inspect every .html file, which might slow things down significantly. Thus, it is customary on many sites to divide files into two categories: those containing only HTML (with the .html suffix), and those containing HTML plus server-side includes (with the .shtml suffix). The only difference to users is the file extension they eventually see, since SSIs are replaced by their results before they are sent to the user's browser. Both are sent with a content-type of “text/html”; for .shtml files, this is because our AddType directive instructs Apache to do so.
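The division of labor between the two directives can be pictured with another small Python sketch. Again, this is only a model under my own assumptions, not Apache's code: the two dictionaries, the plan_response function and the text/plain fallback are hypothetical stand-ins. The point is that the suffix is looked up twice, once to choose the content type and once to decide whether the file is scanned for SSIs at all.

```python
import os

# Hypothetical stand-ins for the server's suffix tables.
# Content types: .shtml comes from our AddType line; servers normally
# already know .html from their standard MIME type list.
CONTENT_TYPES = {".html": "text/html", ".shtml": "text/html"}
# Handlers: only .shtml is marked server-parsed by our AddHandler line.
HANDLERS = {".shtml": "server-parsed"}


def plan_response(filename):
    """Return (content_type, parse_for_ssi) for a requested file."""
    suffix = os.path.splitext(filename)[1]
    # Fallback type is an assumption made for this sketch.
    content_type = CONTENT_TYPES.get(suffix, "text/plain")
    parse_for_ssi = HANDLERS.get(suffix) == "server-parsed"
    return content_type, parse_for_ssi


for name in ("index.html", "status.shtml"):
    print(name, plan_response(name))
```

In this model, index.html is sent as text/html without being inspected, while status.shtml is also sent as text/html but only after passing through the SSI parser. Adding ".html" to the handler table would make every HTML file pay the parsing cost, which is exactly the trade-off described above.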