Most web sites contain largely static HTML files jazzed up with some images. More advanced sites use one or more CGI programs, HTTP cookies, database backends and other topics we have discussed in previous months.
Most of these more advanced techniques involve creating at least one new process on the web server. If we are interested in keeping our server's load as low as possible, we should avoid creating unnecessary processes. I am not suggesting we remove CGI programs, but I am saying there are times when CGI might be overkill.
Sometimes, for instance, you can do everything you need with server-side includes, also known as “SSIs”. SSIs offer a good balance between efficiency and complexity. Even if you have never seen them, server-side includes are pretty easy to understand. Indeed, they are often ideal for giving non-programmers a taste of dynamic page output without confusing them with the problems associated with actual programming. Here is one example of what they look like:
Yes, this looks like an HTML comment. But unlike an HTML comment, which is passed along to the user's browser unmodified, SSIs are parsed by the server before the file is sent off. It might be easier to think of SSIs as macros that are expanded by the web server.
Each server-side include begins with the “open comment” characters ("<!--"), a hash mark ("#") immediately following the two dashes, the name of the command you wish to evaluate, whitespace, zero or more attribute value pairs followed by whitespace and finally the “close comment” characters ("-->"). So the SSI command #printenv, which takes no arguments and returns the list of environment variables, could be contained in the following file:
<HTML>
<Head><Title>Testing</Title></Head>
<Body>
<H1>Testing</H1>
<!--#printenv -->
</Body>
</HTML>
Before the above document is sent to a user, the server expands it into something resembling Listing 1. Notice how our simple SSI was replaced by a list of environment variables and their values. While there is no standard for server-side includes, most servers share a common core of SSI commands, with each server adding some of its own.
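To make the syntax rules concrete, here is a minimal Python sketch of how a program could recognize SSI directives and expand #printenv. It is only an illustration of the format described above, not Apache's actual parser; the names parse_directives and expand_printenv, and the sample environment values, are my own inventions.

```python
import re

# One SSI directive: "<!--#", a command name, zero or more
# attribute="value" pairs, optional whitespace, then "-->".
DIRECTIVE = re.compile(r'<!--#(\w+)((?:\s+\w+="[^"]*")*)\s*-->')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_directives(text):
    """Return a list of (command, {attribute: value}) found in text."""
    found = []
    for match in DIRECTIVE.finditer(text):
        command = match.group(1)
        attributes = dict(ATTRIBUTE.findall(match.group(2)))
        found.append((command, attributes))
    return found


def expand_printenv(text, environment):
    """Replace every #printenv directive with NAME=value lines."""
    listing = "\n".join(f"{k}={v}" for k, v in sorted(environment.items()))

    def replace(match):
        # Directives other than printenv are left untouched.
        return listing if match.group(1) == "printenv" else match.group(0)

    return DIRECTIVE.sub(replace, text)


page = '<H1>Testing</H1>\n<!--#printenv -->\n<!--#echo var="LAST_MODIFIED" -->'
print(parse_directives(page))
# Hypothetical environment, for illustration only.
print(expand_printenv(page, {"SERVER_PORT": "80", "SERVER_NAME": "example.com"}))
```

The regular expression mirrors the description one piece at a time, which is why #printenv (no attributes) and #echo (one attribute) are both matched by the same pattern.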
This month, we will look into server-side includes—from configuring your server to allow for them, to some different SSI commands you might wish to use on your web site, to a number of ways in which to use SSIs on your site.
Before you can actually create pages containing and using server-side includes, you must configure your web server to allow for them. If you are compiling Apache from scratch, make sure that mod_include, the module that takes care of SSIs, is compiled into the server. (By default, it should be.)
Even if mod_include exists, you must configure several additional items. First, you must tell Apache you wish to allow SSIs by using the Options directive in the server configuration file.
On my Red Hat 4.2 system, the file containing this information is called /etc/httpd/conf/access.conf, and the line looks like:
Options Indexes FollowSymLinks Includes
This indicates that I have decided to activate three of Apache's options—Indexes (producing a directory listing if a user asks for a directory rather than a file), FollowSymLinks (telling Apache to follow symbolic links, rather than ignoring them), and Includes (meaning server-side includes should be active).
If you would prefer to stop users from using #exec, which allows them to run arbitrary external programs, replace Includes with IncludesNOEXEC, as follows:
Options Indexes FollowSymLinks IncludesNOEXEC
If you wish to allow SSIs in only one directory, modify the configuration file so that the Options line appears between <Directory> and </Directory> lines. For example, if we only want files in the /ssi directory to allow for server-side includes, we could give this:
<Directory /home/httpd/html/ssi/>
Options Indexes FollowSymLinks Includes
</Directory>
The Apache documentation also describes how you can have several <Directory> blocks. If your server hosts several sub-sites, you can use them to define different services for different sub-sites, depending on who is running them, how much they have paid or what policies you wish to have in place.
We indicate which files might contain SSIs by using two additional directives in the srm.conf configuration file. The first, AddType, indicates what sort of content-type header should be sent when the server returns a document with an .shtml suffix. Browsers need to know how to interpret the data being sent to them—it could, after all, be an image in JPEG format, text in HTML format or completely unformatted data. We thus add the following line to our configuration file:
AddType text/html .shtml
However, that's not quite enough; we also want files to be parsed by the server on their way out the door. This is done by instructing Apache to use the “server-parsed” handler on all files with “.shtml” as the suffix. We can do this by adding the following line to the srm.conf file:
AddHandler server-parsed .shtml
and then restarting the server.
You might be wondering why we must use .shtml, rather than .html. Why not add a server-parsed handler for .html, and dispense with the separate extension?
The answer has to do with server efficiency. SSIs have less computational overhead than CGI programs, but less is not none. If we were to tell Apache that all HTML files might include SSIs, Apache would have to inspect every .html file, which might slow things down significantly. Thus, it is customary on many sites to divide files into two categories: those containing only HTML (with the .html suffix), and those containing HTML plus server-side includes (with the .shtml suffix). The only difference to users is the file extension they eventually see, since SSIs are replaced by their results before they are sent to the user's browser. Both are sent with a content-type of “text/html”, since our AddType directive instructs Apache to do so for .shtml files.
Now that we have told Apache how to handle SSIs, we can start to use them in our files. Once again, only those files with .shtml suffixes will be parsed by Apache's server-side include mechanism, so make sure to save your files with an .shtml suffix, rather than with an .html suffix.
As we saw above, server-side includes look like HTML comments. This means any server-side includes not parsed by Apache will be invisible to the end user. Even if the SSI is passed unmodified to the user's browser (because of an error or a misconfigured server), there won't be any problems or oddities in what the user sees.
One of the most common uses of SSIs is to indicate when a document was modified. This is useful when a page is updated regularly; a typical example might be a news service or events calendar.
Here is a file indicating its latest modification date. We print the date with the SSI #echo command, which prints the value of an SSI variable. SSI variables include environment variables, plus several others defined by Apache. In this example, we look at LAST_MODIFIED, which contains the date and time of when the file was changed:
<HTML>
<Head><Title>I was modified</Title></Head>
<Body>
<H1>I was modified</H1>
<P>I was last modified on <!--#echo var="LAST_MODIFIED" --> </P>
</Body>
</HTML>
If you have followed the instructions so far, retrieving this page should indicate when it was last modified. Remember to save the file with an extension of .shtml—while writing this column, I spent some time trying to figure out why one particular SSI wasn't working. As it turns out, the problem was with the file extension, not my server.
The date printed by #echo might look nice to programmers, but it is a little daunting for most people. Non-programmers would prefer a slightly more familiar date and time format.
Fortunately, Apache allows us to modify the way in which dates are displayed. C programmers are probably familiar with the strftime function, which allows for the creation of many different time and date strings by using characters preceded by percent signs (%). For example, “%A” gives us the name of a day, “%B” returns the name of a month, “%d” gives the day of the month and “%Y” returns the four-digit year. Thus by specifying “%A, %d %B %Y”, we can get a string that looks like “Sunday, 22 February 1998”.
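If you would like to experiment with format strings before putting them into a page, Python's datetime module accepts the same strftime codes. The short sketch below is only a convenience for trying out formats; the timestamp is an arbitrary value I picked for illustration, and the day and month names depend on the locale in effect.

```python
from datetime import datetime

# An arbitrary example timestamp, chosen only for illustration.
modified = datetime(1998, 2, 22, 14, 30)

print(modified.strftime("%A, %d %B %Y"))  # weekday, day, month name, year
print(modified.strftime("%m/%d/%y"))      # month/day/two-digit year
print(modified.strftime("%d/%m/%y"))      # day/month/two-digit year
```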
Here is an example of setting the date to American format, first using the “config” SSI, and then using the “echo” SSI to display the results in our new format.
<HTML>
<Head><Title>Testing</Title></Head>
<Body>
<H1>Testing</H1>
<!--#config timefmt="%m/%d/%y" -->
<P>In America, I was changed on <!--#echo var="LAST_MODIFIED" --></P>
</Body>
</HTML>
Already, we can see a pattern in how server-side includes are defined and used. They consist of a keyword followed by attribute-value pairs, and a later #config overrides an earlier one, just as in this example:
<HTML>
<Head><Title>Testing</Title></Head>
<Body>
<H1>Testing</H1>
<!--#config timefmt="%m/%d/%y" -->
<P>In America, I was changed on <!--#echo var="LAST_MODIFIED" --></P>
<!--#config timefmt="%d/%m/%y" -->
<P>In Europe, I was changed on <!--#echo var="LAST_MODIFIED" --></P>
</Body>
</HTML>
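The way each #config affects only the #echo commands that follow it can be modeled in a few lines of Python. This is a toy model under my own assumptions, not Apache's implementation: the render function, the initial format string and the sample date are all hypothetical, and only the two directives used above are handled.

```python
import re
from datetime import datetime

# Matches a directive carrying exactly one attribute="value" pair.
DIRECTIVE = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')


def render(text, last_modified, initial_timefmt="%Y-%m-%d"):
    """Expand #config timefmt and #echo var="LAST_MODIFIED" in file order."""
    state = {"timefmt": initial_timefmt}  # arbitrary starting format

    def replace(match):
        command, attribute, value = match.groups()
        if command == "config" and attribute == "timefmt":
            state["timefmt"] = value      # affects every later #echo
            return ""
        if command == "echo" and value == "LAST_MODIFIED":
            return last_modified.strftime(state["timefmt"])
        return match.group(0)             # anything else is left alone

    # re.sub calls replace() once per match, from left to right.
    return DIRECTIVE.sub(replace, text)


page = ('<!--#config timefmt="%m/%d/%y" -->US: <!--#echo var="LAST_MODIFIED" -->\n'
        '<!--#config timefmt="%d/%m/%y" -->EU: <!--#echo var="LAST_MODIFIED" -->')
print(render(page, datetime(1998, 2, 22)))
```

Because the directives are processed strictly in order, the same variable prints in two different formats within one page.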
Printing a file's modification date is fine if you are running a continuously updated news service, but it does not have many other applications. By contrast, I find the file-inclusion SSI functions are extremely useful when designing sites.
The syntax is quite simple, as you can see from this example:
<HTML>
<Head><Title>A basic template</Title></Head>
<!--#include virtual="/fragments/header.htmlf" -->
<P>This is the text of my page, sandwiched between two server-side includes.</P>
<!--#include virtual="/fragments/footer.htmlf" -->
</Body>
</HTML>
Here, we use #include, with a single argument named “virtual”. Apache replaces the SSI with the contents of the named file. This might not seem all that useful, but consider how easy this makes it to create a site with a uniform look. The header.htmlf fragment could contain the standard <Body> tag (which is why the template above has no <Body> tag of its own), defining text and background colors, as well as putting a menu bar across the top of the page. By the same token, the footer.htmlf fragment could contain a copyright notice, smaller menu bar or information about the server.
What is the advantage? When you decide to add a new button to the menu bar or when the site's sponsors move to a new address, you only need to modify a single file. The changes propagate automatically through the rest of the site. This is easier than changing each individual page, and more efficient than creating the page with a CGI program. Just as you can avoid programming errors by putting repeated instructions into a subroutine, so too you can avoid typos and other potential problems by putting repeated information into HTML fragments that are imported with the “include” SSI.
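The following Python sketch models the idea with a dictionary standing in for the fragment files. It is a simplified illustration, not how Apache resolves virtual paths; expand_includes and the fragment contents are hypothetical.

```python
import re

INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]*)"\s*-->')


def expand_includes(text, fragments):
    """Replace each #include with the fragment stored under its path."""
    def replace(match):
        path = match.group(1)
        # Unknown paths are left in place rather than guessed at.
        return fragments.get(path, match.group(0))
    return INCLUDE.sub(replace, text)


# Hypothetical fragment contents, keyed by virtual path.
fragments = {
    "/fragments/header.htmlf": "<Body>\n<P>Menu bar</P>",
    "/fragments/footer.htmlf": "<P>Copyright notice</P>",
}
page = ('<!--#include virtual="/fragments/header.htmlf" -->\n'
        '<P>Text</P>\n'
        '<!--#include virtual="/fragments/footer.htmlf" -->\n'
        '</Body>')
print(expand_includes(page, fragments))

# Changing one fragment changes every page that includes it.
fragments["/fragments/footer.htmlf"] = "<P>New address</P>"
print(expand_includes(page, fragments))
```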
If your site uses CGI programs to create dynamic pages, you might be tempted to include your standard headers and footers in the programs' output using #include. Unfortunately, this does not work: SSIs are processed by the “server-parsed” handler and CGI programs by the “cgi-script” handler, so server-side includes in the output of a CGI program are never interpreted.
If you create a uniform look and feel for your site using HTML fragments (as described above), your CGI programs can still use those fragments by reading them in directly with a short subroutine. If you write CGI programs in Perl, such a subroutine could look like Listing 2. Your CGI programs would then look like Listing 3. Now when you change header.htmlf or footer.htmlf, all output on the server, from HTML files and CGI programs alike, will immediately reflect the changes.
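The same idea can be sketched in Python for readers who do not use Perl. This is not a translation of Listing 2, just one possible shape for such a helper; the function names read_fragment and cgi_page, and the directory argument, are hypothetical.

```python
import os
import tempfile


def read_fragment(directory, name):
    """Return the text of one HTML fragment file."""
    with open(os.path.join(directory, name), encoding="utf-8") as f:
        return f.read()


def cgi_page(directory, body):
    """Build CGI output wrapped in the site's shared header and footer."""
    header = read_fragment(directory, "header.htmlf")
    footer = read_fragment(directory, "footer.htmlf")
    # A CGI program sends its own content-type line, then a blank line.
    return "Content-type: text/html\n\n" + header + body + footer


# Demonstration with throwaway fragment files in a temporary directory.
with tempfile.TemporaryDirectory() as directory:
    samples = (("header.htmlf", "<Body>\n"), ("footer.htmlf", "</Body>\n"))
    for name, text in samples:
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            f.write(text)
    print(cgi_page(directory, "<P>Hello from a CGI program</P>\n"))
```

Because the fragments are read at request time, an edit to either file shows up in the program's next response without any change to the program itself.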
In case you are wondering, fragments are imported verbatim, and any SSIs they might contain are passed along as HTML comments. Assume we defined header.htmlf to be the following two-line fragment:
<P>This is the header.</P>
<!--#printenv -->
If this fragment were retrieved directly through Apache, the #printenv SSI would print the current list of environment variables. But since header.htmlf is imported via a #include SSI, the #printenv directive is sent to the user's browser uninterpreted. This restriction might seem unnecessary, until you consider that allowing SSIs inside of included files might lead to infinite loops or other unexpected results.
One of the more interesting recent additions to server-side includes is a limited programming language allowing for the setting and testing of variables.
Setting variables is fairly simple; you can do it with the following syntax:
<!--#set var="varname" value="value" -->
You can see the results with #echo (for a specific list of variables) or #printenv (for all defined variables), as in the following example:
<HTML>
<Head><Title>Setting variables</Title></Head>
<Body>
<!--#set var="pi" value="3.14159" -->
<pre><!--#printenv --></pre>
<P>pi = <!--#echo var="pi" --></P>
<HR>
<!--#set var="e" value="2.71828" -->
<pre><!--#printenv --></pre>
<P>e = <!--#echo var="e" --></P>
</Body>
</HTML>
The above example also demonstrates how SSIs are interpreted in the same order as they appear in the file. The output from #printenv changes after each variable setting.
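The ordering rule is easy to see in a small Python model of #set and #echo. As before, this is a sketch under my own assumptions rather than Apache's code: the render function is hypothetical, and the "(none)" placeholder for an unset variable is simply a choice made for this example.

```python
import re

DIRECTIVE = re.compile(r'<!--#(\w+)((?:\s+\w+="[^"]*")*)\s*-->')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def render(text, variables=None):
    """Expand #set and #echo directives in the order they appear."""
    variables = dict(variables or {})  # copy, so the caller's dict is untouched

    def replace(match):
        command = match.group(1)
        attrs = dict(ATTRIBUTE.findall(match.group(2)))
        if command == "set":
            variables[attrs.get("var")] = attrs.get("value", "")
            return ""
        if command == "echo":
            # Placeholder text for unset variables is an assumption here.
            return variables.get(attrs.get("var"), "(none)")
        return match.group(0)          # other directives are left alone

    return DIRECTIVE.sub(replace, text)


page = ('before: <!--#echo var="pi" -->\n'
        '<!--#set var="pi" value="3.14159" -->'
        'after: <!--#echo var="pi" -->')
print(render(page))
```

The first #echo runs before the #set and therefore sees no value, while the second one prints 3.14159.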
Setting variables is useful when used in conjunction with if-then statements. These statements can be used to create conditional text within HTML files without having to use CGI programs. The syntax is rather simple, for example:
<!--#if expr="$SERVER_PORT=80" -->
<P>You are using server port 80</P>
<!--#else -->
<P>You are using a non-standard server port</P>
<!--#endif -->
Note that the variable name in an #if statement must be preceded by a dollar sign, much as with shell scripts. The #else statement is optional, but the #endif is mandatory, indicating the end of the conditional text.
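To show how the three directives fit together, here is a toy Python evaluator for the simplest case, an equality test of the form $NAME=value. It is a sketch only: real servers support richer expressions, and the render function and its error handling are my own assumptions.

```python
import re

# Split the page into plain text and #if / #else / #endif directives.
TOKEN = re.compile(r'(<!--#(?:if\s+expr="[^"]*"|else|endif)\s*-->)')
IF = re.compile(r'<!--#if\s+expr="\$(\w+)\s*=\s*([^"]*?)\s*"\s*-->')
ELSE = re.compile(r'<!--#else\s*-->')
ENDIF = re.compile(r'<!--#endif\s*-->')


def render(text, variables):
    """Keep only the text whose enclosing conditions are all true."""
    output = []
    stack = []  # one boolean per open #if: is its current branch active?
    for piece in TOKEN.split(text):
        match = IF.fullmatch(piece)
        if match:
            name, expected = match.groups()
            stack.append(variables.get(name) == expected)
        elif piece.startswith("<!--#if"):
            raise ValueError("only $NAME=value tests are supported here")
        elif ELSE.fullmatch(piece):
            stack[-1] = not stack[-1]  # switch to the other branch
        elif ENDIF.fullmatch(piece):
            stack.pop()                # the mandatory #endif closes the #if
        elif all(stack):
            output.append(piece)
    return "".join(output)


page = ('<!--#if expr="$SERVER_PORT=80" -->'
        '<P>You are using server port 80</P>'
        '<!--#else -->'
        '<P>You are using a non-standard server port</P>'
        '<!--#endif -->')
print(render(page, {"SERVER_PORT": "80"}))
print(render(page, {"SERVER_PORT": "8080"}))
```

The stack makes it plain why #endif is mandatory: without it, the evaluator would never know where the conditional text stops.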
You can even perform pattern-matching within variables, using regular expressions, as in the following:
<HTML>
<Head><Title>Browser check</Title></Head>
<Body>
<!--#if expr="$HTTP_USER_AGENT = /^Mozilla/" -->
<P>You are using Netscape</P>
<!--#else -->
<P>You are using another browser</P>
<!--#endif -->
</Body>
</HTML>
If the value of HTTP_USER_AGENT (normally set to a string identifying the user's browser) is set to
Mozilla/4.04 [en] (X11; I; Linux 2.0.30 i586; Nav)
as is the case on my system, the above will evaluate to “true”, and thus print the first string. Otherwise, it will print the second string. In this way, you can create menus customized for each browser. For instance, you could make life easier for users of Lynx (a text-only browser) by giving them a separate menu structure that does not rely on images.
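The pattern itself can be tried out in Python, whose re module uses a similar regular expression syntax for this simple case. The function name browser_message is hypothetical, and the Lynx string is a made-up value for illustration rather than a real browser's exact identification.

```python
import re


def browser_message(user_agent):
    """Mimic the page above: test whether the string starts with Mozilla."""
    if re.search(r"^Mozilla", user_agent):
        return "You are using Netscape"
    return "You are using another browser"


print(browser_message("Mozilla/4.04 [en] (X11; I; Linux 2.0.30 i586; Nav)"))
# Hypothetical user-agent value for a text-only browser.
print(browser_message("Lynx/2.7"))
```

The caret anchors the match to the start of the string, so a value that merely contains the word Mozilla somewhere in the middle would not match.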
Server-side includes do not solve all problems—but what software does? Rather, SSIs were created so that non-programmers could create dynamic output. Over time, they have expanded to the point where they can now include conditional statements, which are a first step toward actual programming. As we have seen, though, programmers can benefit from many of SSI's features, especially when it comes to including simple information inside of pages of HTML, such as standard headers or a file's last modification date.
There are a number of other commands available from within SSIs, including #exec, which allows you to run a program and incorporate its output into a page of HTML. (You can also use #include to bring in the output from a CGI program, even if you use IncludesNOEXEC rather than Includes in the Apache configuration.)
In some cases, though, such simple server-side includes might not be enough. Over the next few months, we will look at several software packages that take the idea of server-side includes one step further, making a complete programming language available inside of HTML files without the need for CGI programs.