Access Information Through World Wide Web

The World Wide Web is designed to be easy to attach to, not just by using Mosaic, Lynx, or Netscape to read other people's pages, but also by publishing your own information. Eric Kasten tells you how to start.

Basic Configuration

The /usr/local/WWW/config directory contains several example configuration files. The CERN server has a rich collection of options, including caching specifications, proxy support, and access control. This article covers only a basic set of options to get you started.

Listing 1 is a basic configuration file which, with the following descriptions, you can modify to get your server up and running. This file should be created in /usr/local/WWW/config/ and can have any name you desire; here it will be called cern_httpd.conf. The default file name that the server searches for at startup is /etc/httpd.conf. This is easily overridden on the command line, or you can create a symbolic link from /etc/httpd.conf to /usr/local/WWW/config/cern_httpd.conf. I prefer to simply override the default, which gives an obvious indication of which configuration file is currently in use.

Upon examining Listing 1, you will find a number of options set to various values. Note that a comment line can be added to a configuration file by using the shell script convention of placing a # in the first column.

# cern_httpd.conf
# An example httpd configuration file
ServerRoot     /usr/local/WWW
Port           80
PidFile        httpd-pid
UserId         www
GroupId        wwwgroup
AccessLog      /var/log/httpd.access
ErrorLog       /var/log/httpd.error
LogFormat      Common
LogTime        LocalTime
UserDir        public_html
Welcome        welcome.html
Welcome        index.html
AlwaysWelcome  On
# enable/disable methods
Enable         GET
Enable         HEAD
Enable         POST
Disable        DELETE
Disable        PUT
# Rules
Exec    /cgi-bin/*      /usr/local/WWW/cgi-bin/*
Pass    /*              /usr/local/WWW/htdocs/*

Listing 1. WWW Configuration File
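
The layout of Listing 1 (one directive per line, values separated by whitespace, a # in the first column for comments) is simple enough to read with a few lines of code. The following Python sketch only illustrates that layout; it is not the parser the CERN server uses, and the name parse_config is made up for this example.

```python
def parse_config(text):
    """Return a list of (directive, arguments) pairs from configuration text."""
    entries = []
    for line in text.splitlines():
        # A '#' in the first column marks a comment; blank lines are skipped too.
        if line.startswith("#") or not line.strip():
            continue
        directive, *args = line.split()
        entries.append((directive, args))
    return entries


# A shortened copy of Listing 1, used here only as sample input.
sample = """\
# cern_httpd.conf
ServerRoot /usr/local/WWW
Port           80
Welcome        welcome.html
Welcome        index.html
Pass    /*              /usr/local/WWW/htdocs/*
"""

for directive, args in parse_config(sample):
    print(directive, args)
```

Keeping the entries in a list rather than a dictionary preserves both repeated directives (such as Welcome) and their order, which matters for the options described below.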

One of the first configuration options is ServerRoot. This option determines the directory which the server will use as the default root directory. This may be prepended to other option settings (such as PidFile) in the case that an absolute path is not specified. In our case, ServerRoot should be set to /usr/local/WWW.

The HostName directive, which does not appear in Listing 1, should be set to the fully qualified, dot-separated host and domain name of the host your server will run on. This is necessary so that the server can properly construct references to itself. This option may also be used to specify a hostname alias to be used in constructing URLs, in place of the hostname returned by the system.

Port specifies the port on which the server will accept connections. When a client (such as Mosaic or Netscape) retrieves a document from your server, it contacts your host at this port to make a request. Ports provide a fixed location at which to access a particular service on a host. Many ports have been defined universally as the access point for certain services. You may examine the /etc/services file to discover some of the ports that have been reserved on your system. If you are setting up a WWW server which you want accessible to the general public, you should probably use port 80. This port has been established as the default port for the Hypertext Transfer Protocol (HTTP) service.

The PidFile directive specifies the file in which the server should log the process id of the principal httpd server. This process id can be used to help locate the server in the event that you want to send it a signal. The path specified can be either an absolute path or a path relative to the ServerRoot. For example, setting PidFile to httpd-pid would cause the server to log its process id in the file httpd-pid in the ServerRoot directory.
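
The relative-versus-absolute rule for PidFile can be sketched in Python as follows. This is an illustration of the rule as described above, not code from the server; the helper names resolve_pid_file and read_pid are hypothetical, and /var/run/httpd-pid is just an example of an absolute setting.

```python
import os


def resolve_pid_file(server_root, pid_file):
    """Return the full path that a PidFile setting refers to."""
    # os.path.join discards server_root when pid_file is already absolute,
    # which matches the rule described in the text.
    return os.path.join(server_root, pid_file)


def read_pid(path):
    """Read the process id stored in the given file (assumed to hold one number)."""
    with open(path) as handle:
        return int(handle.read().strip())


print(resolve_pid_file("/usr/local/WWW", "httpd-pid"))
print(resolve_pid_file("/usr/local/WWW", "/var/run/httpd-pid"))
```

Once the server is running, the number returned by read_pid is the process id you would hand to kill when you want to signal the server.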

The next options are UserId and GroupId. These options specify the user and group ids under which the server will execute. In this article, as pointed out earlier, I will be using www as the user and wwwgroup as the group for the server.

Next is a set of options which control the kind of information the server will log about access activity and errors. AccessLog and ErrorLog should be set to valid file names. These logs can provide insight for tuning or security. You may want to keep them in /var/log or in a directory specially created for your WWW server logs. LogFormat should be set to Common, which selects a logging format that is likely to be recognized by many of the tools available to help you process the log information. LogTime can be set to either GMT or LocalTime, depending on how you want the log records time stamped.
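
To give a feel for why a widely recognized log format is useful, here is a minimal Python sketch that splits one line of the common log format into fields. The sample line is invented for illustration (the host name, date, and sizes are not from a real log), and the field layout assumed here is the usual "host ident user [time] "request" status bytes" form; check it against the lines your own server writes.

```python
import re

# host ident user [time] "request" status bytes
CLF = re.compile(r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def parse_clf(line):
    """Return a dict of fields for one common-log-format line, or None."""
    match = CLF.match(line)
    if match is None:
        return None
    host, ident, user, when, request, status, size = match.groups()
    return {
        "host": host,
        "user": user,
        "time": when,
        "request": request,
        "status": int(status),
        # A '-' in the size field means no byte count was recorded.
        "bytes": 0 if size == "-" else int(size),
    }


# Invented sample line, for illustration only.
sample_line = ('client.example.com - - [10/Oct/1995:13:55:36 -0700] '
               '"GET /docs/ HTTP/1.0" 200 2326')
print(parse_clf(sample_line))
```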

UserDir specifies the name of the public HTML document directory under each user's home directory. The example specifies public_html as the directory the server should support. This means that a uniform resource locator (URL) whose path has the form /~username/ is mapped to the directory ~username/public_html/. Each user on your system can then create a public_html directory where they can set up their own home pages or other publicly available documentation.
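
The UserDir mapping can be pictured with a short Python sketch. It assumes the URL path takes the form /~username/..., and it takes home directories from a plain dictionary rather than from the system's password file so that the example is self-contained; the function name user_dir_path and the user alice are hypothetical.

```python
import posixpath


def user_dir_path(url_path, home_dirs, user_dir="public_html"):
    """Map a URL path such as /~alice/notes.html to a file under alice's home."""
    if not url_path.startswith("/~"):
        return None
    username, _, rest = url_path[2:].partition("/")
    home = home_dirs.get(username)
    if home is None:
        return None
    # Strip leading slashes so that 'rest' can never replace the base directory.
    # A real server must also guard against '..' components; this sketch does not.
    return posixpath.join(home, user_dir, rest.lstrip("/"))


homes = {"alice": "/home/alice"}  # hypothetical user and home directory
print(user_dir_path("/~alice/notes.html", homes))
print(user_dir_path("/~alice/", homes))
```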

There are several Welcome directives in the example. Welcome indicates which file should be presented when the path in a URL names only a directory. For instance, using the example configuration file, a URL whose path is /docs/ would result in either welcome.html or index.html being retrieved from the directory /usr/local/WWW/htdocs/docs/ on the server. The order in which the Welcome directives appear in the configuration file determines the order in which the server searches for the welcome document. Only the first document found will be displayed. AlwaysWelcome should normally be set to On. If this option is off, the server will differentiate between URLs specifying directories with and without a trailing /. With this option off, a URL directory without a trailing / will result in a directory listing being displayed instead of the welcome document.
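
The search order for welcome documents amounts to "first match wins". The Python sketch below illustrates that idea using a temporary directory in place of a real document tree; find_welcome is a made-up name, and the code is not taken from the server.

```python
import os
import tempfile


def find_welcome(directory, welcome_names):
    """Return the first welcome file found in directory, or None."""
    # welcome_names is in the same order as the Welcome directives.
    for name in welcome_names:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


names = ["welcome.html", "index.html"]  # order taken from Listing 1

with tempfile.TemporaryDirectory() as docs:
    # With only index.html present, index.html is chosen.
    open(os.path.join(docs, "index.html"), "w").close()
    print(os.path.basename(find_welcome(docs, names)))
    # Once welcome.html exists as well, it takes priority.
    open(os.path.join(docs, "welcome.html"), "w").close()
    print(os.path.basename(find_welcome(docs, names)))
```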

The Enable and Disable directives specify which methods are enabled on the server. Methods are actions that may be conducted during client-server sessions. For instance, the GET method allows documents to be retrieved from the server while PUT allows documents to be written to the server. By default, GET, HEAD and POST are enabled and DELETE and PUT are disabled. I prefer to explicitly define the methods so as to clearly control the accesses that I wish to allow. It is usually best not to allow destructive methods, such as PUT or DELETE, since it is possible to accidentally allow insecure accesses to the server which may provide a method for an intruder to enter your system.

The last section of our example configuration file deals with rules. Rules are used to control the processing of URLs which are passed to the server. These rules may map URL strings to specific files or operations. In the example file, two rules are included to simplify URL construction and to specify an executable path for the CGI programs and scripts. The first rule is an Exec. This rule tells the server that URLs whose path begins with the string /cgi-bin/ are to result in the execution of a CGI program or script in the physical directory /usr/local/WWW/cgi-bin/. Pass indicates a rule which causes all URLs starting with the string / to be mapped to the directory /usr/local/WWW/htdocs/. This is the directory where you should place your server's public HTML documents.
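
The way the two rules in the example map URL paths to files can be sketched in Python. This is a deliberately simplified model: it assumes rules are tried in the order they appear, that the first matching rule wins, and that every template ends in a single trailing *. The real server supports more rule types and behavior than this, and the name apply_rules is hypothetical.

```python
# The two rules from the example configuration, in file order.
RULES = [
    ("Exec", "/cgi-bin/*", "/usr/local/WWW/cgi-bin/*"),
    ("Pass", "/*", "/usr/local/WWW/htdocs/*"),
]


def apply_rules(url_path, rules=RULES):
    """Return (action, physical_path) for the first rule matching url_path."""
    for action, template, result in rules:
        # Assumption: each template ends in exactly one trailing '*'.
        prefix = template[:-1]
        if url_path.startswith(prefix):
            matched = url_path[len(prefix):]
            # Substitute the text matched by '*' into the result path.
            return action, result.replace("*", matched, 1)
    return None


print(apply_rules("/cgi-bin/search"))
print(apply_rules("/docs/index.html"))
```

Because /cgi-bin/ paths also start with /, the order of the rules matters in this model: listing the Exec rule first keeps script requests from being caught by the more general Pass rule.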