Configuring, Tuning and Debugging Apache

RPMs, mod_perl, apachectl, telnet, mod_status, DSO, apxs, Apache::Status...Scared yet? Mr. Lerner tells us about all of these so we can better manage our web servers.
How Many Servers?

Even moderately popular web sites tend to run more than one copy of Apache at a time. Older servers would wait until a new connection came in before deciding whether to “fork” off a new copy of the existing server process. The authors of Apache abandoned this method in favor of “pre-forking”, meaning that Apache creates a large number of child processes immediately upon startup.

Each child process handles only one HTTP connection at a given time, meaning that there must always be at least as many Apache processes running as the number of simultaneous connections a site should handle. This maximum number is set with the MaxClients directive, which defaults to 150. If MaxClients is set too low, then one or more users may end up waiting until an Apache process finishes handling the previous connection, and is available to service a new one.

Apache dynamically changes the number of servers available depending on the number of requests that it gets, based on hints in httpd.conf. The MinSpareServers and MaxSpareServers directives tell Apache how many extra servers to keep around in preparation for incoming requests. If the number of spare servers ever goes below MinSpareServers, Apache spawns several new servers. By contrast, if there are more unused servers than defined in MaxSpareServers, Apache will kill off the extra ones.

If the server starts and responds successfully, but is taking a long time to accept connections, it could be that you have not told Apache to start enough servers. Try increasing either MaxSpareServers or MaxClients so that fewer people will have to wait for a free server to handle their request.

Of course, adding new servers is a potentially large drain on the computer's resources, consuming more CPU time and memory. mod_perl is a particularly large user of memory; so you should be more conservative when adding new Apache processes that include mod_perl. Use the standard Linux free command, which displays the amount of available physical and virtual memory, to get a better understanding of where the memory is going. I also like to use top, which displays, among other things, the amount of CPU and memory each process is consuming.

Because web servers need to respond to requests as quickly as possible, and because virtual memory is far slower than physical RAM, you should pay particularly close attention to virtual memory usage and minimize its use.

Small web sites that run one or more database servers (such as MySQL or PostgreSQL) on the same computer as a web server can find themselves in an unenviable bind. As the number of visitors to a dynamically generated site increases, the number of Apache processes must also increase. But in order to service all of these visitors, the number of database connections must also increase. At a certain point, a site becomes a victim of its own success, with the database and web site competing for system resources. For this reason, most popular database-backed sites separate the two functions onto at least two computers, with one or more database servers connected to one or more HTTP servers.

mod_status

One nice way to get a snapshot of the current Apache status is with the mod_status module. mod_status describes the current state of every HTTP server, whether it is waiting for a new connection, reading the request, handling the request or writing a response.

mod_status is compiled into Apache by default, meaning that all you need to do in order to activate it is to set the appropriate directives, and set the default handler, or request-handling subroutine, to be “server-status”. Any URL defined to have a handler of “server-status” then produces a status listing, ignoring the rest of the user's request.

It is thus most common for mod_status to be activated for only one URL. For example, we can create the virtual “/server-status” URL on our web server, such that anyone visiting /server-status will be shown the output from mod_status. We also indicate that Apache should always produce a full status listing, rather than the simple version. Here is one such simple configuration:

<Location /server-status>
    SetHandler server-status
</Location>
ExtendedStatus On

Once I put those four lines inside of httpd.conf and restart Apache—or send it a HUP signal—I get the following output from the /server-status URL:

Server Version: Apache/1.3.12 (UNIX) mod_perl/1.24
Server Built: Mar 29 2000 12:25:42
Current Time: Friday, 21-Jul-2000 16:02:51 IDT
Restart Time: Friday, 21-Jul-2000 16:02:48 IDT
Parent Server Generation: 2
Server uptime: 3 seconds
Total accesses: 0 - Total Traffic: 0 kB
CPU Usage: u0 s0 cu0 cs0
0 requests/sec - 0 B/second -
1 requests currently being processed, 4 idle servers
The status information begins with a fair amount of text indicating how long the server has been running, and how many times people have accessed the server. It also indicates just how many bytes are being served by this web process and how many servers are sitting idle. mod_status thus provides a nice window into the world of the Apache server, allowing us to see whether we have defined MaxSpareServers in the most resource-efficient manner.

mod_status then produces output in the following format, which can seem cryptic at first:

W____.............................................
 .................................................
 .................................................
 .................................................

Each “.” character represents a potential Apache server process which is currently not running. Those that are waiting for a new connection are represented by “_”; those that are reading input from the user's HTTP request are represented by “R”; and writing their output to the user's browser are represented by “W”. Not all letters may be visible at a given time; the current Apache status changes dynamically, and the output you see from mod_status will change to reflect that.

Following this display, we get a play-by-play view of what each active process is doing. We can see which connections are taking a long time to be processed, which connections are the most popular each month, and nearly any other facet.

Of course, it is normally a bad idea to open up your status information to the entire world. Luckily, you can use the “Order”, “Deny” and “Allow” directives to restrict access to a set of IP addresses or to an entire domain. For example:

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from .lerner.co.il
</Location>

With the above configuration, mod_status will display results only for IP addresses in my domain. Requests coming from another domain will get an HTTP response indicating that access is forbidden to them.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix