Apache 2.0: The Internals of the New, Improved
As mentioned previously, web server performance can be improved in many ways. Besides upgrading the hardware running the server, HTTPD accelerators can be deployed. Users can either run a specialized web server to handle simple static requests and pass all other requests to Apache (or any other web server) or have a small HTTP server built into the kernel itself. Proofs of concept for both approaches are phhttpd (pointy-headed httpd) and khttpd (kernel HTTP dæmon).
phhttpd serves all requests from a single process and uses the “sendfile” system call to put most of the work back into the kernel, besides interpreting the HTTP protocol. phhttpd cannot run on its own, as it requires a backing full server that knows how to talk with phhttpd, such as Apache. The two servers establish a line of communication while running. phhttpd listens to all the incoming connections, and if it can't parse the request for whatever reason, it hands the connection over its line to Apache to process. phhttpd keeps an aggressive cache of content that doesn't change at each request. It uses this content to reduce the amount of processing that must be done per request. It also features a nonblocking event model that allows a single thread to serve many connections. The number of threads may be scaled to match the size of the hosting machine.
To cut out the operating system overhead, a small HTTP server can be placed into the kernel itself to respond to requests for static files. It runs from within the Linux kernel as a module, handles only static web pages and passes all requests for nonstatic information to a regular user space web server such as Apache. Static web pages are not complex to serve, but they are important because virtually all images are static, as are a large portion of the HTML pages. A regular web server has little added value for static pages; it is simply a “copy file to network” operation, and the Linux kernel is good at this.
ARIES (Advanced Research on Internet E-Servers) is a project that started at the Open Architecture Research Lab at Ericsson Research Canada in January 2000. It aimed to find and prototype the necessary technology to prove the feasibility of a clustered internet server that demonstrates telecom-grade characteristics using Linux and open-source software as the base technology. These characteristics feature guaranteed continuous availability, guaranteed response time, high scalability and high performance.
Building a high-performance and scalable system requires a web server that can keep up with hundreds of requests per second. The Apache web server is one of three web servers that we have tested. Our test system is a telecom grade 16-processor cluster targeted for carrier-class server applications. It was featured in the April issue of Linux Journal, where we discussed our experiments with the Linux Virtual Server.
In an upcoming article, we will discuss Apache's performance in more detail and evaluate the performance of several other web servers, such as Jigsaw and Tomcat to see how they compare to each other. However, to give you a sample of how Apache 2.0 performs, we installed Apache 2.08 on a PIII server node with 512 MB RAM and started to generate HTTP requests to it using WebBench, a web server's benchmarking tool (see Figure 3).
Apache 2.08 was able to respond to an average of 817 requests per second, servicing up to 104 concurrent clients. One important factor is that this server node is diskless and Apache was serving the documents from an NFS partition (which presented some limitations). In other tests, we noticed much better performance (more requests/second) when Apache was servicing documents sitting on the server disk itself.
Based on our tests, we believe that Apache is considerably faster, more stable and more full-featured than other web servers. We look forward to testing and experimenting with the 2.0 release version, which promises clean code, well-structured I/O layering and enhanced scalability.
The Apache project will continue to be an open-source project that keeps up with advances in the HTTP protocol and web developments in general. The people behind the project are open to suggestions for fixes and improvements, and they respond to needs of large volume providers as well as occasional users. Part of Apache's original success was due to the numerous tests conducted by both developers and users. The Apache Group maintains rigorous standards before releasing any new version of the server, and when bugs are reported, the developers release patches and new versions as soon as they are available. If you are an IT manager or a system administrator currently using Apache 1.3.x, I highly recommend upgrading to Apache 2.0 once the release version is available.
Practical books for the most technical people on the planet. Newly available books include:
- Agile Product Development by Ted Schmidt
- Improve Business Processes with an Enterprise Job Scheduler by Mike Diehl
- Finding Your Way: Mapping Your Network to Improve Manageability by Bill Childers
- DIY Commerce Site by Reven Lerner
Plus many more.