Apache 2.0: The Internals of the New, Improved

Better scalability, reliability, security and performance are part of the upgrades of the new Apache server.
Performance

Apache's developers have always emphasized the security, correctness and flexibility of the server. However, as of Apache 1.3, many efforts were directed towards bringing performance up to par with other high-end web servers were minimal. With the continuous explosive growth of web traffic, Apache 2.0 tries to improve its throughput without compromising its other qualities.

Web servers have several key performance determinants. Some of these factors include the amount of memory available to hold the document tree, disk bandwidth, network bandwidth and CPU cycles. In most cases, people add to or upgrade the hardware to improve the performance of their web servers. Nevertheless, with the explosive growth of the Internet and its increasingly important role in our lives, the traffic on the Internet is growing at over 100% every six months. Thus, the workload on the servers is increasing rapidly and these servers are very easily overloaded. Several options exist to overcome this problem, besides hardware upgrades or additions.

For very busy web servers, the kernel overhead of switching tasks and doing I/O becomes a problem. Apache provides a solution for the high traffic problem through the mod_mmap_static module. This module ties files into the virtual memory space and avoids the overhead of “open” and “read” system calls to pull them in from the filesystem. This process can result in a good increase in speed when the server has enough memory to cache the whole document tree.

Furthermore, to improve the performance and to serve more requests per second, administrators can run a specialized web server that handles simple requests and passes everything else on to Apache. Another approach that cuts the operating system overhead is to have a small HTTP server built into the kernel itself. These two approaches are discussed later (see HTTPD Accelerators).

I/O Layering

Apache modules through version 1.3 wrote directly to the TCP connection back to the client. This arrangement was simple and efficient, but it lacked flexibility.

An example of this inflexibility would be secured transactions over SSL. To perform encrypted communications, the SSL module must intercept all traffic between the client and the handler module. With no abstraction layer in place, this was a difficult task made even more difficult by the cryptography laws of the 1990s that prohibited adding convenient hooks. Administrators wanting to run secure sites had the choice of applying inelegant patch sets to the Apache source or using a proprietary and perhaps incompatible binary distribution.

In Apache 2.0 (with APR), all I/O is done through abstract I/O layer objects. This arrangement allows modules to hook into each other's streams. It will then be possible for SSL to be implemented through the normal module interface rather than requiring special hooks. I/O layers also help internationalized sites by providing a standard place to do character set translation.

In addition, with later Apache releases, I/O layers may support a “most requested module” feature that will have one module filter the output of another. However, this may not happen with Apache 2.1.

Multiple-Processing Modules (MPM)

The original reason for creating Apache 2.0 was to solve scalability problems. The first proposed solution was to have a hybrid web server, one that has both processes and threads. This solution provides the reliability that comes with not having everything in one process, combined with the scalability that threads provide. However, this approach has no perfect way to map requests to either a thread or a process.

On Linux, for instance, it is best to have multiple processes, each with multiple threads serving the requests. If a single thread dies, the rest of the server will continue to serve more requests and the server will not be affected. On platforms that do not handle multiple processes well, such as Windows, one process with multiple threads is required. On the other hand, platforms with no thread support had to be taken into account, and therefore it was necessary to continue with the Apache 1.3 method of preforking processes to handle requests.

The mapping issue can be handled in multiple ways, but the most desirable way is to enhance the module features of Apache. This was the reasoning behind introduction of multiple-processing modules (MPMs). MPMs determine how requests are mapped to threads or processes. The majority of users will never write an MPM or even know they exist. Each server uses a single MPM, and the correct MPM for a given platform is determined at compile time.

Currently, five options are available for MPMs. All of them, except possibly the OS/2 MPM, retain the parent/child relationships from Apache 1.3, which means that the parent process will monitor the children and make sure that an adequate number is running.

MPMs offer two important benefits:

1. Apache can support a wide variety of operating systems more cleanly and efficiently. In particular, the Windows version of Apache is now much more efficient, since mpm_winnt can use native networking features in place of the POSIX layer used in Apache 1.3. This benefit also extends to other operating systems that implement specialized MPMs.

2. The server can be customized better for the needs of the particular site. For example, sites that need a great deal of scalability can choose to use a threaded MPM, while sites requiring stability or compatibility with older software can use a “preforking” MPM. Additionally, special features like serving different hosts under different user IDs (perchild) can be provided.

The prefork MPM implements a non-threaded, preforking web server that handles request in a manner similar to the default behavior of Apache 1.3 on UNIX. A single control process is responsible for launching child processes that listen for connections and serve them as they arrive.

Apache always tries to maintain several spare or idle server processes, which are ready to serve incoming requests. In this way, clients do not need to wait for a new child process to be forked before their requests can be served.

The StartServers, MinSpareServers, MaxSpareServers and MaxServers (set in /etc/httpd.conf) regulate how the parent process creates children to serve requests. In general, Apache is self-regulating, so most sites do not need to adjust these directives from their default values. Sites that need to serve more than 256 simultaneous requests may need to increase MaxClients, while sites with limited memory may need to decrease MaxClients to keep the server from thrashing.

While the parent process is usually started as root under UNIX in order to bind to port 80, the child processes are launched by Apache as less-privileged users. The User and Group directives are used to set the privileges of the Apache child processes. The child processes must be able to read all the content that will be served but should have as few privileges as possible beyond that.

MaxRequestsPerChild controls how frequently the server recycles processes by killing old ones and launching new ones.

The PTHREAD MPM is the default for most UNIX-like operating systems. It implements a hybrid multi-process multi-threaded server. Each process has a fixed number of threads. The server adjusts to handle load by increasing or decreasing the number of processes.

A single control process is responsible for launching child processes. Each child process creates a fixed number of threads as specified in the ThreadsPerChild directive. The individual threads then listen for connections and serve them when they arrive. The PTHREAD MPM should be used on platforms that support threads and that possibly have memory leaks in their implementation. This may also be the proper MPM for platforms with user-land threads, although testing at this point is insufficient to prove this hypothesis.

When compiled with the DEXTER MPM, the server starts by forking a static number of processes that will not change during the life of the server. Each process will create a specific number of threads. When a request comes in, a thread will accept it and answer it. When a child process sees that too many of its threads are serving requests, it will create more threads and make them available to serve more requests (see Figure 2).

Figure 2. Dexter MPM Model

The DEXTER MPM should be used on most modern platforms capable of supporting threads. It will create a light load on the CPU while serving the most requests possible.

The WINNT MPM is the default for the Windows NT operating systems. It uses a single control process, which launches a single child process that in turn creates threads to handle requests.

The PERHILD MPM implements a hybrid multiprocess, multithreaded web server. A fixed number of processes create threads to handle requests. Fluctuations in load are handled by increasing or decreasing the number of threads in each process.

A single control process launches the number of child processes indicated by the NumServers directive at server startup. Each child process creates threads as specified in the StartThreads directive. The individual threads then listen for connections and serve them when they arrive.

An MPM must be chosen during the configuration phase and compiled into the server. Compilers are capable of optimizing many functions if threads are used, but only if they know that threads are being used. Because some MPMs use threads on UNIX and others don't, Apache will always perform better if the MPM is chosen at configuration time and built into Apache.

To choose the desired MPM, you need to use the argument --with-mpm= NAME with the ./configure script, where NAME is the name of the desired MPM (dexter, mpmt_beos, mpmt_pthread, prefork, pmt_os2, perchild).

Once the server has been compiled, one can determine which MPM was chosen by using % httpd -l. This command will list every module that is compiled into the server, including the MPM.

The following list identifies the default MPM for every platform:

<il>BeOS: mpmt_beos<il>OS/2: spmt_os2<il>UNIX: threaded<il>Windows: winnt

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Yes, that is really much

replica Rolex's picture

Yes, that is really much information at once..

Use of IIS increases

Tagesgeld André's picture

Thanks for this great article. Nowadays is seems, that the MS-Plattform IIS/ASP.NET/SQL-Server gets a bigger portion of the market each and every month. So maybe Apache needs a (major) update? Or what else is the reason?

Re..

JimB's picture

Greetings, was a bit hard to read, because it was full of imput :P i really like to enjoy reading your journal...

what a article

Afrika's picture

I just discovered this article an the whole lot of information. Wow!
Thanks a lot for helping. I dont think it is this difficult to understand for german natives;-)

Best regards from Germany

Remo

Well Done

Mario's picture

Well written article with links to the top resources.

Good Summary

Brautmode's picture

It is a very intersting summary, a lot of new information for a beginnner, now I will try to translate it to german, so more useful for me ;-) If I will be ready (sorry, can need a time), I will send you translation..

and where is the german

nitro's picture

and where is the german translation?

regards

Thanks for the various

Magree's picture

Thanks for the various resources.

Information

röntgen's picture

thanks for posting those information! greetz

Yes, that is really much

Welt-Blick's picture

Yes, that is really much information at once... :)

Greetz
Welt-Blick

Really much an hard to understand.....

Hochzeitsvideo -dler's picture

...for us germans, not? :-)
But I´m sure your english is better.

yes

Tom G.'s picture

Yes it is... btw. great article

Apache

Manny's picture

The Apache web server is one of the most important open source projects. Not only is it the world's most popular HTTP server but more importantly it is the reason that the server side of the Web is not dominated by Microsoft. We all know about Microsoft's strategy of 'extending' the 'commodity protocols' on which open source projects depend, as a way of denying open source an entry into the market (The Halloween Documents: http://www.opensource.org/halloween/). If Apache wouldn't have been people probably would have had to develop one version of their web pages for Netscape servers and one for Microsoft servers. Eventually Microsoft would have won the 'server war' as it won the 'browser war' but luckily this did not happen. Instead Apache changed the rules on the server side.

Yes!

online shopping's picture

This is a very insightful comment, from a very insightful post. I'm very new to the open source community or idea, and didn't know about Apache until – of course – today. This is fantastic. I think the world at large owes a lot to the open source community without even knowing it. Open source doesn't get the publicity it should. However, true people in the community don't mind that too much – that's the risk and the sacrifice you make when you enter into this community and try to make things better for everyone. So, to all those who are involved out there : thank you!

Great information

island's picture

Thank you for posting this information. Greetz

Re: Apache 2.0: The Internals of the New, Improved

Anonymous's picture

Phew! That was a lot of info! Good article...

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix