Apache 2.0: The Internals of the New, Improved
Apache's developers have always emphasized the security, correctness and flexibility of the server. However, as of Apache 1.3, many efforts were directed towards bringing performance up to par with other high-end web servers were minimal. With the continuous explosive growth of web traffic, Apache 2.0 tries to improve its throughput without compromising its other qualities.
Web servers have several key performance determinants. Some of these factors include the amount of memory available to hold the document tree, disk bandwidth, network bandwidth and CPU cycles. In most cases, people add to or upgrade the hardware to improve the performance of their web servers. Nevertheless, with the explosive growth of the Internet and its increasingly important role in our lives, the traffic on the Internet is growing at over 100% every six months. Thus, the workload on the servers is increasing rapidly and these servers are very easily overloaded. Several options exist to overcome this problem, besides hardware upgrades or additions.
For very busy web servers, the kernel overhead of switching tasks and doing I/O becomes a problem. Apache provides a solution for the high traffic problem through the mod_mmap_static module. This module ties files into the virtual memory space and avoids the overhead of “open” and “read” system calls to pull them in from the filesystem. This process can result in a good increase in speed when the server has enough memory to cache the whole document tree.
Furthermore, to improve the performance and to serve more requests per second, administrators can run a specialized web server that handles simple requests and passes everything else on to Apache. Another approach that cuts the operating system overhead is to have a small HTTP server built into the kernel itself. These two approaches are discussed later (see HTTPD Accelerators).
Apache modules through version 1.3 wrote directly to the TCP connection back to the client. This arrangement was simple and efficient, but it lacked flexibility.
An example of this inflexibility would be secured transactions over SSL. To perform encrypted communications, the SSL module must intercept all traffic between the client and the handler module. With no abstraction layer in place, this was a difficult task made even more difficult by the cryptography laws of the 1990s that prohibited adding convenient hooks. Administrators wanting to run secure sites had the choice of applying inelegant patch sets to the Apache source or using a proprietary and perhaps incompatible binary distribution.
In Apache 2.0 (with APR), all I/O is done through abstract I/O layer objects. This arrangement allows modules to hook into each other's streams. It will then be possible for SSL to be implemented through the normal module interface rather than requiring special hooks. I/O layers also help internationalized sites by providing a standard place to do character set translation.
In addition, with later Apache releases, I/O layers may support a “most requested module” feature that will have one module filter the output of another. However, this may not happen with Apache 2.1.
The original reason for creating Apache 2.0 was to solve scalability problems. The first proposed solution was to have a hybrid web server, one that has both processes and threads. This solution provides the reliability that comes with not having everything in one process, combined with the scalability that threads provide. However, this approach has no perfect way to map requests to either a thread or a process.
On Linux, for instance, it is best to have multiple processes, each with multiple threads serving the requests. If a single thread dies, the rest of the server will continue to serve more requests and the server will not be affected. On platforms that do not handle multiple processes well, such as Windows, one process with multiple threads is required. On the other hand, platforms with no thread support had to be taken into account, and therefore it was necessary to continue with the Apache 1.3 method of preforking processes to handle requests.
The mapping issue can be handled in multiple ways, but the most desirable way is to enhance the module features of Apache. This was the reasoning behind introduction of multiple-processing modules (MPMs). MPMs determine how requests are mapped to threads or processes. The majority of users will never write an MPM or even know they exist. Each server uses a single MPM, and the correct MPM for a given platform is determined at compile time.
Currently, five options are available for MPMs. All of them, except possibly the OS/2 MPM, retain the parent/child relationships from Apache 1.3, which means that the parent process will monitor the children and make sure that an adequate number is running.
MPMs offer two important benefits:
1. Apache can support a wide variety of operating systems more cleanly and efficiently. In particular, the Windows version of Apache is now much more efficient, since mpm_winnt can use native networking features in place of the POSIX layer used in Apache 1.3. This benefit also extends to other operating systems that implement specialized MPMs.
2. The server can be customized better for the needs of the particular site. For example, sites that need a great deal of scalability can choose to use a threaded MPM, while sites requiring stability or compatibility with older software can use a “preforking” MPM. Additionally, special features like serving different hosts under different user IDs (perchild) can be provided.
The prefork MPM implements a non-threaded, preforking web server that handles request in a manner similar to the default behavior of Apache 1.3 on UNIX. A single control process is responsible for launching child processes that listen for connections and serve them as they arrive.
Apache always tries to maintain several spare or idle server processes, which are ready to serve incoming requests. In this way, clients do not need to wait for a new child process to be forked before their requests can be served.
The StartServers, MinSpareServers, MaxSpareServers and MaxServers (set in /etc/httpd.conf) regulate how the parent process creates children to serve requests. In general, Apache is self-regulating, so most sites do not need to adjust these directives from their default values. Sites that need to serve more than 256 simultaneous requests may need to increase MaxClients, while sites with limited memory may need to decrease MaxClients to keep the server from thrashing.
While the parent process is usually started as root under UNIX in order to bind to port 80, the child processes are launched by Apache as less-privileged users. The User and Group directives are used to set the privileges of the Apache child processes. The child processes must be able to read all the content that will be served but should have as few privileges as possible beyond that.
MaxRequestsPerChild controls how frequently the server recycles processes by killing old ones and launching new ones.
The PTHREAD MPM is the default for most UNIX-like operating systems. It implements a hybrid multi-process multi-threaded server. Each process has a fixed number of threads. The server adjusts to handle load by increasing or decreasing the number of processes.
A single control process is responsible for launching child processes. Each child process creates a fixed number of threads as specified in the ThreadsPerChild directive. The individual threads then listen for connections and serve them when they arrive. The PTHREAD MPM should be used on platforms that support threads and that possibly have memory leaks in their implementation. This may also be the proper MPM for platforms with user-land threads, although testing at this point is insufficient to prove this hypothesis.
When compiled with the DEXTER MPM, the server starts by forking a static number of processes that will not change during the life of the server. Each process will create a specific number of threads. When a request comes in, a thread will accept it and answer it. When a child process sees that too many of its threads are serving requests, it will create more threads and make them available to serve more requests (see Figure 2).

Figure 2. Dexter MPM Model
The DEXTER MPM should be used on most modern platforms capable of supporting threads. It will create a light load on the CPU while serving the most requests possible.
The WINNT MPM is the default for the Windows NT operating systems. It uses a single control process, which launches a single child process that in turn creates threads to handle requests.
The PERHILD MPM implements a hybrid multiprocess, multithreaded web server. A fixed number of processes create threads to handle requests. Fluctuations in load are handled by increasing or decreasing the number of threads in each process.
A single control process launches the number of child processes indicated by the NumServers directive at server startup. Each child process creates threads as specified in the StartThreads directive. The individual threads then listen for connections and serve them when they arrive.
An MPM must be chosen during the configuration phase and compiled into the server. Compilers are capable of optimizing many functions if threads are used, but only if they know that threads are being used. Because some MPMs use threads on UNIX and others don't, Apache will always perform better if the MPM is chosen at configuration time and built into Apache.
To choose the desired MPM, you need to use the argument --with-mpm= NAME with the ./configure script, where NAME is the name of the desired MPM (dexter, mpmt_beos, mpmt_pthread, prefork, pmt_os2, perchild).
Once the server has been compiled, one can determine which MPM was chosen by using % httpd -l. This command will list every module that is compiled into the server, including the MPM.
The following list identifies the default MPM for every platform:
<il>BeOS: mpmt_beos<il>OS/2: spmt_os2<il>UNIX: threaded<il>Windows: winnt
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- seo services in india
3 hours 15 min ago - For KDE install kio-mtp
3 hours 16 min ago - Evernote is much more...
5 hours 16 min ago - Reply to comment | Linux Journal
14 hours 2 min ago - Dynamic DNS
14 hours 36 min ago - Reply to comment | Linux Journal
15 hours 34 min ago - Reply to comment | Linux Journal
16 hours 24 min ago - Not free anymore
20 hours 26 min ago - Great
1 day 14 min ago - Reply to comment | Linux Journal
1 day 22 min ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?



Comments
Yes, that is really much
Yes, that is really much information at once..
Use of IIS increases
Thanks for this great article. Nowadays is seems, that the MS-Plattform IIS/ASP.NET/SQL-Server gets a bigger portion of the market each and every month. So maybe Apache needs a (major) update? Or what else is the reason?
Re..
Greetings, was a bit hard to read, because it was full of imput :P i really like to enjoy reading your journal...
what a article
I just discovered this article an the whole lot of information. Wow!
Thanks a lot for helping. I dont think it is this difficult to understand for german natives;-)
Best regards from Germany
Remo
Well Done
Well written article with links to the top resources.
Good Summary
It is a very intersting summary, a lot of new information for a beginnner, now I will try to translate it to german, so more useful for me ;-) If I will be ready (sorry, can need a time), I will send you translation..
and where is the german
and where is the german translation?
regards
Thanks for the various
Thanks for the various resources.
Information
thanks for posting those information! greetz
Yes, that is really much
Yes, that is really much information at once... :)
Greetz
Welt-Blick
Really much an hard to understand.....
...for us germans, not? :-)
But I´m sure your english is better.
yes
Yes it is... btw. great article
Apache
The Apache web server is one of the most important open source projects. Not only is it the world's most popular HTTP server but more importantly it is the reason that the server side of the Web is not dominated by Microsoft. We all know about Microsoft's strategy of 'extending' the 'commodity protocols' on which open source projects depend, as a way of denying open source an entry into the market (The Halloween Documents: http://www.opensource.org/halloween/). If Apache wouldn't have been people probably would have had to develop one version of their web pages for Netscape servers and one for Microsoft servers. Eventually Microsoft would have won the 'server war' as it won the 'browser war' but luckily this did not happen. Instead Apache changed the rules on the server side.
Yes!
This is a very insightful comment, from a very insightful post. I'm very new to the open source community or idea, and didn't know about Apache until – of course – today. This is fantastic. I think the world at large owes a lot to the open source community without even knowing it. Open source doesn't get the publicity it should. However, true people in the community don't mind that too much – that's the risk and the sacrifice you make when you enter into this community and try to make things better for everyone. So, to all those who are involved out there : thank you!
Great information
Thank you for posting this information. Greetz
Re: Apache 2.0: The Internals of the New, Improved
Phew! That was a lot of info! Good article...