Apache 2.0: The Internals of the New, Improved
The Apache Project is a collaborative software development effort aimed at creating a robust, commercial-grade and freely available source code implementation of an HTTP web server. The project is jointly managed by a group of volunteers located around the world, using the Internet and the Web to communicate, plan and develop the server and its related documentation. These volunteers are known as the Apache Group. In addition, hundreds of users have contributed ideas, code and documentation to the project.
According to the Netcraft web servers survey, Apache has been the most popular web server on the Internet since April 1996. This comes as no surprise due to its many characteristics, such as the ability to run on various platforms, its reliability, robustness, configurability and the fact that it is free and well-documented. Apache has many advantages over other web servers, such as providing full source code and an unrestrictive license. It is also full of features. For example, it is compliant with HTTP/1.1 and extensible with third-party modules, and it provides its own APIs to allow module writing. Other interesting features that have made it a popular web server include the capability to tailor specific responses to different errors, its support for virtual hosts, URL rewriting and aliasing, content negotiation and its support for configurable, reliable piped logs that allows users to generate logs in a format they want.
Apache 1.3 has been a well-performing web server, but it suffers a few drawbacks, such as its scalability on some platforms. For instance, according to Martin Pool, AIX processes are heavyweight, and a small AIX machine serving a few hundred concurrent connections can become heavily loaded. In such situations, using processes is not the most effective solution and a threaded web server is needed.
Furthermore, with the evolution of the requirements imposed on web servers, new functionalities like higher reliability, higher security and further performance are required. In response, web servers must evolve to satisfy these demands. Apache is no exception, and it continues its drive to become a more robust and a faster web server with its new 2.0 version (see sidebar).
Apache is renowned for its portability because it works on several platforms. However, having the same base code of Apache portable on so many platforms comes with a high price, which is the ease of maintenance. The Apache server has reached a point where porting it to additional platforms is becoming more complex. Therefore, in order to give Apache the flexibility it needs to survive in the future on more platforms, this problem had to be addressed and resolved. As a result, Apache will be able to use specialized APIs, where they are available, to provide improved performance, making it easy to port to new platforms.
Apache was intended initially to work on standard UNIX systems. However, its support for other platforms grew and the number of platforms supported affected the simplicity of the source code. One effect is that the code makes extensive use of conditional compilation to cope with platform peculiarities. Writing to a standard POSIX API is also undesirable on some platforms that provide substandard implementations or faster paths.
To solve these problems, Ryan Bloom is leading efforts to develop a solution, a layer called the Apache Portable Runtime (APR). The APR presents a standard programming interface for server applications and covers tasks such as file I/O, logging, mutual exclusion, shared memory and managing child processes and asynchronous I/O. APR shields the application from incompatibilities in the implementation of the standard, and thus it will use the most efficient way to achieve each function on each supported particular platform.
Another component that helps to resolve portability problems is Ralph Engelschall's MM library, which hides the details of setting up shared memory areas between processes and provides an interface similar to malloc to manipulate them.
The MM library is a two-layer abstraction library that simplifies the usage of shared memory between forked processes under UNIX platforms. On the first (lower) layer, it hides all platform-dependent implementation details (allocation and locking). When dealing with shared memory segments and on the second (higher) layer, it provides a high-level malloc(3)-style API for a convenient and well-known way to work with data-structures inside those shared memory segments.
The traditional Apache structure is based on a single parent process and a group of reusable children (see Figure 1). The parent reads the configuration and manages the pool of children. Each child at any time is either serving a single request or sleeping. Apache 1.x automatically regulates the size of the pool of children so that there are enough to cope with spikes in load without using too many resources to maintain idle processes. Busy children serve one request at a time on a single socket.
Figure 1. Traditional Apache Structure
Some web sites are heavily loaded and receive thousands of requests per minute or even per second. Traditionally TCP/IP servers fork a new child to handle incoming requests from clients. However, in the situation of a busy web site, the overhead of forking a huge number of children will simply suffocate the server. As a consequence, Apache uses a different technique. It forks a fixed number of children right from the beginning. The children service incoming requests independently, using different address spaces. Apache can dynamically control the number of children it forks based on current load.
This design has worked well and proved to be both reliable and efficient; one of its best features is that the server can survive the death of children and is also reliable. It is also more efficient than the canonical UNIX model of forking a new child for every request.
This traditional Apache design works well up to quite high loads on modern UNIX systems. On Linux in particular, context switches and forking new processes are cheap, and accordingly this simple design is nearly optimal. One drawback, however, of the isolation between processes is that they cannot easily share data, and consequently sharing session data across the server takes a little work.
Another approach is to serve each request in a separate thread: this is the model used by most NT-based web servers. Although this approach eliminates most of the protection between tasks, it allows the module programmer more flexibility and it can be faster on systems where threads are cheaper than processes, such as Windows NT and AIX.
Apache 2.0 introduces MPMs (multiple-processing modules) that hide the process model from most of the code. At runtime, Apache can be configured to use threads, processes, a hybrid of both or some other model. Modules can register new process models to suit their operating systems or the applications. One proposed example is to fork processes that run as different users to give increased security on machines that offer virtual hosts to multiple customers.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- RSS Feeds
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- New Products
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- Validate an E-Mail Address with PHP, the Right Way
- Tech Tip: Really Simple HTTP Server with Python
- New Products
- Trying to Tame the Tablet
- git-annex assistant
4 hours 27 min ago - direct cable connection
4 hours 50 min ago - Agreed on AirDroid. With my
5 hours 34 sec ago - I just learned this
5 hours 4 min ago - enterprise
5 hours 34 min ago - not living upto the mobile revolution
8 hours 26 min ago - Deceptive Advertising and
9 hours 1 min ago - Let\'s declare that you have
9 hours 2 min ago - Alterations in Contest Due
9 hours 3 min ago - At a numbers mindset, your
9 hours 4 min ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.




Comments
Yes, that is really much
Yes, that is really much information at once..
Use of IIS increases
Thanks for this great article. Nowadays is seems, that the MS-Plattform IIS/ASP.NET/SQL-Server gets a bigger portion of the market each and every month. So maybe Apache needs a (major) update? Or what else is the reason?
Re..
Greetings, was a bit hard to read, because it was full of imput :P i really like to enjoy reading your journal...
what a article
I just discovered this article an the whole lot of information. Wow!
Thanks a lot for helping. I dont think it is this difficult to understand for german natives;-)
Best regards from Germany
Remo
Well Done
Well written article with links to the top resources.
Good Summary
It is a very intersting summary, a lot of new information for a beginnner, now I will try to translate it to german, so more useful for me ;-) If I will be ready (sorry, can need a time), I will send you translation..
and where is the german
and where is the german translation?
regards
Thanks for the various
Thanks for the various resources.
Information
thanks for posting those information! greetz
Yes, that is really much
Yes, that is really much information at once... :)
Greetz
Welt-Blick
Really much an hard to understand.....
...for us germans, not? :-)
But I´m sure your english is better.
yes
Yes it is... btw. great article
Apache
The Apache web server is one of the most important open source projects. Not only is it the world's most popular HTTP server but more importantly it is the reason that the server side of the Web is not dominated by Microsoft. We all know about Microsoft's strategy of 'extending' the 'commodity protocols' on which open source projects depend, as a way of denying open source an entry into the market (The Halloween Documents: http://www.opensource.org/halloween/). If Apache wouldn't have been people probably would have had to develop one version of their web pages for Netscape servers and one for Microsoft servers. Eventually Microsoft would have won the 'server war' as it won the 'browser war' but luckily this did not happen. Instead Apache changed the rules on the server side.
Yes!
This is a very insightful comment, from a very insightful post. I'm very new to the open source community or idea, and didn't know about Apache until – of course – today. This is fantastic. I think the world at large owes a lot to the open source community without even knowing it. Open source doesn't get the publicity it should. However, true people in the community don't mind that too much – that's the risk and the sacrifice you make when you enter into this community and try to make things better for everyone. So, to all those who are involved out there : thank you!
Great information
Thank you for posting this information. Greetz
Re: Apache 2.0: The Internals of the New, Improved
Phew! That was a lot of info! Good article...