Apache 2.0: The Internals of the New, Improved

Better scalability, reliability, security and performance are among the upgrades in the new Apache server.

The Apache Project is a collaborative software development effort aimed at creating a robust, commercial-grade and freely available source code implementation of an HTTP web server. The project is jointly managed by a group of volunteers located around the world, using the Internet and the Web to communicate, plan and develop the server and its related documentation. These volunteers are known as the Apache Group. In addition, hundreds of users have contributed ideas, code and documentation to the project.

According to the Netcraft Web Server Survey, Apache has been the most popular web server on the Internet since April 1996. This comes as no surprise given its many strengths: it runs on a wide range of platforms; it is reliable, robust and highly configurable; and it is free and well documented. Apache also has advantages over other web servers, such as providing full source code under an unrestrictive license. It is feature-rich as well. For example, it is compliant with HTTP/1.1, it is extensible with third-party modules, and it provides its own API for writing those modules. Other features that have made it popular include the ability to tailor responses to different errors, support for virtual hosts, URL rewriting and aliasing, content negotiation, and configurable, reliable piped logs that let users generate logs in the format they want.

The Jump from 1.3 to 2.0

Apache 1.3 has been a well-performing web server, but it suffers from a few drawbacks, such as limited scalability on some platforms. For instance, according to Martin Pool, AIX processes are heavyweight, and a small AIX machine serving a few hundred concurrent connections can become heavily loaded. In such situations, using a process per connection is not the most effective solution, and a threaded web server is needed.

Furthermore, as the requirements imposed on web servers evolve, new capabilities such as higher reliability, stronger security and better performance are required, and web servers must evolve to satisfy these demands. Apache is no exception: with its new 2.0 version, it continues its drive to become a more robust and faster web server (see sidebar).

Sidebar

Portability

Apache is renowned for its portability; it works on many platforms. However, keeping a single Apache code base portable across so many platforms comes at a high price: the code becomes harder to maintain. The Apache server has reached a point where porting it to additional platforms is becoming increasingly complex. To give Apache the flexibility it needs to survive on more platforms in the future, this problem had to be addressed. As a result, Apache will be able to use specialized APIs, where they are available, to provide improved performance while also becoming easier to port to new platforms.

Apache Portable Runtime

Apache was initially intended to work on standard UNIX systems. However, as support for other platforms grew, the number of supported platforms eroded the simplicity of the source code. One effect is that the code makes extensive use of conditional compilation to cope with platform peculiarities. Writing to the standard POSIX API is also undesirable on platforms whose POSIX implementations are substandard or that offer faster native paths.

To solve these problems, Ryan Bloom is leading the development of a layer called the Apache Portable Runtime (APR). APR presents a standard programming interface for server applications and covers tasks such as file I/O, logging, mutual exclusion, shared memory, child-process management and asynchronous I/O. It shields the application from incompatibilities between platform implementations and uses the most efficient way to achieve each function on each supported platform.
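The core idea of a portability layer like APR is that conditional compilation lives in one place, and application code calls only the layer's functions. The sketch below illustrates that split. The `rt_*` functions and `app_tick` are hypothetical names for this illustration only; APR's real interface is different and far broader.

```c
/* Request POSIX.1-2008 declarations (nanosleep) on non-Windows builds. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#include <unistd.h>
#endif

/* --- Hypothetical runtime layer: the only code with platform #ifdefs --- */

/* Sleep for the given number of milliseconds. */
void rt_sleep_ms(unsigned ms)
{
#ifdef _WIN32
    Sleep(ms);
#else
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    /* If a signal interrupts the sleep, continue with the remaining time. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
#endif
}

/* Return the id of the calling process. */
long rt_pid(void)
{
#ifdef _WIN32
    return (long)GetCurrentProcessId();
#else
    return (long)getpid();
#endif
}

/* --- Application code: no #ifdefs, only rt_* calls --- */
int app_tick(void)
{
    long pid = rt_pid();
    assert(pid > 0);   /* a running user process has a nonzero id */
    rt_sleep_ms(5);
    return 1;
}
```

Porting the application to a new platform then means adding one branch per `rt_*` function, not touching every caller. That is the maintenance saving the text describes.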

Another component that helps resolve portability problems is Ralf S. Engelschall's MM library, which hides the details of setting up shared memory areas between processes and provides a malloc-like interface for manipulating them.

The MM library is a two-layer abstraction library that simplifies the use of shared memory between forked processes on UNIX platforms. On the first (lower) layer, it hides all platform-dependent implementation details (allocation and locking) involved in dealing with shared memory segments. On the second (higher) layer, it provides a high-level malloc(3)-style API, a convenient and well-known way to work with data structures inside those shared memory segments.
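The two layers can be sketched with plain POSIX calls. This is not MM's API. It assumes anonymous shared `mmap` is available, which is one of several platform mechanisms MM abstracts over. The names `shm_seg_create`, `shm_alloc` and `shm_demo` are hypothetical. The allocator is deliberately simplified: it has no `free`, and it has no locking. A real library must handle both.

```c
#define _DEFAULT_SOURCE 1          /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lower layer (hypothetical): one anonymous shared segment that stays
   shared with every child forked after it is created. */
struct shm_seg {
    unsigned char *base;
    size_t size;
    size_t used;   /* bump offset; only the creating process allocates */
};

static int shm_seg_create(struct shm_seg *seg, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    seg->base = p;
    seg->size = size;
    seg->used = 0;
    return 0;
}

/* Upper layer (hypothetical): malloc-style allocation of aligned chunks
   inside the segment. Simplified: no free() and no locking. */
static void *shm_alloc(struct shm_seg *seg, size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t off = (seg->used + align - 1) & ~(align - 1);
    if (off > seg->size || n > seg->size - off)
        return NULL;
    seg->used = off + n;
    return seg->base + off;
}

/* Parent allocates a counter, a forked child increments it, and the
   parent returns the value it sees afterwards (1 if memory is shared). */
int shm_demo(void)
{
    struct shm_seg seg;
    if (shm_seg_create(&seg, 4096) != 0)
        return -1;
    int *counter = shm_alloc(&seg, sizeof *counter);
    assert(counter != NULL);
    *counter = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        *counter += 1;          /* the child writes to the shared page */
        _exit(0);
    }
    waitpid(pid, NULL, 0);      /* waiting stands in for real locking */
    int seen = *counter;
    munmap(seg.base, seg.size);
    return seen;
}
```

Because the segment is created before `fork()`, the child's write lands in memory the parent can read. Hiding exactly this kind of setup, plus the locking omitted here, is what the lower layer of MM is for.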

The traditional Apache structure is based on a single parent process and a group of reusable children (see Figure 1). The parent reads the configuration and manages the pool of children. Each child at any time is either serving a single request or sleeping. Apache 1.x automatically regulates the size of the pool of children so that there are enough to cope with spikes in load without using too many resources to maintain idle processes. Busy children serve one request at a time on a single socket.

Figure 1. Traditional Apache Structure

Some web sites are heavily loaded and receive thousands of requests per minute or even per second. Traditionally, TCP/IP servers fork a new child to handle each incoming client request. On a busy web site, however, the overhead of forking a huge number of children would simply overwhelm the server. Apache therefore uses a different technique: it forks a pool of children at startup, before any requests arrive. The children service incoming requests independently, each in its own address space, and Apache dynamically adjusts the number of children based on the current load.
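The pre-forking idea can be sketched in a few dozen lines of C. This is not Apache's code. In the real server, children block in `accept()` on a shared listening socket. Here a pipe stands in for the socket so the sketch runs without networking. The pool size is also fixed, with no load-based adjustment. The names `handle_request`, `child_loop` and `prefork_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical request handler: "serving" request r yields 2 * r. */
static int handle_request(int r) { return 2 * r; }

/* Child: block until a request arrives, serve it, repeat.
   A negative request id is the parent's signal to exit. */
static void child_loop(int req_fd, int resp_fd)
{
    int req;
    while (read(req_fd, &req, sizeof req) == (ssize_t)sizeof req && req >= 0) {
        int resp = handle_request(req);
        if (write(resp_fd, &resp, sizeof resp) != (ssize_t)sizeof resp)
            break;
    }
    _exit(0);
}

/* Fork `nchildren` workers up front, hand them `nreq` requests, and return
   the sum of all responses, or -1 on error. Small batches only: the parent
   writes every request before reading any response. */
long prefork_demo(int nchildren, int nreq)
{
    int req_pipe[2], resp_pipe[2];
    assert(nchildren > 0 && nreq >= 0);
    if (pipe(req_pipe) < 0 || pipe(resp_pipe) < 0)
        return -1;

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            close(req_pipe[1]);
            close(resp_pipe[0]);
            child_loop(req_pipe[0], resp_pipe[1]);   /* never returns */
        }
    }
    close(req_pipe[0]);    /* only children read requests */
    close(resp_pipe[1]);   /* only children write responses */

    /* Pipe writes of at most PIPE_BUF bytes are atomic, so int-sized
       records never interleave and each read() takes one whole record. */
    for (int r = 1; r <= nreq; r++)
        if (write(req_pipe[1], &r, sizeof r) != (ssize_t)sizeof r)
            return -1;
    for (int i = 0; i < nchildren; i++) {
        int stop = -1;
        if (write(req_pipe[1], &stop, sizeof stop) != (ssize_t)sizeof stop)
            return -1;
    }
    close(req_pipe[1]);

    /* Collect responses until every child has exited (EOF). */
    long sum = 0;
    int resp;
    while (read(resp_pipe[0], &resp, sizeof resp) == (ssize_t)sizeof resp)
        sum += resp;
    close(resp_pipe[0]);
    while (wait(NULL) > 0)
        ;
    return sum;
}
```

The fork cost is paid `nchildren` times, not once per request. Whichever child is idle picks up the next request. Those two properties are what make the model cheaper than fork-per-request under load.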

This design has proved to be both reliable and efficient. One of its best features is that the server survives the death of individual children. It is also more efficient than the canonical UNIX model of forking a new child for every request.

This traditional Apache design works well up to quite high loads on modern UNIX systems. On Linux in particular, context switches and forking new processes are cheap, and accordingly this simple design is nearly optimal. One drawback, however, of the isolation between processes is that they cannot easily share data, and consequently sharing session data across the server takes a little work.
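The data-sharing drawback follows directly from `fork()` giving each child its own copy of the parent's memory. The small sketch below shows it, using a hypothetical `session_hits` variable as a stand-in for per-server session data. The child sees its increment, but the parent's copy never changes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ordinary (non-shared) global standing in for session data. */
static int session_hits = 0;

/* Fork a child that records a "hit", then return what the parent sees. */
int isolation_demo(void)
{
    session_hits = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        session_hits++;          /* changes the child's private copy only */
        _exit(session_hits);     /* report the child's view as exit status */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 1);  /* child saw 1 */
    return session_hits;         /* the parent still sees 0 */
}
```

Sharing such state across children therefore needs an explicit mechanism, such as a shared memory segment, a file or an external store.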

Another approach is to serve each request in a separate thread: this is the model used by most NT-based web servers. Although this approach eliminates most of the protection between tasks, it allows the module programmer more flexibility and it can be faster on systems where threads are cheaper than processes, such as Windows NT and AIX.
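For contrast, a minimal thread-per-request sketch using POSIX threads follows. All threads share one address space, so a running total is trivially visible to every request. The flip side is that nothing but discipline (here, a mutex) protects that state from a misbehaving thread. The names `serve_request` and `thread_per_request_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* State shared by every request thread: easy to share, easy to corrupt. */
static long total_bytes = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical handler: pretend to send `len` bytes, then update the
   shared total under the lock. */
static void *serve_request(void *arg)
{
    long len = (long)(intptr_t)arg;
    pthread_mutex_lock(&total_lock);
    total_bytes += len;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Serve requests 1..n, one new thread each; return the shared total,
   or -1 if a thread could not be created. */
long thread_per_request_demo(int n)
{
    pthread_t tids[64];
    assert(n >= 0 && n <= 64);
    total_bytes = 0;

    int started = 0;
    for (; started < n; started++)
        if (pthread_create(&tids[started], NULL, serve_request,
                           (void *)(intptr_t)(started + 1)) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == n ? total_bytes : -1;
}
```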

Apache 2.0 introduces MPMs (Multi-Processing Modules), which hide the process model from most of the code. Apache can be configured, when the server is built, to use threads, processes, a hybrid of both or some other model. Modules can register new process models to suit their operating systems or applications. One proposed example is to fork processes that run as different users, giving increased security on machines that offer virtual hosts to multiple customers.
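The separation an MPM provides can be pictured as a table of function pointers. The core calls through this table without knowing whether requests run inline, in threads or in child processes. The sketch below is a hypothetical illustration of that idea, not Apache's actual MPM interface. `struct proc_model`, `run_inline`, `run_forked` and `serve_with` are invented names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*handler_fn)(int req);

/* Hypothetical process-model interface: the core sees only this struct. */
struct proc_model {
    const char *name;
    /* Serve requests 1..nreq with h; return the sum of results or -1. */
    long (*run)(int nreq, handler_fn h);
};

/* Model 1: handle every request inline in the calling process. */
static long run_inline(int nreq, handler_fn h)
{
    long sum = 0;
    for (int r = 1; r <= nreq; r++)
        sum += h(r);
    return sum;
}

/* Model 2: handle each request in its own forked child, which sends the
   result back over a pipe (the classic fork-per-request model). */
static long run_forked(int nreq, handler_fn h)
{
    long sum = 0;
    for (int r = 1; r <= nreq; r++) {
        int fds[2];
        if (pipe(fds) < 0)
            return -1;
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            close(fds[0]);
            int res = h(r);
            _exit(write(fds[1], &res, sizeof res) == (ssize_t)sizeof res ? 0 : 1);
        }
        close(fds[1]);
        int res = 0;
        ssize_t got = read(fds[0], &res, sizeof res);
        close(fds[0]);
        waitpid(pid, NULL, 0);
        if (got != (ssize_t)sizeof res)
            return -1;
        sum += res;
    }
    return sum;
}

static const struct proc_model models[] = {
    { "inline", run_inline },
    { "forked", run_forked },
};

/* Core server code: identical no matter which model is selected. */
static int square(int r) { return r * r; }

long serve_with(const struct proc_model *m, int nreq)
{
    assert(m != NULL && m->run != NULL);
    return m->run(nreq, square);
}
```

Swapping `models[0]` for `models[1]` changes how requests are executed without touching `serve_with` or the handler. That is the property the MPM design is after.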

