Stress Testing an Apache Application Server in a Real World Environment

Testing procedures and hints so you can find out how much traffic your Web application system can support.

We've all had an experience in which the software is installed on the servers, the network is connected and the application is running. Naturally, the next step is to think, "I wonder how much traffic this system can support?" Sometimes the question lingers and sometimes it passes, but it always presents itself. So, how do we figure out how much traffic our server and application can handle? Can it handle only a few active clients or can it withstand a proper Slashdotting? To appreciate fully the challenges one faces in trying to answer these questions, we must first understand the dynamic application and how it works.

A traditional dynamic application has five main components: the application server, the database server, the application, the database and the network. In the open-source world, the application server usually is Apache. And, often, Apache is running on Linux.

The database server can be almost anything that can do the job; for most smaller applications, this tends to be MySQL. In this article, I highlight the open-source PostgreSQL server, which also runs on Linux.

The application itself can be almost anything that fits the project requirements. Sometimes it makes sense to use Perl, sometimes PHP, sometimes Java. It is beyond the scope of this article to determine the benefits or liability of a particular platform, but a firm understanding of the best tool for the job is necessary to plan properly for adequate performance in a running application.

The database itself can mean the difference between a maximum load of one user and 5,000 users. A bad schema can be the death of an application, while a good schema can make up for a multitude of other shortcomings.

The network tends to be the forgotten part of the equation, but it can be as detrimental as bad application code or a bad schema. A noisy network can slow intra-server communications dramatically. It also can introduce errors and other unknowns into communications that, in turn, have unknown results on the running code.

As you have probably guessed, finding where our optimal performance lies and pushing those limits is more than a minor challenge. Like a Formula One race car, which performs only when every subsystem is tuned, a Web-based application handles its load optimally only when its five main components work well together. By looking at those components and measuring how they react under certain circumstances, we can use that data to better tune the system as a whole.

Introduction to Testing

To begin the testing, we need to create an environment that facilitates micro-management of the five components. Because most enterprise-class applications are based on large proprietary hardware configurations, setting up a testing configuration often is cost-prohibitive. One of the advantages of the open-source model, however, is that many configurations are based on commodity hardware. The commodity hardware configuration, therefore, is the basic assumption used throughout the testing setup. This is not to say that a setup based on large proprietary hardware is less valid or that the methods outlined are not compatible; it simply is more expensive.

We first need to set up a testing network. For this we use three computers on a private network segment. The systems should be exact replicas of the servers going into production or ones that already exist in the production environment. This, in a simple sense, accounts for the application/Web server and the database server, with the third system being a traffic generator and monitor. These three computers are connected through a hub for testing, because the shared nature of the hub facilitates monitoring network traffic. A better but more expensive solution would replace the hub with a switch and introduce an Ethernet tap into the configuration. The testing network we use, though, is a fairly accurate representation of the network topology that exists in the DMZ or behind the firewall of a live network.

Accurately monitoring the activity of the network and the systems involved in serving the applications requires some software, the first of which is the operating system. In this article, I use Red Hat 7.3, although few Red Hat-isms are specific to these setups and tests. To get the best performance from the server machines, it is a good idea to make sure only the most necessary services are running. On the application server, this list includes Apache and SSH (if necessary); on the database server, the list normally includes PostgreSQL and SSH (again, if necessary). As a general preference, I like to make sure all critical services, including Apache, PostgreSQL and the kernel itself, are compiled from source. The benefit of doing this is ensuring only the necessary options are activated and nothing extraneous is active and taking up critical memory or processor time.
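As a quick sketch of what trimming services looks like on a Red Hat system, the chkconfig and service commands can audit and disable startup services. The service names below are examples only; the right list depends on each machine's role:

     # Show which services start in the common runlevels
     chkconfig --list | grep ':on'

     # Stop a service now and keep it from starting at boot
     # (sendmail and portmap are examples only)
     service sendmail stop
     chkconfig sendmail off
     service portmap stop
     chkconfig portmap off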

On the application and database servers, a necessary component that should be included is the sysstat package. This package normally is installed on the default Red Hat installation. For other distributions, the sysstat package can be downloaded from the author's site (http://perso.wanadoo.fr/sebastien.godard/) and compiled from source. Sysstat is a good monitoring tool for most activities, as it can display at a glance almost all of the relevant information about a running system, including network activity, system loads and much more. This package works by polling data at specified intervals and is useful for general system monitoring. For our tests, we run sysstat in a more active mode, from the command line, a topic discussed in more depth later in this article.

It is a good idea to be familiar with the tools collected in the sysstat package, especially the sar and sadc programs. The man pages for both of these programs provide a wealth of detail. One of the limitations of the sysstat package is that it has a minimum sampling interval of one second. In my experience with this type of testing, a one-second sample is adequate for assessing where problems begin to creep into the configuration.
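As a minimal sketch of that command-line mode, sadc can be invoked directly to take one-second samples for the duration of a test, and sar can read the resulting file afterward. The duration and filename here are placeholders:

     # Collect one-second samples for 300 seconds into a binary data file
     /usr/lib/sa/sadc 1 300 /tmp/testrun.sa

     # Afterward, report CPU utilization and per-interface network
     # activity from the collected file
     sar -u -f /tmp/testrun.sa
     sar -n DEV -f /tmp/testrun.sa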

As we move to a different testing tool, we also move to a different part of the testing setup: the network itself. One of the best tools for this task is tcpdump. Tcpdump is a general-purpose network data collection tool and, like sysstat, is available in binary form for most distributions, as well as in source code from www.tcpdump.org.

About now you may be asking why we are looking at raw network data. On occasion, I have seen errors introduced into the communications between servers; for instance, data packets sometimes become mangled in transit. Raw network data, then, is a great resource to refer back to in the event of a problem that cannot be diagnosed easily.

Tcpdump could be an article unto itself due to the depth and complexity of the subject of networking as well as the program itself. Specific usage examples follow in the next section, in which the actual testing procedure is explained. For now, tcpdump should be installed on our traffic generator system.
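As a minimal sketch ahead of those examples, the traffic generator can record everything passing between the two servers for later inspection. The interface name and hostnames below are placeholders for your own setup:

     # Record full packets between the two test servers to a file
     tcpdump -i eth0 -s 1500 -w servers.dump host appserver and host dbserver

     # Read the capture back in human-readable form later
     tcpdump -r servers.dump | less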

The last major component we need for our testing is a piece of software named flood, which is written by the Apache Group and available at www.apache.org. Flood still is considered alpha software and, therefore, is not well documented. On-line support also is limited, as few people seem to use it.

To begin, we need to download the flood source, which is available from the flood project area on the Apache site. A nice and simple document on how to build the flood source can be found there as well. If the Web application to be tested runs over https, reading this document is a must.

In its simplest form, the method to build the software is:

     tar -zxvf flood-0.x.tar.gz
     cd flood-0.x/
     ./buildconf
     ./configure --disable-shared
     make all

Flood is run from its source directory, using the newly created ./flood executable.

The "./flood" syntax is quite simple. It generally follows the format:

     ./flood configuration-file > output.file

The configuration file is where the real work and power of flood is revealed, and several example files are provided in the ./examples directory in the flood source. It is a good idea to have a working knowledge of their construction, as well as some knowledge of XML. See Listing 1 for an example configuration file.

Listing 1. Example Configuration File

The general form of the configuration file is:

     <flood>
     <urllist></urllist>
     <profile></profile>
     <farmer></farmer>
     <farm></farm>
     <seed></seed>
     </flood>

The <urllist> section is where we place the specific URLs flood steps through to access the application. Due to the way flood processes these URLs under certain configurations, it is possible to simulate a complete session a visitor might have with the Web application.
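For instance, a simple visitor session might be scripted as a short list of URLs. This sketch is modeled on the examples shipped in flood's ./examples directory; the hostname and paths are placeholders:

     <urllist>
     <name>testsession</name>
     <description>A simple visitor session</description>
     <url>http://testserver/index.html</url>
     <url>http://testserver/login.php</url>
     <url>http://testserver/report.php</url>
     </urllist>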

The <profile> section is where specifics are set about how the file should be processed as well as which URLs should be used. This section uses several tags to define the behavior of the flood process. They are:

     <name> <description> <useurllist> <profiletype>
     <socket> <report> <verify_resp>

These seven tags are relatively well defined in the configuration file examples. The other main sections (farmer, farm and seed) set the parameters of how many times to run through the list, how often and the seed number for easy test duplication; a skeleton of these sections follows below.
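Putting the remaining sections together, a minimal skeleton modeled on flood's bundled round-robin example might look like the following. The names and counts here are arbitrary; the tag values shown (round_robin, generic, relative_times, verify_200) are the ones used in the bundled examples, and the farm is named Bingo because that is the name those examples use:

     <profile>
     <name>testprofile</name>
     <description>Step through the URL list in order</description>
     <useurllist>testsession</useurllist>
     <profiletype>round_robin</profiletype>
     <socket>generic</socket>
     <report>relative_times</report>
     <verify_resp>verify_200</verify_resp>
     </profile>

     <farmer>
     <!-- each farmer runs the profile five times -->
     <name>farmer1</name>
     <count>5</count>
     <useprofile>testprofile</useprofile>
     </farmer>

     <farm>
     <!-- start ten copies of farmer1 -->
     <name>Bingo</name>
     <usefarmer count="10">farmer1</usefarmer>
     </farm>

     <seed>1</seed>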

A real-world note about flood from my own experience: if the application has rigidly defined URLs that reference individual pages, the stock flood report is useful with little modification. If, however, the Web application funnels visitors through a few pages whose content changes depending on variables, as is the case with most dynamic Web applications, flood's results can be difficult to interpret. In the latter case, flood's primary usefulness is in scripting simulated traffic against a test environment. It is important to understand the benefits and the shortcomings of any tool being used; testing a Web application is no different.
