Why Application Servers Crash and How to Avoid It
These days, a well-tuned relational database (without deadlocks, no I/O contention, separate I/O paths for data and transaction log, etc.) might have a mean service time of 2, 5 or 10ms. That is in the same order of magnitude as the specs of the best disk drives themselves. And, that is also the level of performance announced in press releases from manufacturers on new TPC performance records: 30,000, 12,000 and 6,000 TPM (transactions per minute), respectively.
Table 3 shows what can be the highest volume of requests per hour while preserving a tolerable, yet slow response time (tq <= 20 seconds) for some typical mean service times.
You can see in this table that speed is a very valuable feature. With a very small mean service time (5ms or less), a web application system can support an important load without reaching the limit (<rho> = 1). Not only this, but the almost flat part of the curve in graphs, such as in Figures 1 and 2, is much longer. Also, Table 3 shows that a much longer mean service time (200, 500ms or more) allows a very limited number of requests per hour.
If the database back end easily can be as fast as 5 or 10ms, and therefore process roughly half a million transactions per hour, then why does a web application system die when it has not even one hundred thousand (or less)? In other words, if the database takes 5ms, why does the application server take 50 or 100ms (or more) to complete a request?
Once again, we are reminded of the keep it simple principle. As we will see, its antithesis, complexity, is a monster for which we will pay dearly.
How many tiers does the server side have? With a two-tier architecture, there is only a web server and a database server. With three-tier architecture, we have the two previous tiers and a third one, the application server, in between. Some installations have another tier: they split the work in two phases between two application servers in cascade, perhaps because it fits the Model-Controller-View principle of development methodology. Some other installations have yet another tier, a security server, which must answer access-control requests coming from the various servers in the system.
In all of these architectures, each additional server adds its own service time to the total mean service time of the simple model we have been looking at. Also, there is a communication delay between each server, whether a server is only another program on the same machine or another process on another host. Some of the modern communication protocols used to communicate between object-oriented languages are known to introduce significant overhead. In my experience, the smallest delay I have seen using a rather lightweight TCP protocol on the same machine was around 3ms. If you have many hops because you have three or four servers in cascade to serve one request, the sum of all these delays could easily be 20 or 30ms. And you pay that price (30ms) for each and every request, even if your last application server does a 5ms query to the database that takes only 5ms.
Then there is the question, how fast are the various application servers? Each vendor might have their claims. If you want the real answers, you have to let your application compute the elapsed time at the end of each request and write it to a message log. This will tell you what really happens with your hardware and software configuration. If you have such measurements for each server program, you will be able to identify bottlenecks quickly. The lightest application server I have seen used a few milliseconds to process any request. Your mileage will vary. It would be interesting to compile real-life statistics like this about various products as these numbers are as difficult to find as marble-sized gold nuggets in a stream.
Let's say we started with a high-performance RDBMS crunching at 5ms, and we end up with a web service that has a total average service time of 50ms. Well, that's not so bad actually, as we should be able to process 70,000 dynamic requests per hour.
And, let's say that the main application server of MyBestSport.com crashes at less than 10,000 requests per hour. Why? We know it happens often on evenings when there is a baseball or football game on TV, 15 minutes before the game (the load gets much lighter once the game begins).
This is very likely to happen if certain categories of requests take more time and tend to occur together, without being well distributed in time. This is not like the exponential distribution that we used in our model. Certain types of unequal distributions are especially painful. Let's look at this example (not a real service) in more detail.
Suppose the server side has four tiers: HTTP server, application server, security server and database server. Also, suppose the total mean service time is 50ms: 10ms to leave the HTTP server, cross a firewall and get to the application server; 25ms in the application server itself; 10ms to do an access-control request to the security server; and 5ms to do an SQL request.
But, all these measures are averages. We know from the log files that one server has a less equal distribution: the security server does the access control checks very quickly, in less than 4ms usually, but it takes normally about 500ms to perform a login check. The average is still less than 5ms because there are hundreds of access-control requests for each login request. When we look in the application log, we see that a large number of new logins were beginning before the application server ran out of RAM and started swapping, tied its shoelaces together and jumped out of the window.
Also, we know that the programming power offered by the very high-level, object-oriented features of the application server has its price: each incoming request uses at least 800KB even before it has been forwarded to the security server or the database server. Also, each session object (used to simulate a browser-to-app-server persistent connection, something that the HTTP protocol does not provide, per se) takes 200KB. After more analysis of the log, we find out that just before the crash there was an average of 200 logins per minute, much more than the average. Looking at the 15 minutes of log before the crash, we find about one thousand login requests and only a few logouts. The application server got bogged down with too many concurrent login requests, and it exhausted the RAM. The machine started paging out virtual memory, requests took too long, users got fed up with waiting and clicked on Stop, pressed Esc and tried logging in again until the operator pressed the big red button.
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
|Non-Linux FOSS: Seashore||May 10, 2013|
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Download the Free Red Hat White Paper "Using an Open Source Framework to Catch the Bad Guy"
- Validate an E-Mail Address with PHP, the Right Way
- New Products
- A Topic for Discussion - Open Source Feature-Richness?
- What's the tweeting protocol?
2 hours 23 min ago
- Keeping track of IP address
4 hours 14 min ago
- Roll your own dynamic dns
9 hours 28 min ago
- Please correct the URL for Salt Stack's web site
12 hours 39 min ago
- Android is Linux -- why no better inter-operation
14 hours 54 min ago
- Connecting Android device to desktop Linux via USB
15 hours 23 min ago
- Find new cell phone and tablet pc
16 hours 21 min ago
17 hours 50 min ago
- Automatically updating Guest Additions
18 hours 58 min ago
- I like your topic on android
19 hours 45 min ago