Why Application Servers Crash and How to Avoid It
These days, a well-tuned relational database (no deadlocks, no I/O contention, separate I/O paths for data and transaction log and so on) might have a mean service time of 2, 5 or 10ms. That is of the same order of magnitude as the specs of the best disk drives themselves. It is also the level of performance announced in manufacturers' press releases on new TPC performance records: 30,000, 12,000 and 6,000 TPM (transactions per minute), respectively.
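The link between those service times and the TPM figures is plain arithmetic. Here is a minimal sketch, assuming a single server that handles one transaction at a time; the function name max_tpm is only illustrative.

```python
def max_tpm(service_time_ms: float) -> float:
    """Upper bound on transactions per minute for one server that
    processes requests strictly one at a time (an assumption of this sketch)."""
    return 60_000.0 / service_time_ms  # 60,000 ms in a minute


for ms in (2, 5, 10):
    print(f"{ms} ms per transaction -> {max_tpm(ms):,.0f} TPM")
```

Real TPC results come from systems that process many transactions in parallel, so this is only a way to read the numbers, not a description of how the benchmarks are run.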
Table 3 shows, for some typical mean service times, the highest volume of requests per hour that still preserves a tolerable, if slow, response time (tq <= 20 seconds).
You can see in this table that speed is a very valuable feature. With a very small mean service time (5ms or less), a web application system can support a heavy load without reaching the limit (ρ = 1). Not only that, but the almost flat part of the curve in graphs such as Figures 1 and 2 is much longer. Table 3 also shows that a much longer mean service time (200ms, 500ms or more) allows only a very limited number of requests per hour.
If the database back end can easily be as fast as 5 or 10ms, and therefore process roughly half a million transactions per hour, then why does a web application system die when it is handling fewer than one hundred thousand requests per hour? In other words, if the database takes 5ms, why does the application server take 50 or 100ms (or more) to complete a request?
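To get a feel for how such limits arise, here is a rough sketch that assumes the simple model is a single-server M/M/1 queue, where the mean response time is tq = ts / (1 - ρ) and ρ is the arrival rate times the mean service time ts. That assumption and the function names are mine, so the output may not match Table 3 exactly.

```python
def response_time_s(service_time_s: float, arrivals_per_s: float) -> float:
    """Mean response time tq = ts / (1 - rho) for an M/M/1 queue (assumed model)."""
    rho = arrivals_per_s * service_time_s  # utilization
    if rho >= 1.0:
        return float("inf")  # saturated: the queue grows without bound
    return service_time_s / (1.0 - rho)


def max_requests_per_hour(service_time_ms: float, tq_limit_s: float = 20.0) -> float:
    """Largest hourly arrival rate that keeps tq at or below tq_limit_s.

    Solving ts / (1 - lambda * ts) <= tq for lambda gives
    lambda <= 1/ts - 1/tq.
    """
    ts = service_time_ms / 1000.0
    if ts >= tq_limit_s:
        return 0.0  # even an idle server is slower than the limit
    return (1.0 / ts - 1.0 / tq_limit_s) * 3600.0


for ms in (5, 10, 50, 200, 500):
    print(f"ts = {ms:>3} ms -> about {max_requests_per_hour(ms):,.0f} requests/hour")
```

With a 20-second limit the result sits just under the saturation rate of 3,600 / ts requests per hour, which is why the mean service time dominates everything else.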
Once again, we are reminded of the keep it simple principle. As we will see, its antithesis, complexity, is a monster for which we will pay dearly.
How many tiers does the server side have? With a two-tier architecture, there is only a web server and a database server. With a three-tier architecture, we have the two previous tiers and a third one, the application server, in between. Some installations have another tier: they split the work in two phases between two application servers in cascade, perhaps because it fits the Model-View-Controller principle of their development methodology. Some other installations have yet another tier, a security server, which must answer access-control requests coming from the various servers in the system.
In all of these architectures, each additional server adds its own service time to the total mean service time of the simple model we have been looking at. There is also a communication delay between each pair of servers, whether the next server is just another program on the same machine or a process on another host. Some of the modern communication protocols used between object-oriented languages are known to introduce significant overhead. In my experience, the smallest delay I have seen using a rather lightweight TCP protocol on the same machine was around 3ms. If you have many hops because three or four servers in cascade serve one request, the sum of all these delays could easily be 20 or 30ms. And you pay that price (30ms) for each and every request, even if your last application server issues a database query that takes only 5ms.
Then there is the question, how fast are the various application servers? Each vendor might have their claims. If you want the real answers, you have to let your application compute the elapsed time at the end of each request and write it to a message log. This will tell you what really happens with your hardware and software configuration. If you have such measurements for each server program, you will be able to identify bottlenecks quickly. The lightest application server I have seen used a few milliseconds to process any request. Your mileage will vary. It would be interesting to compile real-life statistics like this about various products as these numbers are as difficult to find as marble-sized gold nuggets in a stream.
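As an illustration of that kind of measurement, here is a minimal Python sketch that logs the elapsed time of each request. The logger name, the timed_request helper and the sample request are all hypothetical; a real application server would hook this into its own request-handling path.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("request_timing")  # hypothetical logger name


@contextmanager
def timed_request(request_name: str):
    """Write the wall-clock time spent on one request to the message log."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the handler raises, so failed requests are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        logger.info("request=%s elapsed_ms=%.1f", request_name, elapsed_ms)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    with timed_request("GET /scores"):  # hypothetical request
        time.sleep(0.005)               # stand-in for the real work
```

Logging one line per request in every server program makes it possible to compare tiers afterward and to see the distribution of service times, not just the mean.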
Let's say we started with a high-performance RDBMS crunching at 5ms, and we end up with a web service that has a total average service time of 50ms. Well, that's not so bad actually, as we should be able to process 70,000 dynamic requests per hour.
Now, let's say that the main application server of MyBestSport.com crashes at fewer than 10,000 requests per hour. Why? We know it often happens on evenings when there is a baseball or football game on TV, about 15 minutes before the game (the load gets much lighter once the game begins).
This is very likely to happen if certain categories of requests take more time and tend to occur together, without being well distributed in time. This is not like the exponential distribution that we used in our model. Certain types of unequal distributions are especially painful. Let's look at this example (not a real service) in more detail.
Suppose the server side has four tiers: HTTP server, application server, security server and database server. Also, suppose the total mean service time is 50ms: 10ms to leave the HTTP server, cross a firewall and get to the application server; 25ms in the application server itself; 10ms to do an access-control request to the security server; and 5ms to do an SQL request.
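The breakdown above can be tallied in a few lines. This sketch only restates the numbers of the example; the saturation figure assumes requests are served one at a time, and the dictionary labels are mine.

```python
# Mean service time per stage, in milliseconds, as given in the example.
tier_ms = {
    "HTTP server, firewall, hop to app server": 10,
    "application server": 25,
    "security server (access control)": 10,
    "database server (SQL request)": 5,
}

total_ms = sum(tier_ms.values())
# Rate at which the utilization reaches 1 if requests are served one at a time.
saturation_per_hour = 3_600_000 / total_ms

for name, ms in tier_ms.items():
    print(f"{name:<42} {ms:>3} ms  {100 * ms / total_ms:4.0f}%")
print(f"total: {total_ms} ms, saturation near {saturation_per_hour:,.0f} requests/hour")
```

The fast database accounts for only a tenth of the total, so tuning it further would change little.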
But all these measures are averages. We know from the log files that one server has a much less even distribution: the security server does the access-control checks very quickly, usually in less than 4ms, but it normally takes about 500ms to perform a login check. The average is still less than 5ms because there are hundreds of access-control requests for each login request. When we look in the application log, we see that a large number of new logins were beginning before the application server ran out of RAM and started swapping, tied its shoelaces together and jumped out of the window.
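A quick calculation shows how an average can hide this. The sketch below assumes 3ms for an ordinary access-control check and 300 such checks per login. Both values are mine, chosen only to be consistent with "less than 4ms" and "hundreds" in the text.

```python
def mean_service_time_ms(fast_ms: float, slow_ms: float, fast_per_slow: float) -> float:
    """Mean service time for a mix of fast_per_slow fast requests per slow one."""
    return (fast_per_slow * fast_ms + slow_ms) / (fast_per_slow + 1)


# Normal mix: 300 access-control checks (3ms, assumed) per login check (500ms).
normal = mean_service_time_ms(3.0, 500.0, 300)
# Pre-game burst, assumed: one request in ten is a login.
burst = mean_service_time_ms(3.0, 500.0, 9)

print(f"normal mix: {normal:.2f} ms, login burst: {burst:.2f} ms")
```

Under these assumptions the mean goes from under 5ms to more than 50ms as soon as logins become one request in ten, although neither kind of request got any slower.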
We also know that the programming power offered by the very high-level, object-oriented features of the application server has its price: each incoming request uses at least 800KB even before it has been forwarded to the security server or the database server. In addition, each session object (used to simulate a persistent browser-to-app-server connection, something HTTP does not provide per se) takes 200KB. After more analysis of the log, we find that just before the crash logins were arriving at about 200 per minute, far above the usual rate. Looking at the 15 minutes of log before the crash, we find about one thousand login requests and only a few logouts. The application server got bogged down with too many concurrent login requests, and it exhausted the RAM. The machine started paging out virtual memory, requests took too long, users got fed up with waiting, clicked on Stop, pressed Esc and tried logging in again, until the operator pressed the big red button.
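To put rough numbers on this, the sketch below combines the memory figures from the text with Little's law (requests in flight = arrival rate x time in the system), which is my addition and not part of the model used so far. The response times tried are assumptions, and only login requests are counted, so the real footprint would be larger.

```python
REQUEST_KB = 800  # per incoming request (a lower bound, per the text)
SESSION_KB = 200  # per session object


def in_flight_requests(arrivals_per_min: float, response_time_s: float) -> float:
    """Little's law: mean number of requests in the system at once."""
    return arrivals_per_min / 60.0 * response_time_s


def memory_kb(open_sessions: float, requests_in_flight: float) -> float:
    """Memory held by session objects plus requests currently being served."""
    return open_sessions * SESSION_KB + requests_in_flight * REQUEST_KB


sessions = 1000  # about one thousand logins and only a few logouts
for response_s in (0.5, 5.0, 20.0):  # assumed login response times
    n = in_flight_requests(200, response_s)
    print(f"tq = {response_s:>4} s: {n:5.1f} logins in flight, "
          f"{memory_kb(sessions, n) / 1024:6.1f} MB")
```

The session objects alone hold close to 200MB, and the in-flight part grows in proportion to the response time. Once the machine starts paging, responses slow down, more requests pile up, and the retries from impatient users push the arrival rate up as well.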