Why Application Servers Crash and How to Avoid It
Too often, the success of a web site brings its doom. After a good publicity blitz, or a healthy growth of its popularity, the number of users visiting a site becomes very large. Then, at a certain point, these users witness painfully slow response times, then one crash, then another, and so on. The site administrators add some RAM, more machines, faster processors, and the crashes keep coming. At some point, rumors spread, and the users stop visiting.
Interesting web sites all use dynamic content: an application server drives the show and is helped by an RDBMS (relational database management system) to store the data. Web servers don't use a lot of static HTML pages anymore, which is too bad because when they did, they did not crash so much. Why? And, why does the application server crash? We will look at a little bit of queuing theory to analyze various scenarios that lead to web site crashes.
Queuing theory is about using math (statistics) to model the behavior of a server system. The server is not necessarily a computer. Queuing theory can be used to model the waiting queue at a bank teller or the flow of cars at the entrance of a bridge. It is also a good tool to predict the performance of telecom and computing systems.
To simplify things, we will use the single-server queue model with exponential service times. The words single server mean that the systems we will model have only one database server (the number of processors in the machine is not relevant here). The phrase exponential service times means that the service time is random but has a known mean, with a standard deviation equal to that mean. In queuing theory notation, this is the M/M/1 model: the 1 stands for the single server, the second M for the exponential service times, and the first M for random arrivals following a Poisson distribution. With all that said, we can use a few neat and simple formulas. First, we need to name a few variables:
[Ed. note: so that everyone may see them, the Greek letters below are presented in text form.]
<lambda> = average number of requests per second
s = average service time for each request
<rho> = utilization: fraction of time that the server is busy
q = average number of requests in the server (waiting and being processed)
tq = average time a request spends in the server (response time as perceived by the user)
w = average number of requests waiting in the server
The first formula is:
<rho> = <lambda> * s
With this formula, we can find the maximum number of requests that the server could process per second:
<lambda>max = 1/s
We will see that peculiar things happen when the number of requests per second gets close to this maximum, <lambda>max.
A few more formulas:
q = <rho> / (1 - <rho>)
tq = s/(1 - <rho>)
w = <rho>^2 / (1 - <rho>)
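The formulas above can be collected into one small function. This is only a minimal sketch: the function name mm1_metrics and the sample load of 10 requests per second are illustrative assumptions, not part of the original article. The squared utilization in w follows the standard M/M/1 result.

```python
def mm1_metrics(lam, s):
    """Return (rho, q, tq, w) for an M/M/1 queue.

    lam: average number of requests per second (<lambda>)
    s:   average service time per request, in seconds
    """
    rho = lam * s  # utilization: fraction of time the server is busy
    if not 0 <= rho < 1:
        # The formulas only hold while the server can keep up.
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    q = rho / (1 - rho)        # requests in the server (waiting + in service)
    tq = s / (1 - rho)         # response time as perceived by the user
    w = rho ** 2 / (1 - rho)   # requests waiting (not yet being processed)
    return rho, q, tq, w


# Hypothetical load: 10 requests per second against a 50ms service time.
rho, q, tq, w = mm1_metrics(lam=10, s=0.05)
print(f"rho={rho:.2f}  q={q:.2f}  tq={tq * 1000:.0f} ms  w={w:.2f}")
```

A quick sanity check on the results is that q equals <lambda> * tq, and that w equals q minus <rho>, the request currently being served on average.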
If we use these formulas, we can see why a web site can become very slow. Let's use queuing theory to model a system with two tiers on the server: the web server and the database server. Also, let's ignore the overhead of the HTTP server as the database server is orders of magnitude slower. Let's say that the average service time of the system is equal to that of the database server.
In Figure 1, we suppose that the database server has an average service time for each request of 50ms (s = 0.05 second). We vary the number of requests per second <lambda> and put it on the X axis. The Y axis has the perceived response time (tq).
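The curve in Figure 1 can be approximated by tabulating tq for a few request rates. The sketch below assumes the same 50ms service time; the helper name response_time and the sampled rates are chosen here for illustration only.

```python
def response_time(lam, s=0.05):
    """Perceived response time tq (seconds) at lam requests per second."""
    rho = lam * s  # utilization
    if rho >= 1:
        raise ValueError("server is saturated: utilization must stay below 1")
    return s / (1 - rho)


# Sample the X axis of Figure 1; the maximum rate is 1/s = 20 requests/second.
for lam in (1, 5, 10, 15, 18, 19, 19.5, 19.9):
    print(f"{lam:5.1f} req/s -> tq = {response_time(lam) * 1000:8.1f} ms")
```

The printed values stay close to the bare service time for most of the range and then climb steeply over the last few requests per second before the maximum.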
We see in Figure 1 that the curve is almost flat and then rises suddenly. This happens when the utilization <rho> approaches 1. But there is more to it than just slow response time when a server becomes more and more busy. RAM becomes a problem as soon as there is a significant peak in the number of requests.
For the sake of simplicity within queuing theory, we usually assume that the waiting queue has an infinite size. In practice, that is where ugly things can happen: queues do not have an infinite size because memory is not infinite. Whether you have 1GB or 8GB of RAM, you eventually run out of it.
The next graph (Figure 2) is similar to the first one, but on the Y axis we put the number of requests still in the server (q).
What happens is that when the number of requests coming in reaches the point where the utilization <rho> is almost 1, the number of requests waiting in RAM grows toward infinity. If you have 1,000 requests in RAM and each takes only 100KB (that is roughly 100MB), you are probably okay. With s = 0.05 second (50ms), the maximum load is 72,000 requests per hour, and q is about 1,000 when you have 71,928 requests per hour. When you have 71,993 requests per hour, you will have 10,284 requests waiting in RAM. That's about 1GB of RAM. Let's say you have 2GB--with five more requests per hour, 71,998, you get q = 35,999. That's about 3.5GB. Theoretically, your users would have received a response after 30 minutes (1,800 seconds). But very few will get any response because the machine is swapping to get some virtual memory. Since this increases the service time by a factor of 100 or 1,000, your server appears to the world to have died.
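The figures in this paragraph can be rechecked with a few lines of code. This sketch uses exact fractions because floating-point rounding becomes noticeable this close to saturation. The function name load_point is made up for the example, and the conversion of 100KB to 0.1MB (decimal units) is an assumption.

```python
from fractions import Fraction


def load_point(requests_per_hour, s=Fraction(1, 20), kb_per_request=100):
    """Return (q, tq in seconds, RAM in MB) for an hourly request rate.

    s defaults to 1/20 second (50ms); kb_per_request is the assumed
    memory footprint of one request held in the server.
    """
    lam = Fraction(requests_per_hour, 3600)  # requests per second
    rho = lam * s                            # utilization
    if rho >= 1:
        raise ValueError("server is saturated: utilization must stay below 1")
    q = rho / (1 - rho)                      # requests in the server
    tq = s / (1 - rho)                       # response time in seconds
    ram_mb = q * kb_per_request / 1000       # assumes 1MB = 1,000KB
    return float(q), float(tq), float(ram_mb)


for rph in (71_928, 71_993, 71_998):
    q, tq, ram_mb = load_point(rph)
    print(f"{rph:,} req/hour: q={q:,.0f}  tq={tq:,.0f} s  RAM={ram_mb:,.0f} MB")
```

A difference of a few dozen requests per hour, out of roughly 72,000, is enough to move the queue from about a thousand requests to tens of thousands.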
There is a human factor that makes such response time crises even worse. The users will not wait patiently for 30 seconds. They will click Stop or press Esc and resubmit their requests. This will only add more requests to the system.
Table 1 shows the increase of q and tq for a server with a 50ms service time under increasing loads.
But what would have happened if the server had used 1MB or 2MB per concurrent request? Table 2 also shows the usage of RAM if the per-request footprint is 100KB, 500KB, 1MB or 2MB. If all requests really had a service time near the average of 50ms (if the distribution were truly exponential), nothing much would happen. The server would perhaps start swapping a few seconds earlier. That makes little difference. The curve only becomes steep near the saturation point, when the utilization <rho> approaches 1. But there are scenarios (see the section Why the Server Crashes, Even at Moderate Loads) where reality does not follow the theoretical model. In those cases, RAM usage does make a difference.
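To get a feel for how the per-request footprint interacts with utilization, the RAM needed can be computed directly from q. The sketch below is not a reproduction of Table 2; the utilization levels are picked here for illustration, and the helper names and the decimal conversion (1MB = 1,000KB) are assumptions.

```python
FOOTPRINTS_KB = (100, 500, 1000, 2000)  # 100KB, 500KB, 1MB, 2MB per request


def requests_in_server(rho):
    """Average number of requests in the server at utilization rho."""
    return rho / (1 - rho)


def ram_mb(q, footprint_kb):
    """RAM in MB held by q requests of footprint_kb each (1MB = 1,000KB)."""
    return q * footprint_kb / 1000


for rho in (0.5, 0.9, 0.99, 0.999):
    q = requests_in_server(rho)
    cells = "  ".join(f"{ram_mb(q, kb):10.1f}" for kb in FOOTPRINTS_KB)
    print(f"rho={rho:<6} q={q:8.1f}  RAM (MB): {cells}")
```

Under the theoretical model, even the 2MB footprint stays small until the utilization is very close to 1, which is consistent with the point that the footprint alone only shifts the moment swapping starts.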