Why Application Servers Crash and How to Avoid It
Too often, the success of a web site brings its doom. After a good publicity blitz, or a healthy growth of its popularity, the number of users visiting a site becomes very large. Then, at certain point, these users witness painfully slow response time, then one crash, then another, etc. The site administrators add some RAM, more machines, faster processors, and the crashes keep coming. At some point, rumors spread, and the users stop visiting.
Interesting web sites all use dynamic content: an application server drives the show and is helped by an RDBMS (relational database management system) to store the data. Web servers don't use a lot of static HTML pages anymore, which is too bad because when they did, they did not crash so much. Why? And, why does the application server crash? We will look at a little bit of queuing theory to analyze various scenarios that lead to web site crashes.
Queuing theory is about using math (statistics) to model the behavior of a server system. The server is not necessarily a computer. Queuing theory can be used to model the waiting queue at a bank teller or the flow of cars at the entrance of a bridge. It is also a good tool to predict the performance of telecom and computing systems.
To simplify things, we will use the single-server queue model with exponential service times. The words single server basically mean that the systems we will model have only one database server (the number of processors in the machine is not relevant here). Also, the phrase exponential service times means that the response time is random but follows a known mean average with a standard deviation equal to that average. We have to use even more queuing theory buzzwords now and say that we will use the M/M/1 model. We have already said that here we have a single-server (the 1 in M/M/1) and that we assume exponential service times (the second M). The first M indicates that we expect random arrivals following a Poisson distribution. With that all said, we can then use a few neat and simple formulas. First, we need to name a few variables:
[Ed. note: so that everyone may see them, the Greek letters below are presented in text form.]
<lambda> = average number of requests per second
s = average service time for each request
<rho> = utilization: fraction of time that the server is busy
q = average number of requests in the server (waiting and being processed)
tq = average time a request spends in the server (response time as perceived by the user)
w = average number of requests waiting in the server
The first formula is:
<rho> = <lambda> * s
With this formula, we can find the maximum number of requests that the server could process per second:
<lambda>max = 1/s
We will see that peculiar things happen when the number of requests per second gets close to this maximum, <lambda>max.
A few more formulas:
q = <rho> / (1 - <rho>)
tq = s/(1 - <rho>)
w = <rho> 2 / (1 - <rho>)
If we use these formulas, we can see why a web site can become very slow. Let's use queuing theory to model a system with two tiers on the server: the web server and the database server. Also, let's ignore the overhead of the HTTP server as the database server is orders of magnitude slower. Let's say that the average service time of the system is equal to that of the database server.
In Figure 1, we suppose that the database server has an average service time for each request of 50ms (s = 0.05 second). We vary the number of requests per second <lambda> and put it on the X axis. The Y axis has the perceived response time (tq).
We see in Figure 1 that the curve is almost flat and then rises suddenly. This happens when the utilization <rho> approaches 1. But, there is more to it than just slow response time when a server becomes more and more busy. RAM becomes a problem as soon there is an important peak in the number of requests.
For the sake of simplicity within queuing theory, we usually assume that the waiting queue has an infinite size. In practice, that is where ugly things can happen: queues do not have an infinite size because memory is not infinite. Whether you have 1GB or 8GB of RAM, you eventually run out of it.
The next graph (Figure 2) is similar to the first one, but on the Y axis we put the number of requests still in the server (q).
What happens is that when the number of requests coming in reaches the point where the utilization <rho> is almost 1, the number of requests waiting in RAM grows toward infinity. If you have 1,000 requests in RAM and each takes only 100KB (that is roughly 100MB), you are probably okay. With s = 0.05 second (50ms), q = 1000 when you have 71,928 requests per hour. When you have 71,993 requests per hour, you will have 10,284 requests waiting in RAM. That's about 1GB RAM. Let's say you have 2GB--with six more requests per hour, 71,998, you get q = 35,999. That's about 3.5GB. Theoretically, your users would have received a response after 30 minutes (1,800 seconds). But very few will get any response because the machine is swapping to get some virtual memory. Since this increases the service time by a factor of 100 or 1,000, your server appears to the world to have died.
There is a human factor that makes such response time crises even worse. The users will not wait patiently for 30 seconds. They will click Stop or press Esc and resubmit their requests. This will only add more requests to the system.
Table 1 shows the increase of q and tq for a server with a 50ms service time under increasing loads.
But, what would have happened if the server had used 1MB or 2MB per concurrent request? Table 2 shows also the usage of RAM if the per request footprint is 100KB, 500KB, 1MB or 2MB. If all requests really had a service near the average of 50ms (if the distribution was truly exponential), nothing much would happen. The server would start swapping a few seconds earlier perhaps. That makes little difference. The curve only becomes steep near the saturation point when the utilization <rho> approaches 1. But, there are scenarios (see the section Why the Server Crashes, Even at Moderate Loads) where reality does not follow the theoretical model. In those cases, RAM usage does make a difference.
|Happy Birthday Linux||Aug 25, 2016|
|ContainerCon Vendors Offer Flexible Solutions for Managing All Your New Micro-VMs||Aug 24, 2016|
|Updates from LinuxCon and ContainerCon, Toronto, August 2016||Aug 23, 2016|
|NVMe over Fabrics Support Coming to the Linux 4.8 Kernel||Aug 22, 2016|
|What I Wish I’d Known When I Was an Embedded Linux Newbie||Aug 18, 2016|
|Pandas||Aug 17, 2016|
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Happy Birthday Linux
- Updates from LinuxCon and ContainerCon, Toronto, August 2016
- ContainerCon Vendors Offer Flexible Solutions for Managing All Your New Micro-VMs
- New Version of GParted
- What I Wish I’d Known When I Was an Embedded Linux Newbie
- Tor 0.2.8.6 Is Released
- NVMe over Fabrics Support Coming to the Linux 4.8 Kernel
- Blender for Visual Effects
- All about printf
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide