Book Excerpt: DevOps Troubleshooting: Linux Server Best Practices
Sluggish or Unavailable Web Server
Although configuration and permission problems are pretty well defined, probably one of the more common web server problems you will troubleshoot is nice and vague: the server seems slow to the point it may even be temporarily unavailable. Although a large number of root causes can lead to this kind of problem, this section will guide you through some common causes for sluggish web servers along with their symptoms.
High Load
One of the first things I check when a server is sluggish or temporarily unavailable is its load. If you haven’t already read through Chapter 2, read it to learn how to determine whether the server is suffering from high load, and if so, whether that high load is the result of your web server processes; if it is, you’ll learn how to determine whether the load is CPU, RAM, or I/O bound.
Once you have confirmed that the load is high and that your web server processes are the cause, if the load is CPU-bound, then you will likely need to troubleshoot any CGIs, PHP code, and so on, that your web server executes to generate dynamic content. Go through your web server logs and attempt to identify which pages are being accessed during this high load period; then attempt to load them yourself (possibly on a test server if your main server is overloaded) to gauge how much CPU various dynamic pages consume.
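As a rough illustration of that log review, the following Python sketch counts requests per page within a time window. It assumes the access log uses the Common or Combined Log Format; the function name top_paths and the sample log lines are hypothetical, not something from the original text.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed layout (Common/Combined Log Format):
# host ident user [timestamp] "METHOD path protocol" status bytes ...
LINE_RE = re.compile(r'^\S+ \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*"')
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def top_paths(lines, start, end, limit=10):
    """Count requests per path for lines timestamped within [start, end].

    start and end must be timezone-aware datetimes, because the log
    timestamps carry a UTC offset.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        when = datetime.strptime(match.group(1), TIME_FORMAT)
        if start <= when <= end:
            path = match.group(3).split("?", 1)[0]  # ignore query strings
            counts[path] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    # Hypothetical sample; in practice, iterate over your access log file.
    sample = [
        '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /report.php?id=1 HTTP/1.1" 200 2326',
        '10.0.0.2 - - [10/Oct/2023:13:57:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    ]
    window_start = datetime(2023, 10, 10, 13, 50, tzinfo=timezone.utc)
    window_end = datetime(2023, 10, 10, 14, 0, tzinfo=timezone.utc)
    for path, hits in top_paths(sample, window_start, window_end):
        print(hits, path)
```

The pages at the top of the list are reasonable first candidates to load on a test server while watching CPU usage.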
If the load seems RAM-bound and you notice you are using more and more swap storage and may even completely run out of RAM, then you may be facing the dreaded web server swap death spiral. This shows up commonly in Apache prefork servers but could potentially show up in Apache worker or even Nginx servers under the right conditions. Essentially, when you configure your web server, you can set the maximum number of web server instances the server will spawn in response to traffic. In Apache prefork, this is known as the MaxClients setting (renamed MaxRequestWorkers in Apache 2.4). When a server gets so much traffic that it spawns more web server processes than can fit in RAM, processes end up using the much slower swap space instead. This causes those processes to respond much more slowly than processes residing in RAM, which causes the requests to take longer, which in turn causes more processes to be needed to handle the current load until, ultimately, both RAM and swap are consumed.
To solve this issue, you will need to calculate how many web server processes can fit into RAM. First calculate how much RAM an individual web server process will take, then take your total RAM and subtract your operating system overhead. Then figure out how many Apache processes you currently can fit into the remaining free RAM without going into swap. You should then configure your web server so it never launches more processes than it can fit into RAM.
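The arithmetic described above can be sketched in a few lines of Python. The function name max_processes and the example figures (8GB of RAM, 1GB of operating system overhead, 50MB per process) are illustrative assumptions, not measurements from a real server.

```python
def max_processes(total_ram_mb, os_overhead_mb, per_process_mb):
    """Return how many web server processes fit in RAM without swapping."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    available_mb = total_ram_mb - os_overhead_mb
    # Round down: a partially fitting process would spill into swap.
    return max(0, int(available_mb // per_process_mb))


# Hypothetical numbers: 8192 MB total, 1024 MB for the OS, 50 MB per process.
print(max_processes(8192, 1024, 50))  # prints 143
```

The result is an upper bound for a setting such as MaxClients; leaving some headroom below it is usually the safer choice.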
Of course, with modern dynamically generated web pages, setting this value can be a bit tricky. After all, some PHP scripts, for instance, use little RAM whereas others may use quite a bit. In circumstances like this, the best tactic is to look at all of the web server processes on a busy web server and attempt to gauge the maximum, minimum, and average amount of RAM a process consumes. Then you can decide whether to set the number of web servers according to the worst case (maximum amount of RAM) or the average case.
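One way to gauge those figures is to summarize the resident memory of the running web server processes. The sketch below assumes a Linux system where a command such as ps -o rss= -C apache2 prints one resident set size in kilobytes per line (the process name apache2 is an assumption and varies by distribution). The function names and sample numbers are hypothetical. Keep in mind that RSS counts memory shared between processes more than once, so these figures tend to be pessimistic.

```python
from statistics import mean


def rss_stats_mb(ps_output):
    """Summarize per-process memory from `ps -o rss=` output (KB per line)."""
    sizes_mb = [int(token) / 1024 for token in ps_output.split()]
    if not sizes_mb:
        raise ValueError("no process sizes found")
    return {"min": min(sizes_mb), "avg": mean(sizes_mb), "max": max(sizes_mb)}


def process_limits(available_mb, stats):
    """Return process limits for the worst case and the average case."""
    return {
        "worst_case": int(available_mb // stats["max"]),
        "average_case": int(available_mb // stats["avg"]),
    }


if __name__ == "__main__":
    # Hypothetical ps output: three processes using 20, 40, and 60 MB.
    stats = rss_stats_mb("20480\n40960\n61440\n")
    print(stats)
    print(process_limits(1200, stats))
```

Sizing for the worst case avoids swap at the cost of fewer processes; sizing for the average case allows more processes but risks swapping when many heavy requests arrive together.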
If your load is I/O bound, and the web server has a database back-end on the same machine, you might simply be saturating your disk I/O with database requests. Of course, if you followed the load troubleshooting guide from Chapter 2, you should have been able to identify database processes as the culprit instead of web server processes. In either case, you may want to consider either putting your database on a separate server, upgrading your storage speed, or going to Chapter 9 for more information on how to troubleshoot database issues. Even if the database server is on a separate machine, each web server process that is waiting on a response from the database over the network may still generate a high load average.
Otherwise, if the server is I/O bound but the problem seems to be coming from the web server itself and not the database, it could be that the software that powers your website is simply saturating disk I/O with requests. Alternatively, if you have enabled reverse DNS resolution in your logs so that IP addresses are converted into hostnames, your web server processes could simply be waiting on each DNS query to resolve before they finish their requests.
Server Status Pages
One of the other main places to look when diagnosing sluggish servers, other than troubleshooting high load on the system, is in the server status page. Earlier in the chapter we talked about how to enable and view the server status page in your web server. In cases of slow or unavailable web servers, this status page gives a nice overall view of the health of your web server. You not only see system load averages, you can also see how many processes are currently busy and what they are doing.
If, for instance, you see something like this,
$ curl http://localhost/server-status?auto
. . .
Scoreboard: WWWWWWWWWWWWWKWWWWWKWWWWWWWWWWWKWWWWWWWWWWWWWWWWWWWWWWWKKKKKKWKWWWWCWCWWWWWWKWW
____WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
you’ll know that this server is completely overloaded with requests. As you refresh this page, you may see a process open up every now and then, but clearly, just about every process is busy fulfilling a request. In this circumstance, you may just need to allow your web server to spawn more processes (if you can fit them in RAM), or, alternatively, it may be time to add another web server to help share the load.
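Rather than eyeballing a long scoreboard, you can tally it. The following Python sketch counts busy, idle, and open slots using the standard Apache mod_status key, in which "_" is a process waiting for a connection, "." is an open slot with no current process, and every other character (such as W for sending a reply or K for keepalive) is a busy process. The function name summarize_scoreboard and the sample string are hypothetical.

```python
from collections import Counter


def summarize_scoreboard(scoreboard):
    """Count busy, idle, and open slots in an Apache scoreboard string."""
    # Ignore whitespace so a scoreboard wrapped across lines still works.
    counts = Counter(ch for ch in scoreboard if not ch.isspace())
    idle = counts["_"]        # waiting for a connection
    open_slots = counts["."]  # slot with no current process
    busy = sum(counts.values()) - idle - open_slots
    return {"busy": busy, "idle": idle, "open": open_slots}


if __name__ == "__main__":
    # Hypothetical short scoreboard for illustration.
    print(summarize_scoreboard("WWKW_C__...."))
```

A scoreboard with almost no idle or open slots matches the overloaded case described here, while a large number of open slots suggests the process limit is not the bottleneck.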
Then again, if you see a scoreboard like the one shown earlier, but notice that your web server seems quite responsive, it could be that each web request is having to wait on something on the back end. Behavior like this can happen when an application server is overloaded with waiting requests (sometimes because, ultimately, the database server it depends on is overloaded), so although all the web server processes are busy, adding more wouldn’t necessarily help the issue—they would also still be waiting on the back end to respond.
On the other hand, you might see something like this:
Scoreboard: WW_W__W_W__W_K_W_W_K___WWWWW_WWKWWWW_WWWWWWWWWWWWWWWWWWKKKKKK_KW_.WC.CW_____K__
____.......................................................................................
...........................................................................................
...........................................................................................
................................................
This is a server that has many processes to spare, both ones that are loaded into RAM and ones that are waiting to be loaded. If your server is sluggish but your scoreboard looks like this, then you are going to need to dig into your web server logs and try to identify which pages are currently being loaded. Ultimately you will want to identify which pages on your site are taking so long to respond, and then you’ll need to dig into that software to try to find the root cause. Of course, it could also simply be that your web server is underpowered for the software it’s running, and if so, it’s time to consider a hardware upgrade.
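If your access log records how long each request took, ranking pages by average response time is a quick way to find the slow ones. The sketch below is only an illustration and rests on an assumption the text does not make: that the log is in Combined Log Format with Apache's %D field (service time in microseconds) appended as the last item on each line. The function name slowest_paths and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout: Combined Log Format followed by %D (microseconds).
LINE_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" .* (\d+)$')


def slowest_paths(lines, limit=5):
    """Return (path, average seconds) pairs, slowest first."""
    totals = defaultdict(lambda: [0, 0])  # path -> [requests, microseconds]
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        path = match.group(1).split("?", 1)[0]  # ignore query strings
        totals[path][0] += 1
        totals[path][1] += int(match.group(2))
    averages = [
        (path, micros / count / 1_000_000)
        for path, (count, micros) in totals.items()
    ]
    return sorted(averages, key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    # Hypothetical sample lines for illustration.
    sample = [
        '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /slow.php HTTP/1.1" 200 512 "-" "curl/8.0" 1500000',
        '10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0" 20000',
    ]
    for path, seconds in slowest_paths(sample):
        print(f"{seconds:.2f}s {path}")
```

If your log format does not include a timing field, this approach will not work as written; you would first need to add one to the log format.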
© Copyright Pearson Education. All rights reserved.
Kyle Rankin is a director of engineering operations in the San Francisco Bay Area, the author of a number of books including DevOps Troubleshooting and The Official Ubuntu Server Book, and is a columnist for Linux Journal.