Configuring, Tuning and Debugging Apache
Even moderately popular web sites tend to run more than one copy of Apache at a time. Older servers would wait until a new connection came in before deciding whether to “fork” off a new copy of the existing server process. The authors of Apache abandoned this method in favor of “pre-forking”, meaning that Apache creates a large number of child processes immediately upon startup.
Each child process handles only one HTTP connection at a given time, meaning that there must always be at least as many Apache processes running as the number of simultaneous connections a site should handle. This maximum number is set with the MaxClients directive, which defaults to 150. If MaxClients is set too low, then one or more users may end up waiting until an Apache process finishes handling the previous connection, and is available to service a new one.
Apache dynamically changes the number of servers available depending on the number of requests that it gets, based on hints in httpd.conf. The MinSpareServers and MaxSpareServers directives tell Apache how many extra servers to keep around in preparation for incoming requests. If the number of spare servers ever goes below MinSpareServers, Apache spawns several new servers. By contrast, if there are more unused servers than defined in MaxSpareServers, Apache will kill off the extra ones.
If the server starts and responds successfully, but is taking a long time to accept connections, it could be that you have not told Apache to start enough servers. Try increasing either MaxSpareServers or MaxClients so that fewer people will have to wait for a free server to handle their request.
Of course, adding new servers is a potentially large drain on the computer's resources, consuming more CPU time and memory. mod_perl is a particularly large user of memory; so you should be more conservative when adding new Apache processes that include mod_perl. Use the standard Linux free command, which displays the amount of available physical and virtual memory, to get a better understanding of where the memory is going. I also like to use top, which displays, among other things, the amount of CPU and memory each process is consuming.
Because web servers need to respond to requests as quickly as possible, and because virtual memory is far slower than physical RAM, you should pay particularly close attention to virtual memory usage and minimize its use.
Small web sites that run one or more database servers (such as MySQL or PostgreSQL) on the same computer as a web server can find themselves in an unenviable bind. As the number of visitors to a dynamically generated site increases, the number of Apache processes must also increase. But in order to service all of these visitors, the number of database connections must also increase. At a certain point, a site becomes a victim of its own success, with the database and web site competing for system resources. For this reason, most popular database-backed sites separate the two functions onto at least two computers, with one or more database servers connected to one or more HTTP servers.
One nice way to get a snapshot of the current Apache status is with the mod_status module. mod_status describes the current state of every HTTP server, whether it is waiting for a new connection, reading the request, handling the request or writing a response.
mod_status is compiled into Apache by default, meaning that all you need to do in order to activate it is to set the appropriate directives, and set the default handler, or request-handling subroutine, to be “server-status”. Any URL defined to have a handler of “server-status” then produces a status listing, ignoring the rest of the user's request.
It is thus most common for mod_status to be activated for only one URL. For example, we can create the virtual “/server-status” URL on our web server, such that anyone visiting /server-status will be shown the output from mod_status. We also indicate that Apache should always produce a full status listing, rather than the simple version. Here is one such simple configuration:
<Location /server-status> SetHandler server-status </Location> ExtendedStatus On
Once I put those four lines inside of httpd.conf and restart Apache—or send it a HUP signal—I get the following output from the /server-status URL:
Server Version: Apache/1.3.12 (UNIX) mod_perl/1.24 Server Built: Mar 29 2000 12:25:42 Current Time: Friday, 21-Jul-2000 16:02:51 IDT Restart Time: Friday, 21-Jul-2000 16:02:48 IDT Parent Server Generation: 2 Server uptime: 3 seconds Total accesses: 0 - Total Traffic: 0 kB CPU Usage: u0 s0 cu0 cs0 0 requests/sec - 0 B/second - 1 requests currently being processed, 4 idle serversThe status information begins with a fair amount of text indicating how long the server has been running, and how many times people have accessed the server. It also indicates just how many bytes are being served by this web process and how many servers are sitting idle. mod_status thus provides a nice window into the world of the Apache server, allowing us to see whether we have defined MaxSpareServers in the most resource-efficient manner.
mod_status then produces output in the following format, which can seem cryptic at first:
W____............................................. ................................................. ................................................. .................................................
Each “.” character represents a potential Apache server process which is currently not running. Those that are waiting for a new connection are represented by “_”; those that are reading input from the user's HTTP request are represented by “R”; and writing their output to the user's browser are represented by “W”. Not all letters may be visible at a given time; the current Apache status changes dynamically, and the output you see from mod_status will change to reflect that.
Following this display, we get a play-by-play view of what each active process is doing. We can see which connections are taking a long time to be processed, which connections are the most popular each month, and nearly any other facet.
Of course, it is normally a bad idea to open up your status information to the entire world. Luckily, you can use the “Order”, “Deny” and “Allow” directives to restrict access to a set of IP addresses or to an entire domain. For example:
<Location /server-status> SetHandler server-status Order deny,allow Deny from all Allow from .lerner.co.il </Location>
With the above configuration, mod_status will display results only for IP addresses in my domain. Requests coming from another domain will get an HTTP response indicating that access is forbidden to them.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- Stunnel Security for Oracle
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- SourceClear Open
- SUSE LLC's SUSE Manager
- Managing Linux Using Puppet
- My +1 Sword of Productivity
- Google's SwiftShader Released
- Parsing an RSS News Feed with a Bash Script
- Non-Linux FOSS: Caffeine!
- SuperTuxKart 0.9.2 Released
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide