The Sysadmin's Toolbox: sar
As someone who's been working as a system administrator for a number of years, I find it easy to take for granted tools that I've used for a long time and to assume everyone has heard of them. Of course, new sysadmins get into the field every day, and even seasoned sysadmins don't all use the same tools. With that in mind, I decided to write a few columns where I highlight some common-but-easy-to-overlook tools that make life as a sysadmin (and really, any Linux user) easier. I start the series with a classic troubleshooting tool: sar.
There's an old saying: "When the cat's away the mice will play." The same is true for servers. It's as if servers wait until you aren't logged in (and usually in the middle of REM sleep) before they have problems. Logs can go a long way to help you isolate problems that happened in the past on a machine, but if the problem is due to high load, logs often don't tell the full story. In my March 2010 column "Linux Troubleshooting, Part I: High Load" (http://www.linuxjournal.com/article/10688), I discussed how to troubleshoot a system with high load using tools such as uptime and top. Those tools are great as long as the system still has high load when you are logged in, but if the system had high load while you were at lunch or asleep, you need some way to pull the same statistics top gives you, only from the past. That is where sar comes in.
Enable sar Logging
sar is a classic Linux tool that is part of the sysstat package and should be available in just about any major distribution through your regular package manager. Once installed, it is enabled automatically on a Red Hat-based system, but on a Debian-based system (like Ubuntu), you might have to edit /etc/default/sysstat and make sure that ENABLED is set to true. On a Red Hat-based system, sar logs seven days of statistics by default. If you want to keep more than that, you can edit /etc/sysconfig/sysstat and change the HISTORY option.
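As a rough sketch of how you might check those two settings from a script, the helper functions below read the files named above. The function names are made up for illustration, and the patterns assume the settings appear as plain ENABLED="true" and HISTORY=7 style lines, which may not match every distribution's file layout.

```shell
# Hypothetical helpers, not part of sysstat itself.

# Succeeds if the given Debian-style defaults file has ENABLED set to true.
# $1: path to the file (normally /etc/default/sysstat)
sysstat_enabled() {
    grep -Eq '^[[:space:]]*ENABLED="?true"?[[:space:]]*$' "$1"
}

# Prints the HISTORY value (days of logs to keep) from a Red Hat-style file.
# $1: path to the file (normally /etc/sysconfig/sysstat)
sysstat_history() {
    sed -n 's/^HISTORY=//p' "$1"
}

# Example use (commented out so the block has no side effects):
# sysstat_enabled /etc/default/sysstat || echo "sar logging is disabled"
# sysstat_history /etc/sysconfig/sysstat
```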
Once sysstat is configured and enabled, it will collect statistics about your system every ten minutes and store them in a logfile under either /var/log/sysstat or /var/log/sa via a cron job in /etc/cron.d/sysstat. There is also a daily cron job that will run right before midnight and rotate out the day's statistics. By default, the logfiles will be date-stamped with the current day of the month, so the logs will rotate automatically and overwrite the log from a month ago.
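Because the log directory differs between distributions, a script that needs a particular day's logfile has to look in both places. The following is a minimal sketch, assuming the files are named sa followed by the two-digit day of the month (as in the sa01 example later in this column); find_sa_file is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of the sar logfile for one day.
# $1: day of the month (1-31, with or without a leading zero)
# remaining arguments: directories to search, in order
find_sa_file() {
    day=${1#0}                      # strip a leading zero so printf sees decimal
    shift
    name=$(printf 'sa%02d' "$day")  # for example, sa06
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1                        # no logfile found for that day
}

# Example use (commented out so the block has no side effects):
# find_sa_file "$(date +%d)" /var/log/sysstat /var/log/sa
```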
CPU Statistics
After your system has had some time to collect statistics, you can use the sar tool to retrieve them. When run with no other arguments, sar displays the current day's CPU statistics:
$ sar
. . .
07:05:01 PM CPU %user %nice %system %iowait %steal %idle
. . .
08:45:01 PM all 4.62 0.00 1.82 0.44 0.00 93.12
08:55:01 PM all 3.80 0.00 1.74 0.47 0.00 93.99
09:05:01 PM all 5.85 0.00 2.01 0.66 0.00 91.48
09:15:01 PM all 3.64 0.00 1.75 0.35 0.00 94.26
Average: all 7.82 0.00 1.82 1.14 0.00 89.21
If you are familiar with the command-line tool top, the above CPU statistics should look familiar, as they are the same as you would get in real time from top. You can use these statistics just like you would with top, only in this case, you are able to see the state of the system back in time, along with an overall average at the bottom of the statistics, so you can get a sense of what is normal. Because I devoted an entire previous column to using these statistics to troubleshoot high load, I won't rehash all of that here, but essentially, sar provides you with all of the same statistics, just at ten-minute intervals in the past.
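If you would rather not read through a whole day of ten-minute samples by eye, you can filter the output. The sketch below is one possible approach, not a feature of sar: a hypothetical busy_intervals function that reads sar's CPU output on standard input and prints only the samples whose %idle value falls below a threshold you choose. It assumes %idle is the last column and that the CPU column reads "all", as in the output above.

```shell
# Hypothetical filter: print the timestamp and %idle of every sample
# whose %idle is below the given threshold.
# $1: %idle threshold (for example, 92)
busy_intervals() {
    awk -v limit="$1" '
        # Skip the summary line at the bottom of the report.
        $1 == "Average:" { next }
        {
            # The CPU column is field 2 with a 24-hour clock and
            # field 3 when the time is followed by AM or PM.
            for (i = 2; i <= 3; i++) {
                if ($i == "all" && $NF + 0 < limit + 0) {
                    ts = $1
                    if (i == 3) ts = ts " " $2
                    print ts, $NF
                }
            }
        }'
}

# Example use (commented out so the block has no side effects):
# sar | busy_intervals 92
```

Header lines and the ". . ." separators are ignored because they never contain "all" in the CPU column.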
RAM Statistics
sar also supports a large number of different options you can use to pull
out other statistics. For instance, with the -r option, you can see RAM
statistics:
$ sar -r
. . .
07:05:01 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
. . .
08:45:01 PM 881280 2652840 75.06 355284 1028636 8336664 183.87
08:55:01 PM 881412 2652708 75.06 355872 1029024 8337908 183.89
09:05:01 PM 879164 2654956 75.12 356480 1029428 8337040 183.87
09:15:01 PM 886724 2647396 74.91 356960 1029592 8332344 183.77
Average: 851787 2682333 75.90 338612 1081838 8341742 183.98
Just like with the CPU statistics, here I can see RAM statistics from the past similar to what I could find in top.
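One thing to keep in mind when reading these numbers: in the output above, kbmemfree plus kbmemused adds up to the machine's total RAM, which suggests that kbmemused counts buffers and cache as used memory. Under that assumption (it may not hold for every sysstat version), the worked calculation below estimates how much memory is used once buffers and cache are subtracted. The function name is hypothetical.

```shell
# Hypothetical helper: percentage of RAM in use excluding buffers and cache.
# Assumes kbmemused includes kbbuffers and kbcached.
# $1: kbmemfree  $2: kbmemused  $3: kbbuffers  $4: kbcached
mem_used_excluding_cache() {
    awk -v free="$1" -v used="$2" -v buffers="$3" -v cached="$4" 'BEGIN {
        total = free + used
        printf "%.2f\n", (used - buffers - cached) * 100 / total
    }'
}

# Using the 08:45:01 PM sample from the output above:
# mem_used_excluding_cache 881280 2652840 355284 1028636
# prints 35.90
```

So although sar reports 75.06% memory used for that sample, only about 36% is used by something other than buffers and cache.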
Disk Statistics
Back in my load troubleshooting column, I referenced sysstat as the source for a great disk I/O troubleshooting tool called iostat. Although that provides real-time disk I/O statistics, you also can pass sar the -b option to get disk I/O data from the past:
$ sar -b
. . .
07:05:01 PM tps rtps wtps bread/s bwrtn/s
. . .
08:45:01 PM 2.03 0.33 1.70 9.90 31.30
08:55:01 PM 1.93 0.03 1.90 1.04 31.95
09:05:01 PM 2.71 0.02 2.69 0.69 48.67
09:15:01 PM 1.52 0.02 1.50 0.20 27.08
Average: 5.92 3.42 2.50 77.41 49.97
I figure these columns need a little explanation:
- tps: transactions per second.
- rtps: read transactions per second.
- wtps: write transactions per second.
- bread/s: blocks read per second.
- bwrtn/s: blocks written per second.
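The sar man page describes these blocks as equivalent to sectors, with a size of 512 bytes, so the block columns are easy to convert into something more familiar. Here is a small worked conversion; blocks_to_kib is a hypothetical helper name.

```shell
# Hypothetical helper: convert a blocks-per-second figure from sar -b
# into kibibytes per second, assuming 512-byte blocks.
# $1: value from the bread/s or bwrtn/s column
blocks_to_kib() {
    awk -v blocks="$1" 'BEGIN { printf "%.2f\n", blocks * 512 / 1024 }'
}

# Using the 08:45:01 PM bwrtn/s value from the output above:
# blocks_to_kib 31.30
# prints 15.65
```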
sar can return a lot of other statistics beyond what I've mentioned. If you want to see everything it has to offer, simply pass the -A option, which will return a complete dump of all the statistics it has for the day (or just browse its man page).
Turn Back Time
So by default, sar returns statistics for the current day, but often you'll want to get information a few days in the past. This is especially useful if you want to see whether today's numbers are normal by comparing them to days in the past, or if you are troubleshooting a server that misbehaved over the weekend. For instance, say you noticed a problem on a server today between 5PM and 5:30PM. First, use the -s and -e options to tell sar to display data only between the start (-s) and end (-e) times you specify:
$ sar -s 17:00:00 -e 17:30:00
Linux 2.6.32-29-server (www.example.net) 02/06/2012 _x86_64_ (2 CPU)
05:05:01 PM CPU %user %nice %system %iowait %steal %idle
05:15:01 PM all 4.39 0.00 1.83 0.39 0.00 93.39
05:25:01 PM all 5.76 0.00 2.23 0.41 0.00 91.60
Average: all 5.08 0.00 2.03 0.40 0.00 92.50
To compare that data with the same time period from a different day,
just use the -f option and point sar to one of the logfiles under
/var/log/sysstat or /var/log/sa that correspond to that day. For instance,
to pull statistics from the first of the month:
$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01
Linux 2.6.32-29-server (www.example.net) 02/01/2012 _x86_64_ (2 CPU)
05:05:01 PM CPU %user %nice %system %iowait %steal %idle
05:15:01 PM all 9.85 0.00 3.95 0.56 0.00 85.64
05:25:01 PM all 5.32 0.00 1.81 0.44 0.00 92.43
Average: all 7.59 0.00 2.88 0.50 0.00 89.04
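If you want to compare the same window across several days at once, you can wrap the command in a loop. The following is a sketch under a few assumptions: compare_window is a hypothetical function name, the SAR variable is my own addition so the command can be swapped out for a dry run, and the logfiles are assumed to follow the saDD naming shown above.

```shell
# Hypothetical helper: show the same time window for several days.
# $1: start time  $2: end time  $3: log directory
# remaining arguments: days of the month to report on
compare_window() {
    start=$1
    end=$2
    dir=$3
    shift 3
    for day in "$@"; do
        # Build the logfile name, for example /var/log/sysstat/sa01.
        file=$(printf '%s/sa%02d' "$dir" "${day#0}")
        echo "== $file =="
        # SAR defaults to sar; set SAR=echo to preview the commands.
        ${SAR:-sar} -s "$start" -e "$end" -f "$file"
    done
}

# Example use (commented out so the block has no side effects):
# compare_window 17:00:00 17:30:00 /var/log/sysstat 1 6
```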
You also can add all of the normal sar options when pulling from past logfiles, so you could run the same command and add the -r argument to get RAM statistics:
$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01 -r
Linux 2.6.32-29-server (www.example.net) 02/01/2012 _x86_64_ (2 CPU)
05:05:01 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
05:15:01 PM 766452 2767668 78.31 361964 1117696 8343936 184.03
05:25:01 PM 813744 2720376 76.97 362524 1118808 8329568 183.71
Average: 790098 2744022 77.64 362244 1118252 8336752 183.87
As you can see, sar is a relatively simple but very useful troubleshooting tool. Although plenty of other programs exist that can pull trending data from your servers and graph them (and I use them myself), sar is great in that it doesn't require a network connection, so if your server gets so heavily loaded it doesn't respond over the network anymore, there's still a chance you could get valuable troubleshooting data with sar.
Toolbox image via Shutterstock.com.
Kyle Rankin is a systems architect and the author of DevOps Troubleshooting, The Official Ubuntu Server Book, Knoppix Hacks, Knoppix Pocket Reference, Linux Multimedia Hacks, and Ubuntu Hacks.
Comments
Great tool
I am a developer, passionate about Linux/Unix. This tool will help me a lot. Thanks.
Currently at the work's server, CentOS, I get: permission denied on sar. So I have to wait until I get home to play with it on Ubuntu.
Thanks
Thanks for this tool tip
I'd heard of sar many years ago but forgot about it. Thanks for the reminder that it exists and what it's good for.
--SYG
What about network
I have a DB2 / lighttpd / PHP based web server which seems to tank on performance from time to time for a few minutes at a time and sometimes longer, but the CPU, Memory and Disk activity looks like the system is close to idle. It's a beast (16 CPUs) doing nothing.
The Network team swear it's not a network problem but the server admins have tried hard to identify a problem but can't find anything.
I'd like a way to collect statistics on network contention or number of connections or something like that? It's a SUSE box.
Linux version 2.6.32.49-0.3-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP 2011-12-02 11:28:04 +0100
Munin
I think Munin could do the trick. It collects and graphs a lot of parameters, from memory and CPU usage to network, interrupts and I/O.
http://munin-monitoring.org/
What about analysis
What about analysis, for instance comparing the stats for one day, one hour or one 10-minute data point against a rolling average, so that the tool identifies peaks without the admin having to read through them?
KSar
Nice article - KSar http://sourceforge.net/projects/ksar/ is a top tool for graphing Sar data - much easier to spot trends etc.