The Sysadmin's Toolbox: sar

As someone who's been working as a system administrator for a number of years, I find it easy to take for granted tools that I've used for a long time and to assume everyone has heard of them. Of course, new sysadmins get into the field every day, and even seasoned sysadmins don't all use the same tools. With that in mind, I decided to write a few columns where I highlight some common-but-easy-to-overlook tools that make life as a sysadmin (and really, any Linux user) easier. I start the series with a classic troubleshooting tool: sar.

There's an old saying: "When the cat's away the mice will play." The same is true for servers. It's as if servers wait until you aren't logged in (and usually in the middle of REM sleep) before they have problems. Logs can go a long way to help you isolate problems that happened in the past on a machine, but if the problem is due to high load, logs often don't tell the full story. In my March 2010 column "Linux Troubleshooting, Part I: High Load" (http://www.linuxjournal.com/article/10688), I discussed how to troubleshoot a system with high load using tools such as uptime and top. Those tools are great as long as the system still has high load when you are logged in, but if the system had high load while you were at lunch or asleep, you need some way to pull the same statistics top gives you, only from the past. That is where sar comes in.

Enable sar Logging

sar is a classic Linux tool that is part of the sysstat package, which should be available through the regular package manager of just about any major distribution. Once installed, it is enabled automatically on a Red Hat-based system, but on a Debian-based system (like Ubuntu), you might have to edit /etc/default/sysstat and make sure that ENABLED is set to true. On a Red Hat-based system, sar keeps seven days of statistics by default. If you want to keep more than that, you can edit /etc/sysconfig/sysstat and change the HISTORY option.
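If you manage more than a couple of Debian-based machines, you may want to script that edit. Here is a minimal sketch, assuming the file already contains a line that starts with ENABLED= and that you have GNU sed. The function name enable_sysstat and its optional path argument are my own invention, added so you can try it on a copy of the file first:

```shell
#!/bin/sh
# enable_sysstat [FILE]: set ENABLED="true" in a sysstat defaults file.
# The function name and the optional FILE argument are illustrative only;
# the default is the Debian/Ubuntu path mentioned in the text.
enable_sysstat() {
    conf="${1:-/etc/default/sysstat}"
    # Rewrite the whole ENABLED= line, whatever its current value is.
    # Other lines in the file are left untouched.
    sed -i 's/^ENABLED=.*/ENABLED="true"/' "$conf"
}
```

Run it as root against the real file, then restart the sysstat service so the change takes effect.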

Once sysstat is configured and enabled, a cron job in /etc/cron.d/sysstat collects statistics about your system every ten minutes and stores them in a logfile under either /var/log/sysstat or /var/log/sa. A second, daily cron job runs right before midnight and rotates out the day's statistics. By default, the logfiles are date-stamped with the current day of the month, so the logs rotate automatically and overwrite the log from a month ago.
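Because the log directory differs between distributions, a script that needs the logfiles has to check which one is present. The following is only a sketch: find_sa_dir is a hypothetical helper name, and the two default paths are simply the ones mentioned above.

```shell
#!/bin/sh
# find_sa_dir [DIR...]: print the first directory in the list that exists.
# With no arguments, it checks the two locations named in the text.
# The function name is illustrative, not part of sysstat.
find_sa_dir() {
    [ "$#" -gt 0 ] || set -- /var/log/sysstat /var/log/sa
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # None of the candidates exist; sysstat may not be installed or enabled.
    return 1
}
```

Something like ls -l "$(find_sa_dir)" then shows which days have data so far.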

CPU Statistics

After your system has had some time to collect statistics, you can use the sar tool to retrieve them. When run with no other arguments, sar displays the current day's CPU statistics:


$ sar
. . .
07:05:01 PM  CPU  %user  %nice  %system  %iowait %steal  %idle
. . .
08:45:01 PM  all   4.62   0.00     1.82     0.44   0.00   93.12
08:55:01 PM  all   3.80   0.00     1.74     0.47   0.00   93.99
09:05:01 PM  all   5.85   0.00     2.01     0.66   0.00   91.48
09:15:01 PM  all   3.64   0.00     1.75     0.35   0.00   94.26
Average:     all   7.82   0.00     1.82     1.14   0.00   89.21

If you are familiar with the command-line tool top, the above CPU statistics should look familiar, as they are the same as you would get in real time from top. You can use these statistics just like you would with top, only in this case, you are able to see the state of the system back in time, along with an overall average at the bottom of the statistics, so you can get a sense of what is normal. Because I devoted an entire previous column to using these statistics to troubleshoot high load, I won't rehash all of that here, but essentially, sar provides you with all of the same statistics, just at ten-minute intervals in the past.
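If you would rather not read through every ten-minute sample by eye, you can filter the output. Here is a rough sketch that prints only the samples where %idle drops below a threshold you choose. It assumes %idle is the last column, as in the output above, and that numbers use a period as the decimal separator. The function name low_idle is mine:

```shell
#!/bin/sh
# low_idle THRESHOLD: read sar CPU output on stdin and print the samples
# whose %idle value (the last column) is below THRESHOLD.
# The function name is illustrative, not part of sysstat.
low_idle() {
    awk -v limit="$1" '
        # Skip the summary row, and skip headers or any line whose
        # last field is not a plain decimal number.
        $1 != "Average:" && $NF ~ /^[0-9]+\.[0-9]+$/ && ($NF + 0) < (limit + 0) {
            print
        }
    '
}
```

For example, sar | low_idle 85 would list only the intervals from today where the CPUs were less than 85% idle.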

RAM Statistics

sar also supports a large number of different options you can use to pull out other statistics. For instance, with the -r option, you can see RAM statistics:


$ sar -r
. . .
07:05:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
. . .
08:45:01 PM    881280   2652840     75.06    355284   1028636   8336664    183.87
08:55:01 PM    881412   2652708     75.06    355872   1029024   8337908    183.89
09:05:01 PM    879164   2654956     75.12    356480   1029428   8337040    183.87
09:15:01 PM    886724   2647396     74.91    356960   1029592   8332344    183.77
Average:       851787   2682333     75.90    338612   1081838   8341742    183.98

Just like with the CPU statistics, here I can see RAM statistics from the past similar to what I could find in top.

Disk Statistics

Back in my load troubleshooting column, I referenced sysstat as the source for a great disk I/O troubleshooting tool called iostat. Although that provides real-time disk I/O statistics, you also can pass sar the -b option to get disk I/O data from the past:


$ sar -b
. . .
07:05:01 PM    tps    rtps    wtps   bread/s   bwrtn/s
. . .
08:45:01 PM   2.03    0.33    1.70      9.90     31.30
08:55:01 PM   1.93    0.03    1.90      1.04     31.95
09:05:01 PM   2.71    0.02    2.69      0.69     48.67
09:15:01 PM   1.52    0.02    1.50      0.20     27.08
Average:      5.92    3.42    2.50     77.41     49.97

I figure these columns need a little explanation:

  • tps: transactions per second.

  • rtps: read transactions per second.

  • wtps: write transactions per second.

  • bread/s: blocks read per second.

  • bwrtn/s: blocks written per second.
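Block counts are not the most intuitive unit. According to the sar man page, a block here is equivalent to a 512-byte sector, so you can convert the last two columns to KiB per second with a little arithmetic. This is just a sketch, and blocks_to_kib is a name I made up:

```shell
#!/bin/sh
# blocks_to_kib BLOCKS_PER_SEC: convert a sar -b block rate to KiB/s,
# assuming 512-byte blocks. The function name is illustrative.
blocks_to_kib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b * 512 / 1024 }'
}
```

With the 08:45:01 PM sample above, blocks_to_kib 31.30 gives 15.65, so that interval averaged a bit under 16KiB written per second.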

sar can return a lot of other statistics beyond what I've mentioned, but if you want to see everything it has to offer, simply pass the -A option, which will return a complete dump of all the statistics it has for the day (or just browse its man page).

Turn Back Time

So by default, sar returns statistics for the current day, but often you'll want to get information a few days in the past. This is especially useful if you want to see whether today's numbers are normal by comparing them to days in the past, or if you are troubleshooting a server that misbehaved over the weekend. For instance, say you noticed a problem on a server today between 5PM and 5:30PM. First, use the -s and -e options to tell sar to display data only between the start (-s) and end (-e) times you specify:


$ sar -s 17:00:00 -e 17:30:00
Linux 2.6.32-29-server (www.example.net)  02/06/2012   _x86_64_  (2 CPU)

05:05:01 PM  CPU  %user  %nice %system %iowait  %steal  %idle
05:15:01 PM  all   4.39   0.00    1.83    0.39    0.00   93.39
05:25:01 PM  all   5.76   0.00    2.23    0.41    0.00   91.60
Average:     all   5.08   0.00    2.03    0.40    0.00   92.50

To compare that data with the same time period from a different day, just use the -f option and point sar to one of the logfiles under /var/log/sysstat or /var/log/sa that correspond to that day. For instance, to pull statistics from the first of the month:


$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01 
Linux 2.6.32-29-server (www.example.net)  02/01/2012   _x86_64_  (2 CPU)

05:05:01 PM  CPU  %user  %nice  %system  %iowait %steal  %idle
05:15:01 PM  all   9.85   0.00     3.95     0.56   0.00   85.64
05:25:01 PM  all   5.32   0.00     1.81     0.44   0.00   92.43
Average:     all   7.59   0.00     2.88     0.50   0.00   89.04
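Because the logfile name is just sa followed by the two-digit day of the month, a small helper can build the path for you. Treat this as a sketch under a few assumptions: sa_log_path is a hypothetical name, the directory is whichever of the two locations your distribution uses, and the usage shown in the comment relies on GNU date for "yesterday":

```shell
#!/bin/sh
# sa_log_path DIR DAY: print the path of the sysstat data file for a
# given day of the month, for example "sa_log_path /var/log/sysstat 1".
# The function name is illustrative, not part of sysstat.
sa_log_path() {
    dir="$1"
    # Drop one leading zero so that 08 and 09 are not read as bad octal
    # numbers by printf, then pad back to two digits.
    day="${2#0}"
    printf '%s/sa%02d\n' "$dir" "$day"
}

# Example (not run here): the same half-hour window from yesterday.
#   sar -s 17:00:00 -e 17:30:00 \
#       -f "$(sa_log_path /var/log/sysstat "$(date -d yesterday +%d)")"
```

Keep in mind that the file for a given day exists only if sysstat was collecting on that day and the file has not yet been rotated away.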

You also can add all of the normal sar options when pulling from past logfiles, so you could run the same command and add the -r argument to get RAM statistics:


$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01 -r
Linux 2.6.32-29-server (www.example.net)  02/01/2012   _x86_64_  (2 CPU)

05:05:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
05:15:01 PM    766452   2767668     78.31    361964   1117696   8343936    184.03
05:25:01 PM    813744   2720376     76.97    362524   1118808   8329568    183.71
Average:       790098   2744022     77.64    362244   1118252   8336752    183.87

As you can see, sar is a relatively simple but very useful troubleshooting tool. Although plenty of other programs exist that can pull trending data from your servers and graph them (and I use them myself), sar is great in that it doesn't require a network connection, so if your server gets so heavily loaded it doesn't respond over the network anymore, there's still a chance you could get valuable troubleshooting data with sar.

______________________

Kyle Rankin is a systems architect and the author of DevOps Troubleshooting, The Official Ubuntu Server Book, Knoppix Hacks, Knoppix Pocket Reference, Linux Multimedia Hacks, and Ubuntu Hacks.

Comments

Great tool

KonichiwaCoder

I am a developer, passionate about Linux/Unix. This tool will help me a lot. Thanks.

Currently at the work's server, CentOS, I get: permission denied on sar. So I have to wait until I get home to play with it on Ubuntu.

Thanks

Thanks for this tool tip

Sum Yung Gai

I'd heard of sar many years ago but forgot about it. Thanks for the reminder that it exists and what it's good for.

--SYG

What about network

AnilG

I have a DB2 / lighttpd / PHP based web server which seems to tank on performance from time to time for a few minutes at a time and sometimes longer, but the CPU, Memory and Disk activity looks like the system is close to idle. It's a beast (16 CPUs) doing nothing.

The Network team swear it's not a network problem but the server admins have tried hard to identify a problem but can't find anything.

I'd like a way to collect statistics on network contention or number of connections or something like that? It's a SUSE box.

Linux version 2.6.32.49-0.3-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP 2011-12-02 11:28:04 +0100

Munin

vicm3

I think munin could do the trick it collects and graphs lot of parameters, from memory and cpu usage to network, interrupts and I/O.

http://munin-monitoring.org/

What about analysis

AnilG

What about analysis, for instance comparison of stats for one day or one hour or one 10 minute data point against a rolling average so that tool will identify peaks without admin having to read through them?

KSar

JonB

Nice article - KSar http://sourceforge.net/projects/ksar/ is a top tool for graphing Sar data - much easier to spot trends etc.