Performance Monitoring Tools for Linux

Mr. Gavin provides tools for systems data collection and display and discusses what information is needed and why.
What Do We Do with the Data?

I now had the data, but since columns of figures are boring, I needed a way to look at the data and make sense of it. I had used gnuplot for similar tools on other systems, so it seemed to be a good choice. I started with a script to display CPU utilization, charting the percentages of time spent in idle, user, system and nice states.

The cpu data file has five columns that look like this:

0000 4690259 69915 661038 7937582
0005 4690408 69964 661286 7966975

Column 1: time-stamp of observation (HHMM)
Column 2: time in system state since last booted
Column 3: time in nice state since last booted
Column 4: time in user state since last booted
Column 5: time in idle state since last booted

The counters in columns 2 through 5 are in hundredths of a second.
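Here is a short Python sketch of reading one of these records. It assumes the column order time-stamp, system, nice, user, idle. The first field steps from 0000 to 0005 between the sample lines, so it is the HHMM time-stamp. The last field grows fastest, which is what an idle counter on a lightly loaded machine would do. The type and function names are hypothetical. `os.sysconf("SC_CLK_TCK")` returns the clock-tick rate the kernel uses for these counters, normally 100 per second.

```python
import os
from typing import NamedTuple


class CpuRecord(NamedTuple):
    """One record of the cpu data file (hypothetical helper type)."""
    hhmm: str    # time-stamp of the observation, e.g. "0005"
    system: int  # clock ticks in system state since boot
    nice: int    # clock ticks in nice state since boot
    user: int    # clock ticks in user state since boot
    idle: int    # clock ticks in idle state since boot


def parse_cpu_line(line: str) -> CpuRecord:
    """Split one whitespace-separated record into named fields.

    Assumes the column order time-stamp, system, nice, user, idle.
    """
    stamp, system, nice, user, idle = line.split()
    return CpuRecord(stamp, int(system), int(nice), int(user), int(idle))


def ticks_to_seconds(ticks: int, hz: int = 0) -> float:
    """Convert clock ticks to seconds.

    With no explicit rate, ask the kernel for its tick rate (USER_HZ).
    """
    if hz <= 0:
        hz = os.sysconf("SC_CLK_TCK")
    return ticks / hz
```

For the two sample records, the idle counter advances by 29,393 ticks. At 100 ticks per second that is about 294 seconds of a five-minute (300-second) interval.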

My reporting scheme was to compute the time spent in each state since the last observation, add up the different states and express each one as a percentage of the total. I ran into an interesting issue right away: what about a reboot? Booting the system zeroes the counters, so subtracting the old values from the new ones generates negative values, and I had to handle that case to provide useful information. I decided to watch for a counter value lower than the previous observation's value and, if I found one, reset the prior values to zero. To make the chart more informative, a data point was set to 100 for a reboot and -1 for a normal record. The -1 value places the data point outside the chart, so it is not displayed.
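Below is a minimal Python sketch of this reporting scheme. It assumes each observation has already been parsed into a dict of cumulative counters keyed by state name. The function name and data shape are hypothetical and are not taken from the article's scripts. The reboot test and the 100/-1 marker follow the description above.

```python
from typing import Dict, List, Tuple

STATES = ("idle", "user", "system", "nice")


def cpu_percentages(
    samples: List[Dict[str, int]],
) -> List[Tuple[Dict[str, float], int]]:
    """Convert cumulative per-state counters into per-interval percentages.

    Each sample maps a state name to its counter value since boot. For
    every sample after the first, the result holds the percentage of the
    interval spent in each state plus a reboot marker: 100 when a reboot
    was detected, -1 otherwise (-1 falls outside the chart's 0-100 range).
    """
    results: List[Tuple[Dict[str, float], int]] = []
    if not samples:
        return results
    prior = samples[0]
    for current in samples[1:]:
        # A counter lower than last time means the machine rebooted and
        # its counters restarted at zero, so measure from zero instead.
        rebooted = any(current[s] < prior[s] for s in STATES)
        if rebooted:
            prior = {s: 0 for s in STATES}
        deltas = {s: current[s] - prior[s] for s in STATES}
        total = sum(deltas.values())
        if total > 0:
            pct = {s: 100.0 * d / total for s, d in deltas.items()}
        else:
            pct = {s: 0.0 for s in STATES}
        results.append((pct, 100 if rebooted else -1))
        prior = current
    return results
```

Resetting the prior values to zero, rather than skipping the interval, keeps the activity recorded since the reboot.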

Sometimes a hard copy is preferred when presentations or reports are needed. The gnuplot authors provide a variety of output formats, and the script switches between X11 display and PostScript output depending on which option switches are set.
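Here is a minimal sketch of that switch in Python. It assumes the charting script builds a gnuplot command script and pipes it to gnuplot's standard input. The function names and the default file name cpu.ps are hypothetical. `set terminal x11`, `set terminal postscript` and `set output` are standard gnuplot commands, and `-persist` keeps the X11 window open after gnuplot exits.

```python
import subprocess


def gnuplot_preamble(postscript: bool, output_file: str = "cpu.ps") -> str:
    """Return the gnuplot commands that select the output device."""
    if postscript:
        # PostScript output goes to a file suitable for printing.
        return f'set terminal postscript\nset output "{output_file}"\n'
    # Otherwise draw the chart in an X11 window.
    return "set terminal x11\n"


def run_gnuplot(commands: str) -> None:
    """Feed a command script to gnuplot on standard input.

    Requires gnuplot on the PATH; -persist keeps the X11 window open
    after gnuplot itself exits.
    """
    subprocess.run(["gnuplot", "-persist"], input=commands, text=True, check=True)
```

A caller would append its own plot commands to the preamble and pass the result to `run_gnuplot`. Only the preamble differs between screen and hard-copy output.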

Figure 1. Sample Chart

Figure 1 is a sample chart produced by the graphing script shown in Listing 2. A breakdown of the major parts of this script is included in the archive file on SSC's FTP site. Also included are the collection script, graphing scripts, a sample crontab entry for running the collector script and the following charting scripts:

  • cpu: charting cpu information as described above

  • ctxt: charting context switching per second

  • disk: disk utilization: total I/O, reads/writes and block reads/writes per second

  • eth: Ethernet packets sent and received per second and both incoming and outgoing errors

  • intr: interrupts per second, charted by interrupt number

  • mem: memory utilization and buffer/cache/shared memory allocations

  • page: page in and out activity

  • ppp: Point-to-Point Protocol packets sent/received per second and errors

  • proc: new process creation per second

  • swap: swap activity and swap space availability
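Several of these charts, such as ctxt, intr and proc, plot a rate rather than a level. This sketch assumes the collector stores the cumulative counters reported under /proc alongside the same HHMM time-stamp used in the cpu file. Under that assumption, a per-second rate is the counter difference divided by the seconds between observations. The Python sketch below uses hypothetical function names. Following the cpu script's approach, it treats a counter that goes backwards as a reboot. It also assumes consecutive observations are less than a day apart, so that an interval crossing midnight can be recovered.

```python
def minutes_of_day(hhmm: str) -> int:
    """Convert an HHMM time-stamp such as "2355" to minutes after midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def per_second_rate(prev_stamp: str, prev_count: int, stamp: str, count: int) -> float:
    """Average per-second rate of a cumulative counter between two observations.

    An interval that crosses midnight is recovered with modular
    arithmetic, which assumes observations are less than a day apart.
    A counter that went backwards is treated as a reboot, so the new
    value is taken as the count since the restart.
    """
    minutes = (minutes_of_day(stamp) - minutes_of_day(prev_stamp)) % (24 * 60)
    if minutes == 0:
        raise ValueError("observations must be at least one minute apart")
    delta = count - prev_count if count >= prev_count else count
    return delta / (minutes * 60)
```

For example, `per_second_rate("2355", 0, "0000", 600)` treats the interval as five minutes and returns 2.0.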

I'm currently converting this toolkit to Perl and building a web interface to allow these charts to be viewed as HTML pages with the charts as GIF files.

All listings referred to in this article are available by anonymous download in the file

David Gavin has worked in various support environments since 1977, when, after COBOL training, he had the good fortune to be assigned to the TSO (Time Sharing Option) support group. From there he moved to MVS technical support, to VM and to UNIX. He has worked with UNIX from mainframes to desktops, baby-sitting Microsoft systems only when he couldn't avoid it. He started using Linux back when it meant downloading twenty-five disks over a 2400-baud dial-up line.





Application stats also shed light

Anonymous

As well as monitoring Linux performance, it's also useful to monitor what the server itself is doing, whether that be MySQL, Apache, Tomcat, memcached or what have you.

Having a tool that lets you monitor all this stuff in one place is a huge time saver for correlating issues and resolving performance impacts.

Time for an update

Anonymous

There's been some progress in the last 12 years or so... for example, Zoom from RotateRight provides a rich GUI- or CLI-based system-wide profiler for Linux. It takes call stacks with every sample and can show source and assembly code for any sampled function.

Don't forget to use collectl

Mark Seger

Even though this is a pretty old article, it seemed that there should be a reference to collectl for completeness.

Web Interface

Anonymous

Hi Mr. Gavin,

Did you get a chance to complete the Perl-based web interface for your scripts? If so, I would be very interested in getting the sources.



Re: Performance Monitoring Tools for Linux

Anonymous

The sarChart.cgi script has a bug in it: it reads the tstamp column in each table incorrectly. To calculate the time, it uses substr to extract the hour and minute, but the offset parameter is off by 2 in both cases. This problem is probably due to the change in the length of the year from 2 to 4 digits.
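The Python sketch below shows how a fixed substr offset breaks when the year widens. The tstamp layout is an assumption, since the comment doesn't show it: YYMMDDHHMM before the change and YYYYMMDDHHMM after. Python slices stand in for Perl's substr, and the function names are hypothetical.

```python
from typing import Tuple


def hour_minute_fixed(tstamp: str) -> Tuple[str, str]:
    """Offsets written for a 2-digit year, YYMMDDHHMM: hour at 6, minute at 8."""
    return tstamp[6:8], tstamp[8:10]


def hour_minute_from_end(tstamp: str) -> Tuple[str, str]:
    """Take HHMM from the end of the string, so the year width doesn't matter."""
    return tstamp[-4:-2], tstamp[-2:]
```

With a four-digit year, `hour_minute_fixed("200103151742")` returns `("15", "17")`. It picks up the day and the hour instead of the hour and minute, which is an off-by-2 error of the kind described. Counting from the end of the string is one way to make the extraction independent of the year width.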

Re: Performance Monitoring Tools for Linux

Anonymous

Bull *****..There is no bug in it..

Re: Performance Monitoring Tools for Linux

mrlynn

To use these utilities on a multi-CPU machine, change line 40 of the sa script from:
/^cpu/ {
to:
/^cpu / {

Note: add a space after "cpu", before the closing "/".

This change won't give you information on each individual CPU, but it will use the aggregate figures reported in the /proc pseudo-filesystem.
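On a multiprocessor Linux system, /proc/stat carries an aggregate line that starts with "cpu" followed by a space, plus one line per processor (cpu0, cpu1 and so on). /^cpu/ matches every one of them, so the awk action runs once per line. /^cpu / matches only the aggregate. The Python sketch below uses `re` to mimic the two awk patterns. The /proc/stat lines in it are invented for illustration, and only their layout matters.

```python
import re
from typing import List

ANY_CPU_LINE = re.compile(r"^cpu")     # original awk pattern /^cpu/
AGGREGATE_ONLY = re.compile(r"^cpu ")  # corrected pattern /^cpu /


def matching_lines(pattern: re.Pattern, lines: List[str]) -> List[str]:
    """Return the lines an awk-style /regex/ pattern would select."""
    return [line for line in lines if pattern.search(line)]


# Illustrative /proc/stat excerpt from a two-CPU machine; the numbers
# are invented, only the line layout matters here.
SAMPLE = [
    "cpu  200 10 60 900",
    "cpu0 100 5 30 450",
    "cpu1 100 5 30 450",
    "intr 12345",
]
```

`matching_lines(ANY_CPU_LINE, SAMPLE)` selects all three cpu lines. `matching_lines(AGGREGATE_ONLY, SAMPLE)` selects only the aggregate line.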

Re: Performance Monitoring Tools for Linux

mrlynn

Description of the columns in the CPU output is incorrect:
0000 4690259 69915 661038 7937582
Column 5: seconds in idle state since last booted
Column 2: seconds in system state since last booted
Column 3: seconds in nice state since last booted
Column 4: seconds in user state since last booted
Column 1: time-stamp of observation (HHMM)

call me picky.

picky too

Anonymous

Call me picky, but the unit of measure is 1/100 of a second.


Wow... 2+ years and you

Anonymous

Wow... 2+ years and you decide to respond with "picky"..?

How about "you're right -- good catch". Or better yet, don't respond.

Am I contradicting myself by responding to you? No. You took a perfectly acceptable observation and decided to respond with an opinion. I'm taking your opinion and responding with an observation.

Query regarding running the above scripts

Surender


I am Surender, and I am a naive user. I have downloaded the above scripts for CPU utilisation, disk usage, etc., but I don't know how to execute them. Could somebody please help me out with this?

My email address: