Performance Monitoring Tools for Linux
I now had the data, but since columns of figures are boring, I needed a way to look at the data and make sense of it. I had used gnuplot for similar tools on other systems, so it seemed to be a good choice. I started with a script to display CPU utilization, charting the percentages of time spent in idle, user, system and nice states.
The cpu data file has five columns that look like this:
0000 4690259 69915 661038 7937582
0005 4690408 69964 661286 7966975
Column 1: time-stamp of observation (HHMM)
Column 2: seconds in user state since last boot
Column 3: seconds in nice state since last boot
Column 4: seconds in system state since last boot
Column 5: seconds in idle state since last boot
My reporting scheme was to take the number of seconds spent in each state since the last observation, add up the different states and express each one as a percentage of the total. I ran into an interesting issue right away: what about a reboot? Booting the system zeroes out the counters, and subtracting the old values from the new ones then generates negative numbers, so I had to handle it properly to provide useful information. I decided to watch for a counter value that was lower than the last observation's value and, if found, reset the prior values to zero. To make the chart more informative, an extra data point was set to 100 for a reboot and -1 for a normal record. The -1 value places the data point outside the chart, so it is not displayed.
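The reporting scheme can be sketched in a few lines of Python. This is only an illustration, not the script from the archive: the names cpu_percentages and CpuSample are hypothetical, and it assumes each record holds a time stamp followed by the user, nice, system and idle counters. It also assumes the first record is used only as a baseline, since there is nothing to subtract it from.

```python
from typing import Iterable, Iterator, NamedTuple


class CpuSample(NamedTuple):
    stamp: str      # HHMM time stamp of the observation
    user: float     # percentage of the interval spent in each state
    nice: float
    system: float
    idle: float
    reboot: int     # 100 marks a reboot, -1 a normal record


def cpu_percentages(lines: Iterable[str]) -> Iterator[CpuSample]:
    """Turn cumulative counters into per-interval percentages.

    Assumed record layout: stamp user nice system idle.
    """
    prev = None
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        stamp = fields[0]
        counters = [int(value) for value in fields[1:]]
        if prev is None:
            prev = counters  # first record: baseline only
            continue
        marker = -1
        # A counter lower than its previous value means the system rebooted,
        # so the prior values are reset to zero and the record is flagged.
        if any(new < old for new, old in zip(counters, prev)):
            prev = [0, 0, 0, 0]
            marker = 100
        deltas = [new - old for new, old in zip(counters, prev)]
        total = sum(deltas)
        percents = [100.0 * d / total if total else 0.0 for d in deltas]
        yield CpuSample(stamp, *percents, marker)
        prev = counters
```

Fed the two sample records shown earlier, this sketch yields a single 0005 entry with a marker of -1 and an idle share of about 98.5 percent.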
Sometimes a hard copy is preferred when presentations or reports are needed. The gnuplot authors provide for a variety of output formats, and the script will switch between X11 display and PostScript output depending upon which option switches are set.
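Switching between the two output formats comes down to which terminal gnuplot is told to use. The Python sketch below builds a small gnuplot command file either way. The function name gnuplot_script, the hardcopy flag and the file names are hypothetical and are not taken from the archived scripts; the gnuplot commands themselves (set terminal, set output, set yrange, plot) are standard. The fixed 0 to 100 range is what keeps a -1 marker value off the chart.

```python
def gnuplot_script(data_file: str, hardcopy: bool = False,
                   ps_file: str = "cpu.ps") -> str:
    """Return gnuplot commands for an X11 display or a PostScript file."""
    if hardcopy:
        # PostScript output goes to a file suitable for printing.
        header = ["set terminal postscript", f'set output "{ps_file}"']
    else:
        # Interactive display in an X11 window.
        header = ["set terminal x11"]
    body = [
        "set yrange [0:100]",  # percentages only; -1 falls outside
        f'plot "{data_file}" using 2 title "user" with lines',
    ]
    return "\n".join(header + body) + "\n"
```

The returned text could be written to a file and handed to gnuplot, with the hardcopy flag driven by whatever option switch the calling script accepts.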
Figure 1 is a sample chart produced by the graphing script shown in Listing 2. A breakdown of the major parts of this script is included in the archive file on SSC's FTP site, ftp.linuxjournal.com/pub/lj/listings/issue56/2396.tgz. Also included are the collection script, graphing scripts, a sample crontab entry for running the collector script and the following charting scripts:
cpu: charting cpu information as described above
ctxt: charting context switching per second
disk: disk utilization, including total I/O, reads/writes and block reads/writes per second
eth: Ethernet packets sent and received per second and both incoming and outgoing errors
intr: interrupts per second, charted by interrupt number
mem: memory utilization and buffer/cache/shared memory allocations
page: page in and out activity
ppp: Point-to-Point Protocol packets sent/received per second and errors
proc: new process creation per second
swap: swap activity and swap space availability
I'm currently converting this toolkit to Perl and building a web interface to allow these charts to be viewed as HTML pages with the charts as GIF files.
All listings referred to in this article are available by anonymous download in the file ftp.linuxjournal.com/pub/lj/listings/issue56/2396.tgz.