Performance Monitoring Tools for Linux
The file /proc/stat contains current counters for most of the data I wanted, and it is in a readable format. To keep the collector script as quick and simple as possible, I saved the data as readable text rather than as binary data.
Breaking down and reorganizing the data for storage was a good job for awk, which writes the data out to different files depending on the type of data. The /proc files are formatted nicely for this; each record has an identifying name in the first field. Here's a sample of /proc/stat from my 486 system:
cpu 1228835 394 629667 23922418
disk 43056 111530 0 0
disk_rio 18701 20505 0 0
disk_wio 24355 91025 0 0
disk_rblk 37408 40690 0 0
disk_wblk 48710 182050 0 0
page 94533 204827
swap 1 0
intr 27433973 25781314 58961 0 1059544 368102 1 2 0 0 0 11133 154916 0 0 0 0
ctxt 18176677
btime 863065361
processes 18180
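Routing each record to a file named after its first field can be sketched in a few lines of awk wrapped in a shell function. This is a minimal sketch and not the author's Listing 1: the function name split_stat, its two arguments, and the name.suffix file layout are assumptions made for illustration.

```shell
# split_stat STAMP SUFFIX  (hypothetical helper, not the author's script)
# Reads /proc/stat-style records on standard input and appends each one,
# prefixed with STAMP, to a file named after its first field,
# e.g. "cpu.SUFFIX" or "ctxt.SUFFIX", in the current directory.
split_stat() {
    awk -v stamp="$1" -v suffix="$2" '
    {
        out = $1 "." suffix   # the first field names the record type
        $1 = stamp            # replace the name with the timestamp
        print >> out          # append the record to the per-type file
        close(out)
    }'
}
```

It could be called as split_stat "$(date +%H%M)" "$(date +%b%d.%y)" < /proc/stat. On a kernel that reports per-CPU lines, each cpuN record simply ends up in its own file.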
I dug into the kernel source for the /proc file system to figure out what the various fields were, as the man pages seem to date back to 1.x.
cpu: the number of jiffies (1/100 of a second) spent in the user, nice, system and idle states. I wasn't too concerned about the actual unit of measurement, as I was just planning on looking at each state as a percentage of the total.
disk: summarizes all I/O to each of the four disks, while disk_rio, disk_wio, disk_rblk and disk_wblk break down the total into reads, writes, blocks read and blocks written.
page: counts of pages paged in and out.
swap: counts of pages swapped in and out. The swap data in /proc/meminfo is expressed as total pages, used and free. Combine both sets of data to get a clear picture of swap activity.
intr: total interrupts since boot time, followed by counts for each interrupt.
ctxt: the number of context switches since boot time. This counts the number of times one process was “put to sleep” and another was “awakened”.
btime: I haven't found much use for this—it is the number of seconds after January 1, 1970 that the system was booted.
processes: the most recent process identification number. This is a good way to see how many processes have been spawned since the last check. Subtract the old value from the current one and divide by the time difference (in seconds) between the two observations to get the number of new processes per second, which can be used to measure how busy the system is.
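Because all of these counters only ever increase, the useful numbers come from the difference between two observations. The two shell functions below are a rough sketch of that arithmetic for the cpu and processes records. The names cpu_percent and proc_rate are hypothetical, and only the four cpu states described above are used; later kernels report additional fields that this sketch ignores.

```shell
# cpu_percent: reads two cpu records on standard input (older one first)
# and prints each state as a percentage of the jiffies that elapsed
# between them. Fields 2-5 are taken as user, nice, system and idle, so
# both a raw "cpu ..." line and a stored "HHMM ..." record will work.
cpu_percent() {
    awk '
    NR == 1 { for (i = 2; i <= 5; i++) old[i] = $i }
    NR == 2 {
        total = 0
        for (i = 2; i <= 5; i++) { d[i] = $i - old[i]; total += d[i] }
        if (total == 0) { print "no change between samples"; exit }
        printf "user %.1f nice %.1f system %.1f idle %.1f\n",
            100 * d[2] / total, 100 * d[3] / total,
            100 * d[4] / total, 100 * d[5] / total
    }'
}

# proc_rate OLD NEW SECONDS: new processes per second between two readings
# of the "processes" counter taken SECONDS apart (SECONDS must not be 0).
proc_rate() {
    awk -v old="$1" -v new="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", (new - old) / secs }'
}
```

With the collector running every five minutes, SECONDS would normally be 300.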
The lines we want from /proc/net/dev are the ethx and pppx records. In the collector script, the data is written out to a file named with the full interface name. This way, the script is generalized for almost any configuration.
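Selecting the interface records and naming the output file after the interface might look like the following. It is a minimal sketch under stated assumptions, not the author's code: the function name split_netdev and its arguments are hypothetical, and only interfaces named ethN or pppN are matched, as in the text.

```shell
# split_netdev STAMP SUFFIX  (hypothetical helper)
# Reads /proc/net/dev-style lines on standard input and appends the
# counters for each ethN or pppN interface, prefixed with STAMP, to a
# file named after the interface, e.g. "eth0.SUFFIX".
split_netdev() {
    awk -v stamp="$1" -v suffix="$2" '
    /^ *(eth|ppp)[0-9]+:/ {
        sub(/:/, " ")          # the first counter may touch the colon
        out = $1 "." suffix    # full interface name becomes the file name
        $1 = stamp
        print >> out
        close(out)
    }'
}
```

A system whose interfaces use other names would need the pattern adjusted.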
Memory utilization can be tracked in the /proc/meminfo file as shown in Table 2.
The memory counters are expressed twice in this file, so we need to save only the Mem: and Swap: records to get the whole picture. The script matches the keywords at the start of the line and writes the data out to individual files rather than to one large database to allow more flexibility as new fields or data types are added. This makes for a cluttered directory but simpler script writing.
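The keyword match for the two memory records can be sketched as below. The function name split_meminfo and its arguments are hypothetical. The sketch also assumes the meminfo layout described here, with Mem: and Swap: summary lines; kernels that emit only MemTotal:-style lines produce no output from it.

```shell
# split_meminfo STAMP SUFFIX  (hypothetical helper)
# Reads /proc/meminfo-style lines on standard input and appends the
# "Mem:" and "Swap:" records, prefixed with STAMP, to Mem.SUFFIX and
# Swap.SUFFIX. All other lines are ignored.
split_meminfo() {
    awk -v stamp="$1" -v suffix="$2" '
    /^(Mem|Swap):/ {
        out = substr($1, 1, length($1) - 1) "." suffix   # drop the colon
        $1 = stamp
        print >> out
        close(out)
    }'
}
```

Because the pattern requires the colon directly after the keyword, a line such as MemTotal: is not picked up by mistake.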
The script that collects the data is shown in Listing 1. Here is what happens in a few key parts:
Line 13: move to the directory where the data is to be stored using cd.
Line 14: get the timestamp for the data records in format HHMM.
Line 15: get the date for the output data file names in format MonDD.YY.
Lines 19 - 25: select the memory and swap counter lines from /proc/meminfo and write the timestamp and data portion of the record to Mem.MonDD.YY and Swap.MonDD.YY.
Lines 29 - 36: extract the counters for any network interfaces from /proc/net/dev and write them out to files named for the interface, e.g., eth0 data is written out to eth0.MonDD.YY.
Lines 39 - 79: clip counters for cpu, disk, paging, swap page usage, interrupts, context switching and process numbers from /proc/stat and write them out to appropriate files.
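The setup steps in lines 13 to 15 can be approximated as follows. This is a sketch rather than a copy of Listing 1: the function name sa_setup and the variable names are hypothetical, and the default directory is taken from the path used by the cleanup entry in the crontab.

```shell
# sa_setup [DIR]  (hypothetical helper)
# Moves to the data directory and sets STAMP and DAY for the records
# and file names written during this run.
sa_setup() {
    DATADIR=${1:-/var/log/sar/data}
    cd "$DATADIR" || return 1
    STAMP=$(LC_ALL=C date +%H%M)     # HHMM, e.g. 0935
    DAY=$(LC_ALL=C date +%b%d.%y)    # MonDD.YY, e.g. Jun19.13
}
```

LC_ALL=C keeps the month abbreviation in English regardless of the locale, so file names stay consistent.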
The following lines in my crontab file run the collection script every five minutes, every hour of every day, and run a cleanup command once a day at midnight:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /var/log/sar/sa
0 0 * * * exec /usr/bin/find /var/log/sar/data/ -mtime +14 -exec /bin/rm -f {} \;
The data accumulates over the course of the day to provide the data
points for analysis. A cleanup script invoked by the second line
removes each file after two weeks to keep the disk space
requirements down. A possible enhancement might be to compress each
file after it is complete, but space hasn't been much of an issue
yet.
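The compression idea could be handled with one more find command. The function below is only a sketch of that enhancement: the name compress_old is hypothetical, and it assumes gzip is installed and that a file left unmodified for at least a day is complete.

```shell
# compress_old DIR  (hypothetical helper)
# Compresses every regular file under DIR that was last modified at
# least 24 hours ago and does not already end in .gz.
compress_old() {
    find "$1" -type f ! -name '*.gz' -mtime +0 -exec gzip {} \;
}
```

The existing cleanup entry would still remove the compressed files after two weeks, since it selects files by age and not by name.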
Comments
Server Management
When you want to do network monitoring you need a network monitoring system also known as network monitoring software or a network monitoring tool. If you are looking then try SysOrb for free. http://www.evalesco.com/
Application stats also shed light
As well as Linux performance monitoring it's also useful to monitor the stuff the server is doing - whether that be Mysql, apache, tomcat, memcached, or what have you.
Having a tool that lets you monitor all this stuff in one place is a huge time saver for correlating issues and resolving performance impacts.
Time for an update
There's been some progress in the last 12 years or so...for example, Zoom from RotateRight ( http://www.rotateright.com ) provides a rich GUI or CLI-based system-wide profiler for Linux. It takes callstacks with every sample and can show source and assembly code for any sampled function.
Don't forget to use collectl
Even though this is a pretty old article it seemed that there should be a reference to collectl for completeness. http://collectl.sourceforge.net/
-mark
Web Interface
Hi Mr. Gavin,
Did you get a chance to complete the Perl-based web interface for your scripts? If so, I would be very interested in getting the sources...
BR,
Bart
Re: Performance Monitoring Tools for Linux
The sarChart.cgi script has a bug in it. It reads from the tstamp column in each table incorrectly. To calculate the time it uses substr to extract the hour and min, but the offset parameter is off by 2 in both cases. This problem is probably due to changing the length of the year from 2 to 4 digits.
Re: Performance Monitoring Tools for Linux
Bull *****..There is no bug in it..
Re: Performance Monitoring Tools for Linux
To use these utilities on a multi-cpu machine, change line 40 of the sa script from:
40 /^cpu/ {
to:
/^cpu / {
Note: add a space between the "u" in cpu and the closing "/".
This change won't give you information on each individual cpu - but will use the aggregates as reported in the proc pseudo file system.
Re: Performance Monitoring Tools for Linux
Description of the columns in the CPU output is incorrect:
0000 4690259 69915 661038 7937582
Column 5: seconds in idle state since last booted
Column 2: seconds in system state since last booted
Column 3: seconds in nice state since last booted
Column 4: seconds in user state since last booted
Column 1: time-stamp of observation (HHMM)
call me picky.
picky too
Call me picky but the unit of measure is 1/100 of a second
picky
picky
Wow... 2+ years and you
Wow... 2+ years and you decide to respond with "picky"..?
How about "you're right -- good catch". Or better yet, don't respond.
Am I contradicting myself by responding to you? No. You took a perfectly acceptable observation and decided to respond with an opinion. I'm taking your opinion and responding with an observation.
Query regarding running the above scripts
Hello,
I am Surender, and I am a naive user. I have downloaded the above scripts for cpu utilisation, disk usage etc., but I don't know how to execute them. Somebody please help me out in this regard.
My email address: surenuder@gmail.com
Thanks,
Surender