Performance Monitoring Tools for Linux
For the last few years, I have been supporting users on various flavors of UNIX systems and have found the System Accounting Reports data invaluable for performance analysis. When I began using Linux for my personal workstation, the lack of a similar performance data collection and reporting tool set was a real problem. It's hard to get management to upgrade your system when you have no data to back up your claims of “I need more POWER!”. Thus, I started looking for a package to get the information I needed, and found out there wasn't any. I fell back on the last resort—I wrote my own, using as many existing tools as possible. I came up with scripts that collect data and display it graphically in an X11 window or hard copy.
To get a good idea of how a system is performing, watch key system resources over a period of time to see how their usage and availability changes depending upon what's running on the system. The following categories of system resources are ones I wished to track.
CPU Utilization: The central processing unit, as viewed from Linux, is always in one of the following states:
idle: available for work, waiting
user: high-level functions, data movement, math, etc.
system: performing kernel functions, I/O and other hardware interaction
nice: like user, a job with low priority will yield the CPU to another task with a higher priority
By noting the percentage of time spent in each state, we can discover overloading of one state or another. Too much idle means nothing is being done; too much system time indicates a need for faster I/O or additional devices to spread the load. Each system will have its own profile when running its workload, and by watching these numbers over time, we can determine what's normal for that system. Once a baseline is established, we can easily detect changes in the profile.
Interrupts: Most I/O devices use interrupts to signal the CPU when there is work for it to do. For example, SCSI controllers will raise an interrupt to signal that a requested disk block has been read and is available in memory. A serial port with a mouse on it will generate an interrupt each time a button is pressed/released or when the mouse is moved. Watching the count of each interrupt can give you a rough idea of how much load the associated device is handling.
Context Switching: Time slicing is the term often used to describe how computers can appear to be doing multiple jobs at once. Each task is given control of the system for a certain “slice” of time, and when that time is up, the system saves the state of the running process and gives control of the system to another process, making sure that the necessary resources are available. This administrative process is called context switching. In some operating systems, the cost of this switching can be fairly expensive, sometimes using more resources than the processes it is switching. Linux is very good in this respect, but by watching the amount of this activity, you will learn to recognize when a system has a lot of tasks actively consuming resources.
Memory: When many processes are running and using up available memory, the system will slow down as processes get paged or swapped out to make room for other processes to run. When the time slice is exhausted, that task may have to be written out to the paging device to make way for the next process. Memory-utilization graphs help point out memory problems.
Paging: As mentioned above, when available memory begins to get scarce, the virtual memory system will start writing pages of real memory out to the swap device, freeing up space for active processes. Disk drives are fast, but when paging gets beyond a certain point, the system can spend all of its time shuttling pages in and out. Paging on a Linux system can also be increased by the loading of programs, as Linux “demand pages” each portion of an executable as needed.
Swapping: Swapping is much like paging. However, it migrates entire process images, consisting of many pages of memory, from real memory to the swap devices rather than the usual page-by-page mechanism normally used for paging.
Disk I/O: Linux keeps statistics on the first four disks; total I/O, reads, writes, block reads and block writes. These numbers can show uneven loading of multiple disks and show the balance of reads versus writes.
Network I/O: Network I/O can be used to diagnose problems and examine loading of the network interface(s). The statistics show traffic in and out, collisions, and errors encountered in both directions.
These charts can also help in the following instances:
The system is running jobs you aren't aware of during hours when you are not present.
Someone is logging on or remotely running commands on the system without your knowledge.
This sort of information will often show up as a spike in the charts at times when the system should have been idle. Sudden increases in activity can also be due to jobs run by crontab.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- RSS Feeds
- Trying to Tame the Tablet
- New Products
- What's the tweeting protocol?
- Validate an E-Mail Address with PHP, the Right Way
- Drupal is an Awesome CMS and a Crappy development framework
2 hours 7 min ago - IT industry leaders
4 hours 29 min ago - Reply to comment | Linux Journal
21 hours 18 min ago - Reply to comment | Linux Journal
23 hours 50 min ago - Reply to comment | Linux Journal
1 day 1 hour ago - great post
1 day 1 hour ago - Google Docs
1 day 2 hours ago - Reply to comment | Linux Journal
1 day 6 hours ago - Reply to comment | Linux Journal
1 day 7 hours ago - Web Hosting IQ
1 day 9 hours ago
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.




Comments
Server Management
When you want to do network monitoring you need a network monitoring system also known as network monitoring software or a network monitoring tool. If you are looking then try SysOrb for free. http://www.evalesco.com/
Application stats also shed light
As well as Linux performance monitoring it's also useful to monitor the stuff the server is doing - whether that be Mysql, apache, tomcat, memcached, or what have you.
Having a tool that lets you monitor all this stuff in one place is a huge time saver for correlating issues and resolving performance impacts.
Time for an update
There's been some progress in the last 12 years or so...for example, Zoom from RotateRight ( http://www.rotateright.com ) provides a rich GUI or CLI-based system-wide profiler for Linux. It takes callstacks with every sample and can show source and assembly code for any sampled function.
Don't forget to use collectl
Even though this is a pretty old article it seemed that there should be a reference to collectl for completeness. http://collectl.sourceforge.net/
-mark
Web Interface
Hi Mr. Gavin,
Did you get a chance to complete the Perl based web interface for your scripts. If so, I will be very interested to get the sources...
BR,
Bart
Re: Performance Monitoring Tools for Linux
The sarChart.cgi script has a bug in it. It reads from the tstamp column in each table incorrectly. To calculate the time it uses substr to extract the hour and min, but the offset parameter is off by 2 in both cases. This problem is probably due to changing the length of the year from 2 to 4 digits.
Re: Performance Monitoring Tools for Linux
Bull *****..There is no bug in it..
Re: Performance Monitoring Tools for Linux
To use these utilities on a multi-cpu machine change line 40 of the sa scrip fromt:
40 /^cpu/ {
to:
/^cpu / {
Note: add a space between the "/" and the "u" in cpu.
This change won't give you information on each individual cpu - but will use the aggregates as reported in the proc pseudo file system.
Re: Performance Monitoring Tools for Linux
Description of the columns in the CPU output is incorrect:
0000 4690259 69915 661038 7937582
Column 5: seconds in idle state since last booted
Column 2: seconds in system state since last booted
Column 3: seconds in nice state since last booted
Column 4: seconds in user state since last booted
Column 1: time-stamp of observation (HHMM)
call me picky.
picky too
Call me picky but the unit of measure is 1/100 of a second
picky
picky
Wow... 2+ years and you
Wow... 2+ years and you decide to respond with "picky"..?
How about "you're right -- good catch". Or better yet, don't respond.
Am I contradicting myself by responding to you? No. You took a perfectly acceptable observation and decided to respond with an opinion. I'm taking your opinion and responding with an observation.
Query regarding running the above scripts
Hello,
Iam Surender, Iam a naive user. I have downloaded the above scripts for cpu utilisation, disk usage etc but I dont know how to execute the same. Somebody please help me out in this regard.
My email address: surenuder@gmail.com
Thanks,
Surender