Examining Load Average

Understanding work-load averages as opposed to CPU usage.
The Wizard behind the Curtain

The load-average calculation is best thought of as a moving average of processes in Linux's run queue marked running or uninterruptible. The words “thought of” were chosen for a reason: that is how the measurements are meant to be interpreted, but not exactly what happens behind the curtain. It is at this juncture in our journey when the reality of it all, like quantum mechanics, seems not to fit the intuitive way as it presents itself.

The load averages that the top and uptime commands display are obtained directly from /proc. If you are running Linux kernel 2.4 or later, you can read those values yourself with the command cat /proc/loadavg. However, it is the Linux kernel that produces those values in /proc. Specifically, timer.c and sched.h work together to do the computation. To understand what timer.c does for a living, the concept of time slicing and the jiffy counter help round out the picture.

In the Linux kernel, each dispatchable process is given a fixed amount of time on the CPU per dispatch. By default, this amount is 10 milliseconds, or 1/100th of a second. For that short time span, the process is assigned a physical CPU on which to run its instructions and allowed to take over that processor. More often than not, the process will give up control before the 10ms are up through socket calls, I/O calls or calls back to the kernel. (On an Intel 2.6GHz processor, 10ms is enough time for approximately 50-million instructions to occur. That's more than enough processing time for most application cycles.) If the process uses its fully allotted CPU time of 10ms, an interrupt is raised by the hardware, and the kernel regains control from the process. The kernel then promptly penalizes the process for being such a hog. As you can see, that time slicing is an important design concept for making your system seem to run smoothly on the outside. It also is the vehicle that produces the load-average values.

The 10ms time slice is an important enough concept to warrant a name for itself: quantum value. There is not necessarily anything inherently special about 10ms, but there is about the quantum value in general, because whatever value it is set to (it is configurable, but 10ms is the default), it controls how often at a minimum the kernel takes control of the system back from the applications. One of the many chores the kernel performs when it takes back control is to increment its jiffies counter. The jiffies counter measures the number of quantum ticks that have occurred since the system was booted. When the quantum timer pops, timer.c is entered at a function in the kernel called timer.c:do_timer(). Here, all interrupts are disabled so the code is not working with moving targets. The jiffies counter is incremented by 1, and the load-average calculation is checked to see if it should be computed. In actuality, the load-average computation is not truly calculated on each quantum tick, but driven by a variable value that is based on the HZ frequency setting and tested on each quantum tick. (HZ is not to be confused with the processor's MHz rating. This variable sets the pulse rate of particular Linux kernel activity and 1HZ equals one quantum or 10ms by default.) Although the HZ value can be configured in some versions of the kernel, it is normally set to 100. The calculation code uses the HZ value to determine the calculation frequency. Specifically, the timer.c:calc_load() function will run the averaging algorithm every 5 * HZ, or roughly every five seconds. Following is that function in its entirety:


   unsigned long avenrun[3];

   static inline void calc_load(unsigned long ticks)
   {
      unsigned long active_tasks; /* fixed-point */
      static int count = LOAD_FREQ;

      count -= ticks;
      if (count < 0) {
         count += LOAD_FREQ;
         active_tasks = count_active_tasks();
         CALC_LOAD(avenrun[0], EXP_1, active_tasks);
         CALC_LOAD(avenrun[1], EXP_5, active_tasks);
         CALC_LOAD(avenrun[2], EXP_15, active_tasks);
      }
   }

The avenrun array contains the three averages we have been discussing. The calc_load() function is called by update_times(), also found in timer.c, and is the code responsible for supplying the calc_load() function with the ticks parameter. Unfortunately, this function does not reveal its most interesting aspect: the computation itself. However, that can be located easily in sched.h, a header used by much of the kernel code. In there, the CALC_LOAD macro and its associated values are available:


   extern unsigned long avenrun[];	/* Load averages */

   #define FSHIFT   11		/* nr of bits of precision */
   #define FIXED_1  (1<<FSHIFT)	/* 1.0 as fixed-point */
   #define LOAD_FREQ (5*HZ)	/* 5 sec intervals */
   #define EXP_1  1884		/* 1/exp(5sec/1min) as fixed-point */
   #define EXP_5  2014		/* 1/exp(5sec/5min) */
   #define EXP_15 2037		/* 1/exp(5sec/15min) */

   #define CALC_LOAD(load,exp,n) \
      load *= exp; \
      load += n*(FIXED_1-exp); \
      load >>= FSHIFT;

Here is where the tires meet the pavement. It should now be evident that reality does not appear to match the illusion. At least, this is certainly not the type of averaging most of us are taught in grade school. But it is an average nonetheless. Technically, it is an exponential decay function and is the moving average of choice for most UNIX systems as well as Linux. Let's examine its details.

The macro takes in three parameters: the load-average bucket (one of the three elements in avenrun[]), a constant exponent and the number of running/uninterruptible processes currently on the run queue. The possible exponent constants are listed above: EXP_1 for the 1-minute average, EXP_5 for the 5-minute average and EXP_15 for the 15-minute average. The important point to notice is that the value decreases with age. The constants are magic numbers that are calculated by the mathematical function shown below:

When x=1, then y=1884; when x=5, then y=2014; and when x=15, then y=2037. The purpose of the magical numbers is that it allows the CALC_LOAD macro to use precision fixed-point representation of fractions. The magic numbers are then nothing more than multipliers used against the running load average to make it a moving average. (The mathematics of fixed-point representation are beyond the scope of this article, so I will not attempt an explanation.) The purpose of the exponential decay function is that it not only smooths the dips and spikes by maintaining a useful trend line, but it accurately decreases the quality of what it measures as activity ages. As time moves forward, successive CPU events increase their significance on the load average. This is what we want, because more recent CPU activity probably has more of an impact on the current state than ancient events. In the end, the load averages give a smooth trend from 15 minutes through the current minute and give us a window into not only the CPU usage but also the average demand for the CPUs. As the load average goes above the number of physical CPUs, the more the CPU is being used and the more demand there is for it. And, as it recedes, the less of a demand there is. With this understanding, the load average can be used with the CPU percentage to obtain a more accurate view of CPU activity.

It is my hope that this serves not only as a practical interpretation of Linux's load averages but also illuminates some of the dark mathematical shadows behind them. For more information, a study of the exponential decay function and its applications would shed more light on the subject. But for the more practical-minded, plotting the load average vs. a controlled number of processes (that is, modeling the effects of the CALC_LOAD algorithm in a controlled loop) would give you a feel for the actual relationship and how the decaying filter applies.

Ray Walker is a consultant specializing in UNIX kernel-level code. He has been a software developer for more than 25 years, working with Linux since 1995. He can be contacted at ray.rwalk2730@gmail.com.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

load ave

DD's picture

I have been trying to figure out what those 3 numbers meant. Thanks for the article. How does hyperthreading play into the number of lanes? Is that additive or do I just count the total # of cores?

Thanks for the article

Anonymous's picture

It is really good article, till now following load avg values blindly.
This article given clear understanding.

Thank you for the article,

Andy23's picture

Thank you for the article, now i've understood the difference.
But what about hyperthreading?

VPS environment

Anonymous's picture

I'm curious, how does top report load average on a VPS? is it reporting on the VPS's average or the server machine's?

Just thought I put it out there.. thanks for any replies.

great article

Anonymous's picture

this certainly answered a lot of questions for me. great article. thanks!

How to get CPU load inf in terms of Percentage

Vikram's picture

CPU load average, we can get using many commands, but how to get this average value in terms of percentage

Is there any formula for it? searched in google couldnt get right information on how to get this percentile value

Thanks
Vikram

multi core procs

ubergoober's picture

So how does this apply to multi core cpu's?
If I have 4 quad core cpu's in my system then do I still only have 4 lanes on my highway? or do I have 16?

Thanks,
Dave

You have one lane per core in

Anonymous's picture

You have one lane per core in the system.
So a system with 4 CPUs, each with one core would be a 4-lane highway (as stated in the article) - or you can think of it as 4 single lane highways.
If you have a single quad-core then you have a 4-lane highway.

If you have 4 quad-core CPUs, then you have 4 4-land highways (or a single 16-lane highway).

Nice & useful article ! I'd

Anonymous's picture

Nice & useful article !

I'd like to mention here that the timeslices of each task are NOT 10 ms, but 100 ms as an average (nice val 0) & goes down to 10 ms (assuming HZ=100) for nice val -19.
This is ofcourse for the pre-CFS days, which is when I believe the article was written

Load Average

Deven Patankar's picture

Always had a confusion about load avg, thanks to your article it is clear now. Great Article

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState