LJ March 2010 - Troubleshooting Part 1

Troubleshooting Part 1 - great article, looking forward to the series.

I’m a 20+ year IT veteran and a 10+ year Linux user, but I’m still trying to get my head around this ‘Load Average’ concept.

I’m currently contracting to a large Australian telco, doing capacity planning for their media assets, and have been looking into load average more closely. Between us we might be able to get to the bottom of it. (I’m aware you have limited space to explain a topic some could write a thesis on, but here goes anyway…)

Re your statement:
‘If I have a load of 1, the cpu is busy enough that 1 process is having to wait for cpu time’.

I have been working on the assumption that load average scales with the number of CPUs, so on a single-CPU system a load average of 1 equals a system running at 100%. The load average would need to exceed 1 before any processes are waiting for CPU time. I also don’t believe the load average corresponds directly to the number of waiting processes: a load average of 2 does not mean 2 waiting processes.
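To make that per-CPU reading concrete, here is a rough Python sketch of how I would normalise the figure. The function names and the three labels are my own invention for illustration, and the thresholds simply encode the assumption stated above (load divided by CPU count, with 1.0 as the break-even point). It is not a definitive interpretation: on Linux the load average also counts tasks in uninterruptible sleep, so a ratio above 1 does not necessarily mean processes are queued for the CPU.

```python
import os


def per_cpu_load(load_avg, cpu_count):
    """Divide a load average by the number of CPUs (hypothetical helper)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def describe(load_avg, cpu_count):
    """Label a load average under the 'load scales with CPUs' assumption."""
    ratio = per_cpu_load(load_avg, cpu_count)
    if ratio < 1.0:
        return "spare capacity"
    if ratio == 1.0:
        return "fully utilised"
    # Above 1.0 per CPU: more runnable (or, on Linux, uninterruptibly
    # sleeping) tasks than CPUs, averaged over the sampling window.
    return "oversubscribed"


def current_status():
    """Describe the 1-minute load average of the machine this runs on."""
    one_min, _five_min, _fifteen_min = os.getloadavg()  # Unix-only call
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    return describe(one_min, cpus)
```

Under this reading, `describe(1.0, 1)` and `describe(4.0, 4)` both come out as "fully utilised", while `describe(2.0, 4)` is "spare capacity".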

The other issue comes from virtualisation technologies. One of the drivers here is to make better use of the large number of physical servers that have low CPU utilisation. So does a load average of 1 (or 4 for a quad-CPU system) mean high load or optimal performance? Is it good to run a system at 100% CPU for extended periods? My experience says no, but virtualisation goals say yes.

Comments welcome.

Nice point about the Linux file cache; I’ve lost count of the number of times I’ve had to explain this one.



