LJ March 2010 - Troubleshooting Part 1

Troubleshooting Part 1 - great article, looking forward to the series.

I’m a 20+ year IT veteran and a 10+ year Linux user, but I’m still trying to get my head around this ‘Load Average’ concept.

I’m currently contracting to a large Australian telco, doing capacity planning for their media assets, and have been looking into load average more closely. Between us we might be able to get to the bottom of it. (I’m aware you have limited space to explain a topic some could write a thesis on, but here goes anyway…)

Re your statement:
‘If I have a load of 1, the cpu is busy enough that 1 process is having to wait for cpu time’.

I have been working on the basis that load average should be read against the number of CPUs, so on a single-CPU system a load average of 1 equals a system running at 100%. The load average would need to exceed 1 before any processes are waiting for CPU time. I also don’t believe the load average corresponds directly to the number of waiting processes: a load average of 2 does not mean 2 waiting processes.
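
A minimal Python sketch of that rule of thumb follows. It only divides the reported load average by the CPU count and labels the result; the function names and the three labels are illustrative assumptions, and it says nothing about how the kernel actually derives the figure.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Divide a load average by the number of CPUs (hypothetical helper)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def describe(load_avg, cpu_count):
    """Label the per-CPU load using the rule of thumb from this letter."""
    ratio = load_per_cpu(load_avg, cpu_count)
    if ratio < 1.0:
        return "spare capacity"
    if ratio == 1.0:
        return "fully utilised"
    return "oversubscribed"


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one_min, five_min, fifteen_min = os.getloadavg()
    # os.cpu_count() can return None, so fall back to 1 in that case.
    cpus = os.cpu_count() or 1
    for label, value in (("1 min", one_min), ("5 min", five_min), ("15 min", fifteen_min)):
        print(f"{label}: load {value:.2f} on {cpus} CPUs -> {describe(value, cpus)}")
```

Under this reading, a load average of 4 on a quad-CPU system gives the same answer as a load average of 1 on a single-CPU system.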

The other issue comes from virtualisation technologies. One of the drivers here is to increase utilisation of the large number of physical servers that have low CPU utilisation. So does a load average of 1 (or 4 for a quad-CPU system) mean high load or optimal performance? Is it good to run a system at 100% CPU for extended periods? My experience says no, but virtualisation goals say yes.

Comments welcome.

Nice point about the Linux file cache; I’ve lost count of the number of times I’ve had to explain this one.