Hack and / - Linux Troubleshooting, Part I: High Load

What do you do when you get an alert that your system load is high? Tracking down the cause of high load just takes some time, some experience and a few Linux tools.
Once You Track Down the Culprit

How you deal with these load-causing processes is up to you and depends on a lot of factors. In some cases, you might have a script that has gone out of control and is something you can easily kill. In other situations, such as in the case of a database process, it might not be safe simply to kill the process, because it could leave corrupted data behind. Plus, it could just be that your service is running out of capacity, and the real solution is either to add more resources to your current server or add more servers to share the load. It might even be load from a one-time job that is running on the machine and shouldn't impact load in the future, so you just can let the process complete. Because so many different things can cause processes to tie up server resources, it's hard to list them all here, but hopefully, being able to identify the causes of your high load will put you on the right track the next time you get an alert that a machine is slow.

Kyle Rankin is a Systems Architect in the San Francisco Bay Area and the author of a number of books, including The Official Ubuntu Server Book, Knoppix Hacks and Ubuntu Hacks. He is currently the president of the North Bay Linux Users' Group.


Kyle Rankin is Chief Security Officer at Purism, a company focused on computers that respect your privacy, security, and freedom. He is the author of many books including Linux Hardening in Hostile Networks, DevOps Troubleshooting and The Official Ubuntu


Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Load averages

Chris Sardius's picture

I mange some quadcore linux systems and basically get a spike in load averages every now and then. I looking forward to deploying a permanent fix for this . I


slashbob's picture

Well, my Ubuntu 10.04 uptime only lasted about 35 days and then there was a kernel security update so I had to reboot.

From what I've seen Debian/Ubuntu and CentOS have kernel security updates about every three or four months.

Maybe it's time for FreeBSD...

in reply to uptime question

ayam666@hotmail.com's picture

"...what distro are you using to get 365 days of uptime..."

Just to comment on the question, I used to manage a few FreeBSD boxes (as internet gateway, mail server, DNS server and Samba server) for a company that virtually had no budget for their IT department.

We did not have brand named hardware. One of the servers was even built using part I scavenged from old boxes.

Most of them were very stable with uptime of more than 365 days.

The only time we had to switch off the server was because of a planned electricity upgrade by the electricity department - a possibility of outage of more than 5 hours just after midnight. Other than that, it was one case of bad sector on one of the mirrored hard disk.

In my opinion, it might not be Linux or the OS that causes low uptime. Rather, its the applications that we run on it bring down the system most of the time.

Like Kyle, I only upgraded the OS for security reasons.



iotop on centos

Anonymous's picture

# rpm -i iotop-0.4-1.noarch.rpm
error: Failed dependencies:
python(abi) = 2.6 is needed by iotop-0.4-1.noarch

Setting up Install Process
Package python-2.4.3-27.el5.i386 already installed and latest version

1. Where can I get a iotop compatible with python-2.4
2. If not, how can I upgrade my python on Centos 5?

Bob, maybe its not the distro

Anonymous's picture

Bob, maybe its not the distro itself that's causing your problem.


slashbob's picture


Thanks for the info. I guess Debian stable has, from my two years of experience, had kernel security updates at least every 3 months.

I was waiting to see which would come out first, Ubuntu 10.04 or CentOS 5.5, and had decided that which ever was released first is what I would use (for now). Knowing that Ubuntu would be supported for 5 years and CentOS for roughly the same time frame for 5.5 or 7 years from release.

Anyway, I've installed Ubuntu and it's been running nicely for 14 days now. When I ssh into the server it has an interesting summary of system information like: System Load, Swap usage, CPU temperature, Users logged in, Usage of /home, and then it tells me how many packages can be updated and how many are security updates which I thought was kind of nice.

During install it asked if I wanted to setup unattended-upgrades for security updates, I know it can be a little scary, but this is just a home server so I agreed. So everyday it checks for security updates and if there are any it installs them and sends me an email of what was done (Thanks must go to Kyle for his great tutorials of setting up a mail server with postfix. Thanks Kyle!).

We'll see what kinds of uptimes I get now.


Jerry McBride's picture

Say Bob,

I've been running Gentoo these last few years and the only thing that shortens the uptime on my servers is when a kernel comes out with new features or security updates. That aside, I can/could break the 365 day uptime with ease on these boxes.

The only real time I had problems getting a box to run for any lenght of time, was when I finally figured out that it had some badly manufactured memory in it. Swapped the junk out for some name brand sticks and it ran as I expected it to.

With linux, any problems with operation, I would suspect hardware issues before digging aroung the OS.

Just my two cents...


---- Jerry McBride

I use a mix of distributions

Kyle Rankin's picture

I use a mix of distributions and have gotten 1-2 year uptimes on most of them. In production I'm also resistant to update something just so the version number is higher, so as long as I have reasonably stable applications and reliable power, it's not too difficult to maintain a high uptime. The main enemy of it these days is kernel upgrades, which again, I resist unless there is a good reason (security) to do so.

Kyle Rankin is Chief Security Officer at Purism, a company focused on computers that respect your privacy, security, and freedom. He is the author of
many books including Linux Hardening in Hostile Networks, DevOps Troubleshooting and The Official Ubuntu


slashbob's picture

So, the age old question ...

Kyle, what distro are you using to get 365 days of uptime?

I've been running Debian for a couple of years and never seen anything greater than 77 days. Thinking of switching my server to Slackware or something else.