"No Reboot" Kernel Patching - And Why You Should Care


Linux version 4.0 was released on 12 April, and one of the most discussed new features in this release is "no reboot" kernel patching. With the major distros committing to support the 4.0 kernel and its features (including "no reboot" patching) at some point this year, it's a good time to take a look at what this feature actually does and what difference it will make for you.

First of all, what does it actually mean? Well, for once, this is a feature with a name that describes what it does pretty well. With versions of Linux before 4.0, when the kernel is updated via a patch, the system needs to reboot.

Kernel patches are released for a number of reasons, but the most frequent one is to fix security holes. This is why it's important to install each patch as soon as possible.

Unlike some other operating systems, Linux is able to update many different parts of the system without a reboot, but the kernel is different. Every running process interacts with the kernel intimately, so switching out parts of the kernel while it is running is quite risky.

On the other hand, rebooting the computer is irksome, and in some cases, where uptime is important, it can be a real issue. This is why "no reboot" kernel patching has been a priority for many administrators.

Recognizing this need, two companies have been hard at work on two different solutions. Red Hat has been working on kpatch, and SUSE has been working on kGraft. Both of these programs are designed to accomplish the same task, but they take different approaches and have different strengths.

kpatch freezes every process and then reroutes calls from the old kernel functions to the new, patched functions, before removing the old code. Because it handles every running process in one sweeping move, it runs quite fast - one to forty milliseconds and it's done. However, during this time the processes are frozen, which means there is some downtime - a mere fraction of a second, but in certain situations, that may be unacceptable.
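To make the "freeze, reroute, resume" idea concrete, here is a toy model in Python. It is only a sketch of the approach described above, not kpatch's actual mechanism: the ToyKernel class, the function table and the checksum functions are all made up for illustration.

```python
# Toy model of a "stop everything, then switch" patch.
# All names here are hypothetical; this is not how kpatch is implemented.

def old_checksum(data):
    # "Buggy" original function: it forgets the last byte.
    return sum(data[:-1]) % 256

def new_checksum(data):
    # Patched replacement.
    return sum(data) % 256

class ToyKernel:
    def __init__(self):
        # Callers look functions up in this table on every call,
        # so changing an entry reroutes all future calls.
        self.functions = {"checksum": old_checksum}
        self.frozen = False

    def call(self, name, *args):
        if self.frozen:
            raise RuntimeError("tasks are frozen while the patch is applied")
        return self.functions[name](*args)

    def apply_patch(self, name, replacement):
        # 1. freeze every task, 2. reroute the function, 3. resume.
        self.frozen = True
        try:
            self.functions[name] = replacement
        finally:
            self.frozen = False

kernel = ToyKernel()
before = kernel.call("checksum", [1, 2, 3])   # old code runs
kernel.apply_patch("checksum", new_checksum)  # brief "frozen" window
after = kernel.call("checksum", [1, 2, 3])    # new code runs
print(before, after)  # prints: 3 6
```

The point of the sketch is the single switch: nobody can call into the kernel while the table entry changes, so no caller ever sees a mix of old and new code, at the cost of a short pause for everyone.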

kGraft, on the other hand, handles the threads one by one, switching each thread over as it makes a system call (without forcing it to freeze first) until all of the threads are running the patched code. At this point, the patch is fully installed and the old code is replaced. The patch takes longer to complete this way, but there is no downtime.
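The thread-by-thread approach can also be shown with a toy model in Python. Again, this is a sketch under simplifying assumptions rather than kGraft's real code: the Task and ToyKernel classes and the two limit functions are hypothetical names used only for illustration.

```python
# Toy model of a lazy, per-task switch to patched code.
# All names here are hypothetical; this is not how kGraft is implemented.

def old_limit():
    return 10   # original behaviour

def new_limit():
    return 20   # patched behaviour

class Task:
    def __init__(self, name):
        self.name = name
        self.patched = False   # which version of the code this task sees

class ToyKernel:
    def __init__(self, tasks):
        self.tasks = tasks
        self.patch_pending = False

    def start_patch(self):
        # Nothing is frozen: the patch just waits to be picked up.
        self.patch_pending = True

    def syscall(self, task):
        # A task entering the kernel is at a safe point, so switch it now.
        if self.patch_pending and not task.patched:
            task.patched = True
        return new_limit() if task.patched else old_limit()

    def patch_complete(self):
        # The old code can only be dropped once every task has switched.
        return all(t.patched for t in self.tasks)

a, b = Task("a"), Task("b")
kernel = ToyKernel([a, b])

first = kernel.syscall(a)          # no patch yet: old code
kernel.start_patch()
second = kernel.syscall(a)         # task a switches to the new code
halfway = kernel.patch_complete()  # task b has not made a call yet
third = kernel.syscall(b)          # task b switches too
done = kernel.patch_complete()
print(first, second, halfway, third, done)  # prints: 10 20 False 20 True
```

Notice that between start_patch() and the last task's system call, old and new code are in use at the same time. That is why this style of patching takes longer to finish, even though no task is ever paused.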

Having solved the same problem separately, from two different angles, the two companies then came together in October last year. They looked at how their different approaches could be combined, and the result of this work has been merged into version 4.0 of the kernel.
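If you are curious whether a particular machine is running a kernel new enough for this, a short Python script can give a rough answer. This is only a sketch: it assumes the kernel release string starts with "major.minor", and it assumes that a kernel built with live patching enabled exposes a /sys/kernel/livepatch directory. A 4.0 or later version number alone does not prove that your distro switched the feature on.

```python
import os
import platform
import re

# Assumed location of the live patching interface in sysfs.
LIVEPATCH_SYSFS = "/sys/kernel/livepatch"

def kernel_version(release):
    """Turn a release string such as '4.0.0-generic' into (4, 0)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unexpected kernel release string: %r" % release)
    return int(match.group(1)), int(match.group(2))

def may_support_live_patching(release):
    """True if the version is 4.0 or later (necessary, but not sufficient)."""
    return kernel_version(release) >= (4, 0)

# Only inspect the running system when it is actually Linux.
if platform.system() == "Linux":
    release = platform.release()
    print("kernel release:", release)
    print("4.0 or later:", may_support_live_patching(release))
    print("livepatch directory present:", os.path.isdir(LIVEPATCH_SYSFS))
```

The version check and the directory check answer different questions: the first says whether the kernel could include the feature, the second whether this particular build appears to have it enabled.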

So, having described what "no reboot" kernel patching is, and how it works, the next question most users will have is "what difference does it make?"

For desktop users, the difference is relatively trivial. For users without 4.0, installing a kernel patch means rebooting the system, which means you must save your work and interrupt your workflow. This is irritating, and can cause a small hiccup in your productivity. If everyone in a medium or large office has to install a patch on the same day, it hits productivity a bit harder. However, this is a relatively small cost and is worthwhile to ensure security.

On the other hand, some servers and critical real-time applications must not be taken down without advance scheduling, even for a few minutes. This can be a pain when administrators need to keep the system secure and a patch is released to repair a newly discovered security hole. In this case, no-reboot patching becomes a real boon.

But this doesn't mean that system reboots are gone forever. Even on a system with the Linux 4.0 kernel, there will be security updates that still require a reboot, because there are other non-kernel components that can require patching, and some of these require a reboot as part of the process.

Some critics are therefore claiming that focusing so much effort and time on no-reboot patching is missing the real target. After all, this feature was developed to avoid the cost of rebooting a system. Maybe developers should be trying to make it less expensive to reboot a Linux system instead?
