diff -u: What's New in Kernel Development


Boot times can become slow on systems with many CPUs, partly because of the time it takes to crank up all the RAM chips. Mel Gorman recently submitted some patches to start up RAM chips in parallel instead of one after the other. One of the main problems with trying to implement such a feature—and one of the main reasons such patches haven't made it into the kernel before—is the need to avoid slowing things down for smaller systems.

Mel's patches modified the kswapd code to give each CPU its own RAM initialization thread. On smaller systems this theoretically would amount to no change at all, while larger systems could see dramatically reduced boot times.
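To make the idea concrete, here is a minimal sketch of the partitioning step only: a toy "page array" is split into one slice per worker, and each slice is initialized by its own thread. This is not Mel's code. The names `parallel_init`, `init_range`, and `PAGE_FREE` are invented for illustration, and Python threads will not show the real speedup; the point is just that one worker degenerates to the old serial behavior.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_FREE = 1  # hypothetical marker meaning "initialized and free"


def init_range(pages, start, end):
    """Initialize one contiguous slice of the toy page array."""
    for i in range(start, end):
        pages[i] = PAGE_FREE
    return end - start


def parallel_init(num_pages, num_workers):
    """Split the page array into one slice per worker and initialize them concurrently."""
    pages = bytearray(num_pages)
    chunk = max(1, -(-num_pages // num_workers))  # ceiling division
    bounds = [(s, min(s + chunk, num_pages)) for s in range(0, num_pages, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        done = sum(pool.map(lambda b: init_range(pages, *b), bounds))
    return pages, done, len(bounds)
```

Calling `parallel_init(1000, 1)` produces a single slice covering everything, which is the "no change for small systems" case; `parallel_init(1000, 4)` produces four slices of 250 pages each.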

An initial test by Waiman Long showed boot time on his 12-terabyte system dropping from 404 seconds to 298, a reduction of roughly a quarter. When Peter Zijlstra and Mel asked if this made a worthwhile difference to him, Waiman replied:

Booting 100s faster is certainly something that is nice to have. Right now, more time is spent in the firmware POST portion of the bootup process than in the OS boot. So I would say this patch isn't really critical right now as machines with that much memory are relatively rare. However, if we look forward to the near future, some new memory technology like persistent memory is coming, and machines with large amount of memory (whether persistent or not) will become more common. This patch will certainly be useful if we look forward into the future.

Scott J. Norton added, "100 seconds really does matter and is a big deal. When businesses have one of these large machines go down, their business is stopped (unless they have a fast failover solution in place). Every minute and second the machine is down is crucial to these businesses."

Andrew Morton pushed a bit for Mel to simplify his code, but Mel felt that Andrew's suggestions could make things worse, such as by forcing the kernel to rely on user-space code. So long as systems keep getting bigger, patches like these seem destined for eventual acceptance.

Intel has invited Linux kernel engineers to assist in development for chips that are so new that their in-house developers must code on software simulations of the eventual hardware.

The patches, released by Dave Hansen, wouldn't run for anyone outside Intel—since no one else has those chips—but he was hoping for feedback on their implementation of Memory Protection Keys for user space.

The underlying idea involves using previously unused bits in the page-table entries to tag each page with a key, and introducing a new register and associated assembler instructions that control what may be done with each key, securing memory on a page-by-page basis. Essentially, this gives users the ability to enable a particular set of actions on a given set of pages, while prohibiting others.
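A toy model may help here. The sketch below assumes the layout Intel has described for this feature: each page is tagged with one of 16 keys, and a rights register holds two bits per key, one that disables all access and one that disables writes. It is a simulation in plain Python, not the kernel's implementation, and the function names are invented for illustration.

```python
AD = 0b01  # access-disable bit for a key
WD = 0b10  # write-disable bit for a key
NUM_KEYS = 16


def set_rights(pkru, key, bits):
    """Return a new register value with the two rights bits for `key` replaced."""
    if not 0 <= key < NUM_KEYS:
        raise ValueError("key out of range")
    shift = 2 * key
    return (pkru & ~(0b11 << shift)) | ((bits & 0b11) << shift)


def allowed(pkru, page_key, write):
    """Check whether an access to a page tagged with `page_key` is permitted."""
    bits = (pkru >> (2 * page_key)) & 0b11
    if bits & AD:
        return False          # no reads or writes at all
    if write and bits & WD:
        return False          # reads are fine, writes are not
    return True


# Example: key 1 becomes read-only, key 2 becomes inaccessible.
pkru = set_rights(0, 1, WD)
pkru = set_rights(pkru, 2, AD)
```

With that register value, `allowed(pkru, 1, write=False)` is true while `allowed(pkru, 1, write=True)` is false, and pages tagged with key 2 cannot be touched at all. Changing rights means rewriting one register rather than walking page tables, which is where the speed advantage over mprotect() would come from.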

When Ingo Molnar asked Dave to list some potential use cases for this chip feature, Dave replied that there were various things that a user might want to protect, such as the following: "Data structures like logs or journals that are only written to in very limited code paths, but that you want to protect from 'stray' writes." Or: "a database where a query operation will never need to write to memory, but an insert would. You could keep the data R/O during the entire operation except when an insert is actually in progress."

Alan Cox also suggested:

You also can use it for certain types of emulator trickery, and I suspect even for things like interpreters and controlling access to "tainted" values.

Other obvious uses are making it a shade harder for SSL or ssh type errors to leak things like key data by reducing the damage done by out of bound accesses.

Ingo asked if there could be any issues surrounding this feature existing on some CPUs but not others. And Dave replied, "It's always a problem with new CPU features." He then went on to say:

I've thought a bit about trying to "emulate" the feature on older CPUs using good ol' mprotect() so that we could have an API that folks can use today, but that would get magically fast on future CPUs. But, the problem with that is the thread-local aspect.

mprotect() is fundamentally process-wide and protection key rights are fundamentally thread-local. Those things are going to be hard to reconcile unless we do something slightly extreme like having per-thread page tables.
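The mismatch Dave describes can be shown with a small simulation. In this sketch, a shared dictionary stands in for page permissions that every thread sees (the mprotect() model), and a `threading.local` object stands in for a per-thread rights register (the protection-keys model). All names are hypothetical, and nothing here touches real page tables.

```python
import threading

thread_rights = threading.local()  # per-thread state, like a protection-key rights register


def can_write(page_prot, page):
    """A write needs both the shared page permission and this thread's own rights."""
    denied = getattr(thread_rights, "write_disabled", frozenset())
    return "w" in page_prot[page] and page not in denied


def run_in_thread(fn):
    """Run fn in a fresh thread and return its result."""
    out = []
    worker = threading.Thread(target=lambda: out.append(fn()))
    worker.start()
    worker.join()
    return out[0]


def demo():
    page_prot = {"journal": "rw"}  # shared by all threads, like page-table permissions

    def restricted():
        thread_rights.write_disabled = {"journal"}  # only this thread is affected
        return can_write(page_prot, "journal")

    results = {
        "restricted_thread": run_in_thread(restricted),
        "other_thread": run_in_thread(lambda: can_write(page_prot, "journal")),
    }
    page_prot["journal"] = "r"  # mprotect()-style change: visible to every thread
    results["after_process_wide_change"] = run_in_thread(
        lambda: can_write(page_prot, "journal"))
    return results
```

In `demo()`, the thread that drops its own write rights is the only one affected, while the process-wide change locks out every thread. Emulating the first behavior with only the second is the hard part, which is why per-thread page tables come up as a possible, if extreme, workaround.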

The discussion got technical, but clearly the main question is how to support the new chip features, rather than whether to support them at all.

Luis R. Rodriguez has extended module signing to support signing firmware as well. Eventually, he figures it should be possible to sign user data too. This seemed to be a natural extension of existing features and not very controversial. But, there are certain differences between the firmware signing code and the module signing code; for example, Luis' code introduces separate files to contain the firmware signatures as a means to better handle licensing issues.

Luis' patches also "do not taint the kernel in the permissive [firmware] signing mode due to restrictions on the firmware_class API; extensions to enable this are expected, however, in the future."
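As a rough illustration of how a detached signature file and a permissive mode fit together, consider the sketch below. It is not Luis' code: the `.sig` suffix, the function names, and the `enforce` flag (modeled loosely on how module signing can be made mandatory) are all assumptions, and an HMAC tag stands in for the public-key signature a real implementation would verify.

```python
import hashlib
import hmac

SIG_SUFFIX = ".sig"  # hypothetical suffix for the detached signature file


def sign(data, key):
    """Stand-in for a real signature: an HMAC-SHA256 tag over the firmware bytes."""
    return hmac.new(key, data, hashlib.sha256).digest()


def load_firmware(path, key, enforce):
    """Return (data, verified). In enforcing mode, reject unverified firmware."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        # The signature lives in its own file, next to the firmware image.
        with open(path + SIG_SUFFIX, "rb") as f:
            verified = hmac.compare_digest(f.read(), sign(data, key))
    except FileNotFoundError:
        verified = False
    if enforce and not verified:
        raise PermissionError(f"no valid signature for {path}")
    return data, verified
```

In permissive mode the firmware loads either way and the caller merely learns whether it verified; in enforcing mode an unsigned or tampered image is refused. Keeping the signature in a separate file means the firmware image itself is never modified.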