diff -u: What's New in Kernel Development


Linux capabilities are one of the more fluid and less defined regions of kernel development. Linus Torvalds typically has no trouble violating POSIX standards if he sees a better way of doing something. In the case of filesystem capabilities, however, there's no standard to violate. The best we've got is a POSIX draft document that was discarded before becoming official. So really, anyone with a good idea can come along and make big changes in that part of the kernel.

Filesystem capabilities refer to a finer-grained set of permissions than the traditional choice between running something as a regular user or running as root.

Recently, Eric W. Biederman and Andy Lutomirski found themselves tackling filesystem capabilities from opposite directions. Eric wanted to allow a process that's been granted one set of capabilities to invoke system calls using an even more constrained set of capabilities. Presumably, the goal would be to increase security by preventing system calls from being abused for nefarious purposes. And, Andy wanted to allow one process to allow a completely separate process to perform system calls on its behalf. This might allow the formation of system call services to centralize all system call usage and make it easier to secure those uses.

The discussion went round and round. Eric's idea, as he later clarified, was actually a bit broader than it appeared at first glance—he wanted to convert the Linux implementation of POSIX capabilities into "Real Capabilities". The term Real Capabilities refers to a computer science concept that pre-dates POSIX capabilities. It refers to the idea of giving a process some sort of token that allows it to perform a specified action on a specified object.

Ultimately, nothing about capabilities, or any new patches in that area, can have real clarity until they go into the kernel. Before then, there's always the possibility that they'll violate something important or aim in the wrong direction. But, it's cool to watch Eric and Andy, and lots of other folks, trying to figure it out.

One recurring problem with Linux is the need for backward compatibility. Actually, this affects virtually the whole Open Source world, but Linus Torvalds takes a particular strict stance on the issue with regard to the Linux kernel. If there's a compiled piece of user software out in the wild that relies on a kernel feature, even a dumb kernel feature, then future kernels have got to support that feature, so that the piece of user software will continue to run after a kernel upgrade.

It makes sense. But as Andy Lutomirski recently said, the result was a batch of features that existed only to support old programs. And by carrying these features perpetually into the future, he said, newer software ran the risk of accidentally using those features or even becoming reliant on them.

He proposed allowing new software to turn off those compatibility features explicitly, but that turned out to be more complex than he'd originally thought. Specifically, one of his cornerstone ideas—granting the ability of newer software to turn off the vsyscall page—was not easy to arrange. Andy's initial idea was to have the compiler identify at compile time software that used newer versions of libc, and then have that piece of software elect to disable vsyscall at runtime. But, he didn't see a good way to accomplish that, and Brian Gerst pointed out that vsyscall was globally shared and couldn't be shut off for individual processes.

This actually turned out not to be 100% true. Although Andy agreed that vsyscall was shared globally, the mechanisms to execute it were all emulated in the kernel, and those could be disabled on a per-process basis.

Rich Felker proposed another workaround for vsyscall's global availability. He said the kernel could simply unmap all means of executing vsyscall. Any older software that tried to access it would generate a page fault, which the kernel could then catch and emulate vsyscall just for that program.

But, Andy didn't go for that idea. He said that modern instrumentation tools might want to read the targets of calls, and a page fault would prevent that. Any vsyscall solution, Andy said, had to maintain compatibility for those tools.

On the other hand, as Rich said, those modern tools might never be used on ancient binaries in practice. And even if they were, it might be possible to code up specific kernel workarounds for each use case in a less invasive way than trying to come up with a complete solution for vsyscall.

It's a robust debate, complicated by the fact that it's difficult to know for sure if anyone is actually running old binaries that depend on this or that kernel compatibility feature. But, Andy made it clear that cleaning out compatibility features was not really his primary goal, so much as it was to eliminate potential security holes. Apparently, Google's Project Zero had identified more exploits: http://googleprojectzero.blogspot.com/2015/08/three-bypasses-and-fix-for-one-of.html.

The Linux framebuffer, once a bastion of innovation, is now on the chopping block, in favor of the Direct Rendering Manager (DRM) subsystem. The fbdev maintainer, Tomi Valkeinen, has asked everyone to stop submitting new fbdev drivers and to aim their efforts at DRM instead.

It was not as uncontroversial as you might think. It turned out, as folks like Thomas Petazzoni said, that it still was easier to write very simple drivers for fbdev, than it was for DRM. Just in terms of lines of code, Geert Uytterhoeven noticed that the simplest fbdev drivers were just a few hundred lines of code, while the simplest DRM drivers might require a couple thousand.

No one argued that this would be a permanent problem. If anything, the discussion highlighted the need to write some supporting libraries for DRM and help speed up the ultimate elimination of fbdev.