Address Space Isolation and the Linux Kernel
Mike Rapoport from IBM launched a bid to implement address space isolation in the Linux kernel. Address space isolation builds on the idea of virtual memory, in which the system maps physical RAM and device memory into virtual address spaces, so that each process sees a clean, contiguous range of available memory. A system that implements virtual memory can also create isolated address spaces that are visible only to part of the system or to certain processes.
The idea, as Mike expressed it, is that if hostile users find themselves in an isolated address space, then even if they find kernel bugs that could be exploited to gain control of the system, the "system" they would gain control over would be just the tiny area of RAM to which they had access. They might be able to mess up their own local user, but not any other users on the system, nor would they be able to gain root-level access.
In fact, Mike posted patches to implement an element of this idea, called System Call Isolation (SCI). This would cause each system call to run in its own isolated address space. So if, somehow, an attacker were able to modify the return addresses stored on the stack, there would be no useful location to which to return.
His approach was relatively straightforward. The kernel already maintains a symbol table with the addresses of all its functions. Mike's patches would make sure that any return address popped off the stack corresponded to an entry in the symbol table. As he put it, because "attacks are all about jumping to gadget code which is effectively in the middle of real functions, the jumps they induce are to code that doesn't have an external symbol, so it should mostly detect when they happen."
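To make the mechanism concrete, here is a toy sketch in Python of that kind of symbol-table check. This is not the kernel's actual implementation, and the symbol names, addresses and sizes are invented for illustration; the point is simply that an address can be validated by asking whether it falls inside a known function's range:

```python
import bisect

# Hypothetical symbol table: sorted (start, size, name) tuples,
# loosely analogous to what the kernel knows from its symbol table.
SYMBOLS = [
    (0x1000, 0x80, "sys_open"),
    (0x1080, 0x40, "sys_close"),
    (0x2000, 0x200, "do_filp_open"),
]
STARTS = [start for start, _, _ in SYMBOLS]

def containing_symbol(addr):
    """Return the name of the function containing addr, or None."""
    # Find the last symbol whose start address is <= addr.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i >= 0:
        start, size, name = SYMBOLS[i]
        if start <= addr < start + size:
            return name
    return None

# A return address inside a known function passes the check...
assert containing_symbol(0x1010) == "sys_open"
# ...while an address with no covering symbol would be flagged.
assert containing_symbol(0x1900) is None
```

A real implementation would have to do this lookup on the hot path of every checked return, which is one way to see where the performance cost Mike described comes from.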
The problem, he acknowledged, was that implementing this would incur a performance hit. He saw no way to perform and enforce these checks without slowing down the kernel. For that reason, Mike said, "it should only be activated for processes or containers we know should be untrusted."
There was not much enthusiasm for the patches. As Jiri Kosina pointed out, Mike's code was incompatible with other security work such as retpolines, a mitigation against certain speculative-execution attacks that can leak data into an attacker's hands.
Beyond that, there was little discussion, and no one expressed interest in the patches. The performance hit, the conflict with existing security projects, and the fact that the patches guarded only against hypothetical security holes rather than actual flaws in the system probably combined to make the patch set less interesting to kernel developers.
It's one of the less pleasant aspects of kernel development. Someone can put a lot of hours into a project, with no way to know in advance what objections might be raised at the end. It wouldn't have been obvious to Mike and his colleagues that a speed hit would be necessary. And the possibility of conflict with other existing kernel projects is always very difficult to predict, especially since there often are workarounds that can be discovered only once members of the two projects start debating the various issues in public.
Only Linus Torvalds' general reluctance to add security features that do not address existing security holes could have been predicted. He seems very consistent on that point, much to the annoyance of security-minded developers throughout the Open Source world. The idea of reducing the size of an attack surface seems self-evident to them, while to Linus it seems self-evident that you shouldn't fix what isn't broken, especially when the fix adds bloat and increases maintenance costs for the whole project. It seems likely that even if Jiri and the other developers had approved of Mike's patches, Linus would have objected later on.
Note: if you're mentioned in this article and want to send a response, please send a message with your response text to email@example.com, and we'll run it in the next Letters section and post it on the website as an addendum to the original article.