Shrinking Linux Attack Surfaces
Often, a kernel developer will try to reduce the size of an attack surface against Linux, even if it can't be closed entirely. It's generally a toss-up whether such a patch makes it into the kernel. Linus Torvalds always prefers security patches that really close a hole, rather than just give attackers a slightly harder time of it.
Matthew Garrett recognized that userspace applications may have secret data sitting in RAM at any given time, and that those applications might want to wipe that data so no one else could read it.
There were already various ways to do this, as Matthew pointed out. An application could use mlock() to prevent its memory contents from being pushed into swap, where attackers might read them more easily. An application also could use atexit() to register a handler that thoroughly overwrites its memory when the application exits, thus leaving no secret data in the general pool of available RAM.
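As a rough illustration of those two existing mechanisms, here is a minimal C sketch. The buffer name, its size, and the helper functions protect_secret() and wipe_secret_at_exit() are invented for this example and do not come from Matthew's patch or from the discussion. Only mlock(), munlock(), and atexit() are real interfaces.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define SECRET_LEN 64

/* Hypothetical buffer holding the application's secret. */
static unsigned char secret[SECRET_LEN];

/* Write zeros through a volatile pointer so the compiler
 * cannot optimize the stores away as dead writes. */
static void wipe(void *buf, size_t len)
{
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}

/* Exit handler: overwrite the secret, then release the lock. */
static void wipe_secret_at_exit(void)
{
    wipe(secret, sizeof secret);
    munlock(secret, sizeof secret);
}

/* Returns 0 if the exit handler was registered, -1 otherwise.
 * A failed mlock() is reported but not treated as fatal here;
 * a stricter program might refuse to continue instead. */
int protect_secret(void)
{
    /* Keep the pages holding the secret out of swap. */
    if (mlock(secret, sizeof secret) != 0)
        perror("mlock");

    /* Ask the C library to run the wipe on normal exit. */
    if (atexit(wipe_secret_at_exit) != 0)
        return -1;
    return 0;
}
```

A program would call protect_secret() once at startup, before loading anything sensitive into the buffer. Note that atexit() handlers run only on a normal exit: they are skipped if the process calls _exit(), aborts, or is killed by a signal.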
The problem, Matthew pointed out, came if an attacker was able to reboot the system at a critical moment—say, before the user's data could be safely overwritten. If attackers then booted into a different OS, they might be able to examine the data still stored in RAM, left over from the previously running Linux system.
As Matthew also noted, the existing way to prevent even that was to tell the UEFI firmware to wipe system memory before booting to another OS, but this would dramatically increase the amount of time it took to reboot. And if the good guys had won out over the attackers, forcing those legitimate users to wait a long time for a reboot could be considered a denial-of-service attack, or at least downright annoying.
Ideally, Matthew said, if the attackers were able to induce only a clean shutdown, rather than a cold boot, there needed to be a way to tell Linux to scrub all data out of RAM, so there would be no further need for UEFI to handle it, and thus no need for a very long delay during reboot.
Matthew explained the reasoning behind his patch. He said:
Unfortunately, if an application exits uncleanly, its secrets may still be present in RAM. This can't be easily fixed in userland (eg, if the OOM killer decides to kill a process holding secrets, we're not going to be able to avoid that), so this patch adds a new flag to madvise() to allow userland to request that the kernel clear the covered pages whenever the page reference count hits zero. Since vm_flags is already full on 32-bit, it will only work on 64-bit systems.
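To make the proposed interface a little more concrete, here is a hedged C sketch of how userland might request such behavior and fall back when the kernel does not offer it. The text above does not name the new flag, so HYPOTHETICAL_MADV_WIPE and its value are placeholders invented for this example, as are the two helper functions. The sketch relies only on the documented behavior that madvise() fails with EINVAL when given an advice value it does not recognize.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* HYPOTHETICAL placeholder, not a real advice constant.
 * A kernel carrying such a patch would define the real name
 * and value in its own headers. */
#define HYPOTHETICAL_MADV_WIPE 1000

/* Map anonymous, page-aligned memory for secrets.
 * Returns NULL on failure. */
void *alloc_secret_pages(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Ask the kernel to clear these pages when they are released.
 * Returns 1 if the kernel accepted the request,
 *         0 if the advice is unknown (caller must wipe by hand),
 *        -1 on any other error. */
int request_kernel_wipe(void *addr, size_t len)
{
    if (madvise(addr, len, HYPOTHETICAL_MADV_WIPE) == 0)
        return 1;
    if (errno == EINVAL)
        return 0;
    return -1;
}
```

On a kernel without such a flag, request_kernel_wipe() should return 0, and the application would have to keep relying on its own exit-time wiping, with the gaps Matthew describes.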
Matthew Wilcox liked this plan and offered some technical suggestions for Matthew G's patch, and Matthew G posted an updated version in response.
Michal Hocko also had some technical suggestions, including the idea that the patch should not just wipe RAM, but also any swap space, for added protection.
But Christopher Lameter replied to Matthew G's patch, saying that it didn't actually fix the problem, even if it made the attack more difficult to carry out. As he put it:
The pages are cleared anyways when reallocated to another process. This just clears it sooner before reuse. So it will reduce the time that a page contains the secret sauce in case the program is aborted and cannot run its exit handling.
Is that really worth extending system calls and adding kernel handling for this? Maybe the answer is yes given our current concern about anything related to "security".
Matthew G pointed out that if the system was mostly idle, no other process might claim the RAM that still held secret data. In this case, those secrets would sit unguarded. And if someone did reboot the system at that time, the secret data would be exposed.
A bunch of people contributed technical suggestions, and Matthew G submitted several new versions of his patch, before the discussion ended.
There's clearly some interest in this patch, but no one was singing about it on their way to the Grey Havens. It does represent a security improvement, in the sense that it narrows the time window in which an attacker can take advantage of exposed data, but that window still remains open for a certain amount of time. Attackers could potentially use it to gain access to privileged data, even with Matthew G's patch. It's unclear to me whether or not this patch will go into the kernel.