Android Low-Memory Killer--In or Out?
One of the jobs of the Linux kernel—and all operating system kernels—is to manage the resources available to the system. When those resources get used up, what should it do? If the resource is RAM, there's not much choice. It's not feasible to take over the behavior of any piece of user software, understand what that software does, and make it more memory-efficient. Instead, the kernel has very little choice but to try to identify the software that is most responsible for using up the system's RAM and kill that process.
The official kernel does this with its OOM (out-of-memory) killer. But, Linux descendants like Android want a little more—they want to perform a similar form of garbage collection, but while the system is still fully responsive. They want a low-memory killer that doesn't wait until the last possible moment to terminate an app. The unspoken assumption is that phone apps are not so likely to run crucial systems like heart-lung machines or nuclear fusion reactors, so one running process (more or less) doesn't really matter on an Android machine.
A low-memory killer did exist in the Linux source tree until recently. It was removed, partly because of the overlap with the existing OOM code, and partly because the same functionality could be provided by a userspace process. And, one element of Linux kernel development is that if something can be done just as well in userspace, it should be done there.
Sultan Alsawaf recently threw open his window, thrust his head out, and shouted, "I'm mad as hell, and I'm not gonna take this anymore!" And, he re-implemented a low-memory killer for the Android kernel. He felt the userspace version was terrible and needed to be ditched. Among other things, he said, it killed too many processes and was too slow. He felt that the technical justification of migrating to the userspace dæmon had not been made clear, and an in-kernel solution was really the way to go.
In Sultan's implementation, the algorithm was simple—if a memory request failed, then the process was killed—no fuss, no muss and no rough stuff.
There was a unified wall of opposition to this patch. So much so that it became clear that Sultan's main purpose was not to submit the patch successfully, but to light a fire under the asses of the people maintaining the userspace version, in hopes that they might implement some of the improvements he wanted.
Michal Hocko articulated his opposition to Sultan's patch very clearly—the Linux kernel would not have two separate OOM killers sitting side by side. The proper OOM killer would be implemented as well as could be, and any low-memory killers and other memory finaglers would have to exist in userspace for particular projects like Android.
Suren Baghdasaryan also was certain that multiple OOM killers in the kernel source tree would be a non-starter. He invited Sultan to approach the problem from the standpoint of improving the user-space low-memory killer instead.
There also were technical problems with Sultan's code. Michal felt it didn't have a broad enough scope and was really good only for a single very specific use case. And, Joel Fernandes agreed that Sultan's approach was too simple. Joel pointed out that "a transient temporary memory spike should not be a signal to kill _any_ process. The reaction to kill shouldn't be so spontaneous that unwanted tasks are killed because the system went into panic mode." Instead, he said, memory usage statistics needed to be averaged out so that a proper judgment of which process to kill could be made. So, the userspace version was indeed slow, but the slowness was by design, so the code could make subtle judgments about how to proceed.
But Suren, on the other hand, agreed that the userspace code could be faster, and that the developers were working on ways to speed it up.
In this way, the discussion gradually transitioned to addressing the deficiencies in the userspace implementation and finding ways to address them. To that extent, Sultan's code provided a benchmark for where the user code would like to be at some point in the future.
It's not unheard of for a developer to implement a whole feature, just to make the point that an existing feature gets it wrong. And in this case, it does seem like that point has been heard.
Note: if you're mentioned above and want to post a response above the comment section, send a message with your response text to [email protected]