Crazy Compiler Optimizations

Kernel development is always strange. Andrea Parri recently posted a patch to change the order of memory reads during multithreaded operation, such that if one read depended upon the next, the second could not actually occur before the first.

The problem with this was that the bug never could actually occur, and the fix made the kernel's behavior less intuitive for developers. Peter Zijlstra, in particular, voted nay to this patch, saying it was impossible to construct a physical system capable of triggering the bug in question.

And although Andrea agreed with this, he still felt the bug was worth fixing, if only for its theoretical value. Andrea figured, a bug is a bug is a bug, and they should be fixed. But Peter objected to having the kernel do extra work to handle conditions that could never arise. He said, "what I do object to is a model that's weaker than any possible sane hardware."

Will Deacon sided with Peter on this point, saying that the underlying hardware behaved a certain way, and the kernel's current behavior mirrored that way. He remarked, "the majority of developers are writing code with the underlying hardware in mind and so allowing behaviours in the memory model which are counter to how a real machine operates is likely to make things more confusing, rather than simplifying them!"

Still, there were some developers who supported Andrea's patch. Alan Stern, in particular, felt that it made sense to fix bugs when they were found, but that it also made sense to include a comment in the code, explaining the default behavior and the rationale behind the fix, even while acknowledging the bug never could be triggered.

But, Andrea wasn't interested in forcing his patch through the outstretched hands of objecting developers. He was happy enough to back down, having made his point.

It was actually Paul McKenney, who had initially favored Andrea's patch and had considered sending it up to Linus Torvalds for inclusion in the kernel, who identified some of the deeper and more disturbing issues surrounding this whole debate. Apparently, it cuts to the core of the way kernel code is actually compiled into machine language. Paul said:

We had some debates about this sort of thing at the C++ Standards Committee meeting last week.

Pointer provenance and concurrent algorithms, though for once not affecting RCU! We might actually be on the road to a fix that preserves the relevant optimizations while still allowing most (if not all) existing concurrent C/C++ code to continue working correctly. (The current thought is that loads and stores involving inline assembly, C/C++ atomics, or volatile get their provenance stripped. There may need to be some other mechanisms for plain C-language loads and stores in some cases as well.)

But if you know of any code in the Linux kernel that needs to compare pointers, one of which might be in the process of being freed, please do point me at it. I thought that the smp_call_function() code fit, but it avoids the problem because only the sending CPU is allowed to push onto the stack of pending smp_call_function() invocations.

That same concurrent linked stack pattern using cmpxchg() to atomically push and xchg() to atomically pop the full list -would- have this problem. The old pointer passed to cmpxchg() might reference an object that was freed between the time that the old pointer was loaded and the time that the cmpxchg() executed. One way to avoid this is to do the push operation in an RCU read-side critical section and use kfree_rcu() instead of kfree(). Of course, code in the idle loop or that might run on offline CPUs cannot use RCU, plus some data structures are not happy with kfree_rcu() delays, so...

In other words, the issue of how the C compiler should treat pointers depends to some extent on whether they are pointers at all. There's nothing about a pointer that distinguishes it from any other number, except that the compiler knows it's a pointer and can therefore do certain things with it that wouldn't make sense in other contexts. It's this issue of the origins of a number—that is, their provenance—that the standards committee was trying to resolve. The reason any of this is useful and relevant is that the compiler can only optimize code to be faster and more efficient if it can understand what's happening and what's going to happen.

Peter poked around online until he found a paper describing the situation in detail.

It horrified him. His conclusion was, "that's all massive bong-hits. That's utterly insane. Even the proposed semantics are crazy."

Paul did not dissent from that view, though obviously more efficient code is better than less efficient code, and the compiler should go to whatever extremes it can manage to achieve it.

Paul said that none of this was new. In fact, it all dated back 20 years and more to the relatively early days of multithreaded operation. There were, Paul said, a variety of approaches, and he said he hoped to be able to show the kernel folks some of what the GCC folks were thinking on the matter to get feedback and suggestions.

Peter still was a bit freaked out by the situation. In particular, he was concerned about whether the compiler could produce reliable code at all. He remarked, "at the very least we should get this fixed and compile a kernel with the fixed compiler to see what (if anything) changes in the generated code and analyze the changes (if any) to make sure we were ok (or not)."

The GNU C compiler is definitely filled with insanity. The whole question of how to convert C code into the best possible machine code is one that can never fully be answered—and in fact, the question continually changes as new CPUs come out on the market. Not to mention that the compiler also has to work around processor-specific security flaws like the ones plaguing Intel chips in recent years.

Add to this the fact that GCC needs to produce good code not just for the Linux kernel, but for any coding project that someone might dream up. So GCC has to remain both highly specialized and highly generalized at the same time. It makes perfect sense that its dark innards would be dark and innardly.

Note: if you're mentioned above and want to post a response above the comment section, send a message with your response text to

Zack Brown is a tech journalist at Linux Journal and Linux Magazine, and is a former author of the "Kernel Traffic" weekly newsletter and the "Learn Plover" stenographic typing tutorials. He first installed Slackware Linux in 1993 on his 386 with 8 megs of RAM and had his mind permanently blown by the Open Source community. He is the inventor of the Crumble pure strategy board game, which you can make yourself with a few pieces of cardboard. He also enjoys writing fiction, attempting animation, reforming Labanotation, designing and sewing his own clothes, learning French and spending time with friends'n'family.

Load Disqus comments