The AMD AGP Linux Kernel Patch

AMD software architect Richard Brunner explains how a bug arises in the Linux kernel when aggressive speculative caching collides with AGP graphics, and he offers a patch to fix it:

AMD has been working with Andrea Arcangeli, Andi Kleen and Dave Jones from SuSE to investigate what appears to be a cache-attribute conflict bug in the Linux kernel, one that newer AMD Athlon processors (AthlonXP and AthlonMP) are exposing. The good news is that a short-term fix is easy, and several long-term fixes can do even better.

The x86 architecture allows a number of important performance optimizations for memory marked as write-back cacheable. One such optimization lets the processor speculatively read memory and cache it. Newer AMD processors, such as the AthlonXP and AthlonMP, make aggressive use of write-back cacheable optimizations, including speculative read-for-ownership (RFO) accesses. The instant the Linux AGPGART driver maps an allocated physical DRAM page to an aperture physical page, a cache-attribute conflict exists. Data corruption typically does not appear, however, until a graphics-oriented thread starts writing to the graphics aperture. At that point the corruption happens because not all software and processors see the page as write-back, so the cache-coherency protocols cannot ensure correctness.

In theory, this conflict can occur in many places in the kernel. In practice, however, most of the occurrences AMD has seen, across Linux kernels and AMD systems, arise when an AGP card is present and the AGPGART driver requests a page from the kernel to map into the graphics aperture. Note that having the AGPGART driver flush the caches of all processors when it allocates the page from the kernel does not solve the problem. Nor does simply avoiding 4MB pages with the boot option mem=nopentium.

A simple short-term solution, which avoids major changes to the kernel, is to "constrain" the Athlon speculation logic with a small patch to arch/i386/kernel/setup.c. AMD has tested this patch rigorously and has seen no failures to date. We are doing more testing but believe the patch is ready for the light of day. It is at

The best solution is to fix the way the kernel provides pages to allocators that need non-standard cache attributes, and to fix the request size of those allocators. Rather than allocating a page at a time, the allocators should request as much memory as possible in one call and hand out pages from these bigger clusters. An experimental patch that does all of this for the 2.4 kernel is available.

AMD worked behind the scenes with key kernel developers and Linux distributions to minimize the impact of this kernel bug. We hope that publishing detailed information online, with patches ready to go, is the right way to inform the community.