Memory Ordering in Modern Microprocessors, Part II
The first installment of this series was an overview of memory barriers, why they are needed in SMP kernels and how the Linux kernel handles them [August 2005]. This installment gives an overview of how several of the more popular CPUs—Alpha, AMD64, IA64, PA-RISC, POWER, SPARC, x86 and zSeries, otherwise known as IBM mainframe—implement memory barriers. Table 1 is reproduced here from the first installment of this series for reference.
It may seem strange to say much of anything about a CPU whose end of life has been announced, but Alpha is interesting because, with the weakest memory-ordering model, it reorders memory operations the most aggressively. It therefore has defined the Linux kernel memory-ordering primitives that must work on all CPUs. Understanding Alpha, therefore, is surprisingly important to the Linux kernel hacker.
The difference between Alpha and the other CPUs is illustrated by the code shown in Listing 1. This smp_wmb() on line 9 guarantees that the element initialization in lines 6–8 is executed before the element is added to the list on line 10, so that the lock-free search works correctly. That is, it makes this guarantee on all CPUs except Alpha.
Alpha has extremely weak memory ordering, such that the code on line 20 of Listing 1 could see the old garbage values that were present before the initialization on lines 6–8.
Figure 1 shows how this can happen on an aggressively parallel machine with partitioned caches, so that alternating cache lines are processed by the different partitions of the caches. Assume that the list header head is processed by cache bank 0 and the new element is processed by cache bank 1. On Alpha, the smp_wmb() guarantees that the cache invalidation performed by lines 6–8 of Listing 1 reaches the interconnect before that of line 10. But, it makes absolutely no guarantee about the order in which the new values reach the reading CPU's core. For example, it is possible that the reading CPU's cache bank 1 is busy, while cache bank 0 is idle. This could result in the cache invalidates for the new element being delayed, so that the reading CPU gets the new value for the pointer but sees the old cached values for the new element.
One could place an smp_rmb() primitive between the pointer fetch and dereference. However, this imposes unneeded overhead on systems such as x86, IA64, PPC and SPARC that respect data dependencies on the read side. An smp_read_barrier_depends() primitive has been added to the Linux 2.6 kernel to eliminate overhead on these systems. This primitive may be used as shown on line 19 of Listing 2. However, please note that RCU code should use rcu_dereference() instead.
It also is possible to implement a software barrier that could be used in place of smp_wmb(), which would force all reading CPUs to see the writing CPU's writes in order. However, this approach was deemed by the Linux community to impose excessive overhead on extremely weakly ordered CPUs, such as Alpha. This software barrier could be implemented by sending interprocessor interrupts (IPIs) to all other CPUs. Upon receipt of such an IPI, a CPU would execute a memory-barrier instruction, implementing a memory-barrier shoot-down. Additional logic is required to avoid deadlocks. Of course, CPUs that respect data dependencies would define such a barrier simply to be smp_wmb(). Perhaps this decision should be revisited in the future when Alpha fades off into the sunset.
Listing 1. Insert and Lock-Free Search
1 struct el *insert(long key, long data)
2 {
3 struct el *p;
4 p = kmalloc(sizeof(*p), GPF_ATOMIC);
5 spin_lock(&mutex);
6 p->next = head.next;
7 p->key = key;
8 p->data = data;
9 smp_wmb();
10 head.next = p;
11 spin_unlock(&mutex);
12 }
13
14 struct el *search(long key)
15 {
16 struct el *p;
17 p = head.next;
18 while (p != &head) {
19 /* BUG ON ALPHA!!! */
20 if (p->key == key) {
21 return (p);
22 }
23 p = p->next;
24 };
25 return (NULL);
26 }
The Linux memory-barrier primitives took their names from the Alpha instructions, so smp_mb() is mb, smp_rmb() is rmb and smp_wmb() is wmb. Alpha is the only CPU where smp_read_barrier_depends() is an smp_mb() rather than a no-op. For more detail on Alpha, see the reference manual, listed in the on-line Resources.
Trending Topics
| Your CMS Is Not Your Web Site | Feb 01, 2012 |
| Casper, the Friendly (and Persistent) Ghost | Jan 31, 2012 |
| Razor-qt 0.4 - Qt based Desktop Environment | Jan 30, 2012 |
| Using Plop Boot Manager for USB Boot | Jan 25, 2012 |
| Moose | Jan 23, 2012 |
| Basic Chemistry on the GNOME Desktop | Jan 20, 2012 |
- Bad Journal
21 min 37 sec ago - Why isn't it a web site?
30 min 20 sec ago - thanx, i'll give it a try
13 hours 36 min ago - thxxx
14 hours 22 min ago - Really critical exposed. I
14 hours 55 min ago - What is a CMS
1 day 6 hours ago - I Like It!
1 day 8 hours ago - processwire
1 day 11 hours ago - cms VS
1 day 14 hours ago - Crucial content. Although
1 day 14 hours ago







Comments
memory addressing question
First...Loved your article.
I hope I am not bothering you. But I have a question regarding
memory addressing in Linux.
As I have read ( Mel Gorman's book ) a virtual address in kernel space bellow the first 896 MB is simply an offset PAGE_OFFSET which is stored in the DS register.
So when the cpu wishes to aproach it he substracts this value from the address when he is in kernel mode.
Well if he does, how can the processor tell between a vmalloc virtual
address ( 896 to 1GB) in kernel space to a virtual address in kernel
space ( bellow the 896 MB) ?
Furthermore , If I boot my linux ( An Intel machine, T42 IBM laptop ) using only part of the memory ( boot mem=400M out of 512M) , I would not be able to address addresses above 400 MB .
I tried to memcpy to address above 400 MB and I crashed.
So i realy have no idea where i am wrong.
I would most appreciate your kind help.
Thank you.
Raz
PS.
I am looking for some information/articles regarding how dows the CPU actually approaches the memory.