Memory Ordering in Modern Microprocessors, Part II

Anybody who says computers give only right answers hasn't seen what happens when several SMP processors, each with its own cache, try to get at the same data. Here's how to keep the kernel's view of memory correct, no matter what architecture you're on.
zSeries

The zSeries machines make up the IBM mainframe family previously known as the 360, 370 and 390. Parallelism came late to zSeries, but given that these mainframes first shipped in the mid-1960s, this is not saying much. The bcr 15,0 instruction is used for the Linux smp_mb(), smp_rmb() and smp_wmb() primitives. It also has comparatively strong memory-ordering semantics, as shown in Table 1. This should allow the smp_wmb() primitive to be a no-op, and by the time you read this, this change may have happened.

As with most CPUs, the zSeries architecture does not guarantee a cache-coherent instruction stream. Hence, self-modifying code must execute a serializing instruction between updating the instructions and executing them. That said, many actual zSeries machines do in fact accommodate self-modifying code without serializing instructions. The zSeries instruction set provides a large set of serializing instructions, including compare-and-swap, some types of branches—for example, the aforementioned bcr 15,0 instruction—and test-and-set, among others.

Conclusion

This final installment of the memory-barrier series has given an overview of how a number of CPUs implement memory barriers. Although these overviews should by no means be considered a substitute for carefully reading the architecture manuals (see Resources), I hope that it has served as a useful introduction.

Acknowledgements

I owe thanks to many CPU architects for patiently explaining the instruction and memory-reordering features of their CPUs, particularly Wayne Cardoza, Ed Silha, Anton Blanchard, Tim Slegel, Juergen Probst, Ingo Adlung and Ravi Arimilli. Wayne deserves special thanks for his patience in explaining Alpha's reordering of dependent loads, a lesson that I resisted quite strenuously!

Legal Statement

This work represents the view of the author and does not necessarily represent the view of IBM. IBM, zSeries and PowerPC are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries or both. Linux is a registered trademark of Linus Torvalds. i386 is a trademark of Intel Corporation or its subsidiaries in the United States, other countries or both. Other company, product, and service names may be trademarks or service marks of such companies. Copyright (c) 2005 by IBM Corporation.

Resources for this article: /article/8406.

Paul E. McKenney is a Distinguished Engineer with IBM's Linux Technology Center. He has worked on NUMA and SMP algorithms and, in particular, RCU for longer than he cares to admit. In his spare time, he jogs and supports the usual house-wife-and-kids habit.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

memory addressing question

raz ben yehuda's picture

First...Loved your article.

I hope I am not bothering you. But I have a question regarding
memory addressing in Linux.

As I have read ( Mel Gorman's book ) a virtual address in kernel space bellow the first 896 MB is simply an offset PAGE_OFFSET which is stored in the DS register.
So when the cpu wishes to aproach it he substracts this value from the address when he is in kernel mode.

Well if he does, how can the processor tell between a vmalloc virtual
address ( 896 to 1GB) in kernel space to a virtual address in kernel
space ( bellow the 896 MB) ?

Furthermore , If I boot my linux ( An Intel machine, T42 IBM laptop ) using only part of the memory ( boot mem=400M out of 512M) , I would not be able to address addresses above 400 MB .

I tried to memcpy to address above 400 MB and I crashed.
So i realy have no idea where i am wrong.

I would most appreciate your kind help.

Thank you.

Raz

PS.

I am looking for some information/articles regarding how dows the CPU actually approaches the memory.

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState