Kernel Korner - Using RCU in the Linux 2.5 Kernel

Read-copy update, a synchronization technique optimized for read-mostly data structures, is new with the 2.5/2.6 kernel and promises better SMP scalability.

When RCU is applied to data structures other than lists, memory-barrier instructions must be specified explicitly. For an example, see Mingming Cao's RCU-based modifications to System V IPC.

How to Use RCU

Although RCU has been used in many interesting and surprising ways, one of the most straightforward uses is as a replacement for reader-writer locking. In fact, RCU may be thought of as the next step in a progression. The rwlock primitives allow readers to run in parallel with each other, but not in parallel with updaters. RCU, on the other hand, allows readers to run in parallel both with each other and with updaters.

This section presents the analogy between rwlock and RCU, protecting the simple doubly linked list data structure shown in Listing 3 with reader-writer locks and then with RCU. This structure has a struct list_head, a search key, a single integer for data and a struct rcu_head.

The reader-writer-lock/RCU analogy substitutes primitives as shown in Table 2. The asterisked primitives are no-ops in non-preemptible kernels; in preemptible kernels, they suppress preemption, which is an extremely cheap operation on the local task structure. Because rcu_read_lock() does not block interrupt contexts, it is necessary to add primitives for this purpose where needed. For example, read_lock_irqsave must become rcu_read_lock(), followed by local_irq_save().

Table 2. Reader-Writer Lock and RCU Primitives

Reader-Writer LockRead-Copy Update
rwlock_tspinlock_t
read_lock()rcu_read_lock() *
read_unlock()rcu_read_unlock() *
write_lock()spin_lock()
write_unlock()spin_unlock()
list_add()list_add_rcu()
list_add_tail()list_add_tail_rcu()
list_del()list_del_rcu()
list_for_each()list_for_each_rcu()

Table 3 illustrates this transformation for some simple linked-list operations. Notice that line 10 of the rwlock delete() function is replaced with a call_rcu() that delays the invocation of my_free() until the end of a grace period. The rest of the functions are transformed in a straightforward fashion, as indicated in Table 2.

Although this analogy can be quite compelling and useful—in Dipankar Sarma's and Maneesh Soni's use of RCU in dcache, for example—there are some caveats:

  • Read-side critical sections may see stale data that has been removed from the list but not yet freed. There are some situations where this is not a problem, such as routing tables for best-effort protocols. In other situations, such stale data may be detected and ignored, as happens in the 2.5 kernel's System V IPC implementation.

  • Read-side critical sections may run concurrently with write-side critical sections.

  • The grace period delays the freeing of memory, which means the memory footprint is somewhat larger when using RCU than it is when using reader-writer locking.

  • When changing to RCU, write-side reader-writer locking code that modifies list elements in place often must be restructured to prevent read-side RCU code from seeing the data in an inconsistent state. In many cases, this restructuring is quite straightforward; for example, you could create a new list element with the desired state, then replace the old element with the new one.

Where it applies, this analogy can deliver full parallelism with hardly any increase in complexity.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Can you explain a little more about smp_wmb?

Anonymous's picture

It seems that smp memory barrier is tightly linked with RCU. Codes taking advantage of RCU also use smp_wmb to deal with weak memory-consistency processes, for example, in the routing cache. So, I think understanding this thing is key to the "USEING" of RCU. Thanks.

Re: Can you explain a little more about smp_wmb?

Anonymous's picture

When you are using the linux/list.h _rcu macros along with normal locking to protect against concurrent updates, the memory barriers are taken care of for you. Many of the places where one can use RCU do involve linked lists, so this works much of the time. However, if you are in a situation where you cannnot use these macros, then you are quite right that an understanding of memory barriers is required.

Modern CPUs are within their rights to reorder operations unless explicitly told not to. Therefore, locking primitives on many CPUs contain memory barriers that prevent (for example) the contents of the critical section from "bleeding out" into the surrounding code. Any such "bleeding" would mean that part of the critical section was no longer protected by the lock, which would result in failure. Hence the memory barriers in locking primitives.

This situation means that lock-free operations require explicit memory barriers. A full explanation is beyond the scope of this comment, but the LKML thread starting with this message and this web page are a place to start. If you want a full treatment, Gharachorloo's Ph.D. thesis is a place to finish.

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix