Kernel Locking Techniques
Big-reader locks (brlocks), defined in include/linux/brlock.h, are a specialized form of reader/writer locks. Big-reader locks, designed by Red Hat's Ingo Molnar, provide a spinning lock that is very fast to acquire for reading but incredibly slow to acquire for writing. Therefore, they are ideal in situations where there are many readers and few writers.
While brlocks behave differently from rwlocks, they are used the same way, with one exception: you do not declare a brlock yourself. Each brlock is predefined as an entry in brlock_indices (see brlock.h) and is passed to the lock routines by that name:
br_read_lock(BR_MR_LOCK);
/* critical region (read only) ... */
br_read_unlock(BR_MR_LOCK);
Use of brlocks is currently confined to a few special cases. Due to the large penalty for exclusive write access, it should probably stay that way.
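The asymmetry between read and write cost can be illustrated with a small userspace model. This is only a sketch, not the kernel's code: it assumes the per-CPU design in which a reader touches just its own CPU's slot while a writer must claim the slot of every CPU. All names prefixed with model_ are hypothetical, the model is single-threaded, and it returns 0 where a real lock would spin.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for the model */

/* One reader count and one writer flag per simulated CPU. */
struct model_brlock {
    int readers[MODEL_NR_CPUS];
    int write_locked[MODEL_NR_CPUS];
    long ops;             /* per-CPU slot operations performed so far */
};

/* Reader: touches only the slot of the CPU it is running on. */
static int model_br_read_lock(struct model_brlock *l, int cpu)
{
    if (l->write_locked[cpu])
        return 0;         /* a real reader would spin here */
    l->readers[cpu]++;
    l->ops++;
    return 1;
}

static void model_br_read_unlock(struct model_brlock *l, int cpu)
{
    assert(l->readers[cpu] > 0);
    l->readers[cpu]--;
    l->ops++;
}

/* Writer: must claim every CPU's slot, so the cost grows with the CPU count. */
static int model_br_write_lock(struct model_brlock *l)
{
    int cpu;

    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (l->readers[cpu] || l->write_locked[cpu])
            return 0;     /* a real writer would spin here */

    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        l->write_locked[cpu] = 1;
        l->ops++;
    }
    return 1;
}

static void model_br_write_unlock(struct model_brlock *l)
{
    int cpu;

    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        l->write_locked[cpu] = 0;
        l->ops++;
    }
}
```

In this model a read lock costs one slot operation regardless of machine size, while a write lock costs one per CPU and cannot proceed while a reader is active on any CPU.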
Linux contains a global kernel lock, kernel_flag, that was originally introduced in kernel 2.0 as the only SMP lock. During 2.2 and 2.4, much work went into removing the global lock from the kernel and replacing it with finer-grained localized locks. Today, the global lock's use is minimal. It still exists, however, and developers need to be aware of it.
The global kernel lock is called the big kernel lock or BKL. It is a spinning lock that is recursive; therefore two consecutive requests for it will not deadlock the process (as they would for a spinlock). Further, a process can sleep and even enter the scheduler while holding the BKL. When a process holding the BKL enters the scheduler, the lock is dropped so other processes can obtain it. These attributes of the BKL helped ease the introduction of SMP during the 2.0 kernel series. Today, however, they should provide plenty of reason not to use the lock.
Use of the big kernel lock is simple. Call lock_kernel() to acquire the lock and unlock_kernel() to release it. The routine kernel_locked() will return nonzero if the lock is held, zero if not. For example:
lock_kernel();
/* critical region ... */
unlock_kernel();
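The recursive and sleep-friendly behavior described above can be modeled in userspace with a per-task depth counter. The sketch below is an illustration under assumptions, not the kernel source: the model_ names are hypothetical, the depth counter starting at -1 is one plausible way to implement the recursion, and a single flag stands in for the kernel_flag spinlock.

```c
#include <assert.h>

/* Hypothetical task state: lock_depth is -1 while the task does not hold the lock. */
struct model_task {
    int lock_depth;
};

static int model_kernel_flag;   /* 1 while some task holds the global lock */

/* Only the outermost call takes the lock; nested calls just count. */
static void model_lock_kernel(struct model_task *t)
{
    if (++t->lock_depth == 0) {
        assert(!model_kernel_flag);   /* a real lock would spin here */
        model_kernel_flag = 1;
    }
}

/* Only the call that matches the outermost lock releases it. */
static void model_unlock_kernel(struct model_task *t)
{
    assert(t->lock_depth >= 0);
    if (--t->lock_depth < 0)
        model_kernel_flag = 0;
}

/* Nonzero if the global lock is currently held. */
static int model_kernel_locked(void)
{
    return model_kernel_flag;
}

/* Entering the scheduler: drop the lock but remember the depth. */
static void model_schedule_out(struct model_task *t)
{
    if (t->lock_depth >= 0)
        model_kernel_flag = 0;
}

/* Being scheduled back in: retake the lock at the remembered depth. */
static void model_schedule_in(struct model_task *t)
{
    if (t->lock_depth >= 0) {
        assert(!model_kernel_flag);   /* a real lock would spin here */
        model_kernel_flag = 1;
    }
}
```

Because the depth lives in the task rather than in the lock, a second lock request from the same task never waits on itself, and the lock can be handed to other tasks while the holder sleeps.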
Starting with the 2.5 development kernel (and 2.4 with an available patch), the Linux kernel is fully preemptible. This feature allows processes to be preempted by higher-priority processes, even if the current process is running in the kernel. A preemptible kernel creates many of the synchronization issues of SMP. Thankfully, kernel preemption is synchronized by SMP locks, so most issues are solved automatically by writing SMP-safe code. A few new locking issues, however, are introduced. For example, per-CPU data often needs no lock on SMP because it is implicitly protected: only its own CPU ever touches it. With kernel preemption, however, a process can be preempted in the middle of an update, and another process scheduled on the same CPU can then access the same data, so explicit protection is needed.
For these situations, preempt_disable() and the corresponding preempt_enable() have been introduced. These functions nest: after n calls to preempt_disable(), preemption is not re-enabled until the nth matching call to preempt_enable(). See the “Function Reference” Sidebar for a complete list of preemption-related controls.
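The nesting rule can be pictured as a simple counter. The following userspace sketch is only a model under that assumption: the model_ names and the per-CPU hit counter are hypothetical, and nothing here actually disables preemption.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4                       /* hypothetical CPU count */

static int model_preempt_count;               /* 0 means preemption is allowed */
static int model_per_cpu_hits[MODEL_NR_CPUS]; /* stand-in for per-CPU data */

static void model_preempt_disable(void)
{
    model_preempt_count++;
}

static void model_preempt_enable(void)
{
    assert(model_preempt_count > 0);          /* every enable needs a prior disable */
    model_preempt_count--;
}

static int model_preemptible(void)
{
    return model_preempt_count == 0;
}

/* Update per-CPU data inside a region where preemption is disabled. */
static void model_count_hit(int cpu)
{
    model_preempt_disable();
    model_per_cpu_hits[cpu]++;
    model_preempt_enable();
}
```

If model_count_hit() is called from a region that has already disabled preemption, its inner enable only lowers the count from 2 to 1, so the outer region stays protected until its own enable call.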
Both SMP reliability and scalability in the Linux kernel are improving rapidly. Since SMP was introduced in the 2.0 kernel, each successive kernel revision has improved on the previous by implementing new locking primitives and providing smarter locking semantics by revising locking rules and eliminating global locks in areas of high contention. This trend continues in the 2.5 kernel. The future will certainly hold better performance.
Kernel developers should do their part by writing code that implements smart, sane, proper locking with an eye to both scalability and reliability.