Kernel Locking Techniques
Proper locking can be tough—real tough. Improper locking can result in random crashes and other oddities. Poorly designed locking can result in code that is hard to read, performs poorly and makes your fellow kernel developers cringe. In this article, I explain why kernel code requires locking, provide general rules for proper kernel locking semantics and then outline the various locking primitives in the Linux kernel.
The fundamental issue surrounding locking is the need to provide synchronization in certain code paths in the kernel. These code paths, called critical sections, require some combination of concurrency or re-entrancy protection and proper ordering with respect to other events. The typical result without proper locking is called a race condition. Realize how even a simple i++ is dangerous if i is shared! Consider the case where one processor reads i, then another, then they both increment it, then they both write i back to memory. If i were originally 2, it should now be 4, but in fact it would be 3!
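To make the arithmetic concrete, the toy below models both orderings step by step in plain C. It is deterministic: each hypothetical "CPU" is just a local register variable, and no real threads or kernel code are involved.

```c
/* Toy model: a non-atomic i++ is really three steps (load into a register,
 * increment the register, store back). Each "CPU" gets its own register. */

/* The two increments run one after the other: no overlap. */
int serialized_increments(int i)
{
    int reg1 = i;      /* CPU 1 loads i */
    reg1 = reg1 + 1;   /* CPU 1 increments its copy */
    i = reg1;          /* CPU 1 stores it back */

    int reg2 = i;      /* CPU 2 loads the already-updated i */
    reg2 = reg2 + 1;
    i = reg2;
    return i;
}

/* The racy interleaving from the text: both CPUs read i before either writes. */
int interleaved_increments(int i)
{
    int reg1 = i;      /* CPU 1 reads i */
    int reg2 = i;      /* CPU 2 reads the same old value */
    reg1 = reg1 + 1;   /* both increment their private copies */
    reg2 = reg2 + 1;
    i = reg1;          /* CPU 1 writes back */
    i = reg2;          /* CPU 2 writes the same value again: one update is lost */
    return i;
}
```

Starting from 2, serialized_increments() yields 4 while interleaved_increments() yields 3, matching the lost update described above.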
This is not to say that SMP (symmetric multiprocessing) is the only source of locking issues. Interrupt handlers create locking issues, as does the new preemptible kernel and any code that can block (go to sleep). Of these, only SMP is considered true concurrency, i.e., only with SMP can two things actually occur at the exact same time. The other situations (interrupt handlers, the preemptible kernel and blocking methods) provide pseudo-concurrency: code is not actually executed concurrently, but separate code paths can still interleave and mangle each other's data.
These critical regions require locking. The Linux kernel provides a family of locking primitives that developers can use to write safe and efficient code.
Whether or not you have an SMP machine, people who use your code may. Further, code that does not handle locking issues properly is typically not accepted into the Linux kernel. Finally, with a preemptible kernel even UP (uniprocessor) systems require proper locking. Thus, do not forget: you must implement locking.
Thankfully, Linus made the excellent design decision of keeping SMP and UP kernels distinct. This allows certain locks not to exist at all in a UP kernel. Different combinations of CONFIG_SMP and CONFIG_PREEMPT compile in varying lock support. It does not matter, however, to the developer: lock everything appropriately and all situations will be covered.
We cover atomic operators first for two reasons. First, they are the simplest approach to kernel synchronization and thus the easiest to understand and use. Second, the more complex locking primitives are built on top of them; in this sense, they are the building blocks of the kernel's locks. Atomic operators are operations, such as add and subtract, that execute as a single uninterruptible operation. Consider the previous example of i++. If we could read i, increment it and write it back to memory in one uninterruptible operation, the race condition discussed above would not be an issue. Atomic operators provide exactly these uninterruptible operations. Two types exist: methods that operate on integers and methods that operate on bits. The integer operations work like this:
atomic_t v;
atomic_set(&v, 5); /* v = 5 (atomically) */
atomic_add(3, &v); /* v = v + 3 (atomically) */
atomic_dec(&v); /* v = v - 1 (atomically) */
printk(KERN_INFO "This will print 7: %d\n", atomic_read(&v)); /* printk, not printf, in kernel code */
They are simple. There are, however, little caveats to keep in mind when using atomics. First, you obviously cannot pass an atomic_t to anything but one of the atomic operators. Likewise, you cannot pass anything to an atomic operator except an atomic_t. Finally, because of the limitations of some architectures, do not expect atomic_t to have more than 24 usable bits. See the “Function Reference” Sidebar for a list of all atomic integer operations.
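For experimenting outside the kernel, here is a rough user-space analogue built on C11 `<stdatomic.h>` and POSIX threads. It is not the kernel API: demo_atomic_t and the demo_ helpers are hypothetical names. The kernel wraps its counter in a one-member structure, and the sketch imitates that, which is why an ordinary int and the atomic type cannot be mixed without a compiler diagnostic. It replays the 5, +3, -1 sequence from the example and then revisits the i++ race with four threads.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical imitation of atomic_t: a one-member struct, so passing it where
 * an int is expected (or an int where it is expected) draws a compiler diagnostic. */
typedef struct {
    _Atomic int counter;
} demo_atomic_t;

static void demo_atomic_set(demo_atomic_t *v, int i) { atomic_store(&v->counter, i); }
static void demo_atomic_add(int i, demo_atomic_t *v) { atomic_fetch_add(&v->counter, i); }
static void demo_atomic_inc(demo_atomic_t *v)        { atomic_fetch_add(&v->counter, 1); }
static void demo_atomic_dec(demo_atomic_t *v)        { atomic_fetch_sub(&v->counter, 1); }
static int  demo_atomic_read(demo_atomic_t *v)       { return atomic_load(&v->counter); }

/* Same sequence as the kernel example: 5, then +3, then -1. */
int demo_sequence(void)
{
    demo_atomic_t v = { 0 };
    demo_atomic_set(&v, 5);
    demo_atomic_add(3, &v);
    demo_atomic_dec(&v);
    return demo_atomic_read(&v);
}

/* The i++ race again, now with an atomic increment shared by several threads. */
#define DEMO_THREADS 4
#define DEMO_ITERS   100000

static demo_atomic_t shared;

static void *increment_worker(void *arg)
{
    (void)arg;
    for (int n = 0; n < DEMO_ITERS; n++)
        demo_atomic_inc(&shared); /* load, add and store happen as one indivisible step */
    return NULL;
}

int demo_threaded_count(void)
{
    pthread_t threads[DEMO_THREADS];

    demo_atomic_set(&shared, 0);
    for (int t = 0; t < DEMO_THREADS; t++)
        pthread_create(&threads[t], NULL, increment_worker, NULL);
    for (int t = 0; t < DEMO_THREADS; t++)
        pthread_join(threads[t], NULL);
    return demo_atomic_read(&shared);
}
```

demo_sequence() returns 7, and demo_threaded_count() returns 400000 on every run because each increment is indivisible; the same loop over a plain int would typically come up short.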
The next group of atomic methods consists of those that operate on individual bits. They are simpler than the integer methods because they work on the standard C data types. For example, consider void set_bit(int nr, void *addr). This function atomically sets to 1 the "nr-th" bit of the data pointed to by addr. The atomic bit operators are also listed in the "Function Reference" Sidebar.
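The bit-numbering arithmetic behind set_bit() can be sketched in user space. The helpers below mirror set_bit(), clear_bit(), test_and_set_bit() and test_bit() in spirit only: the demo_ names are hypothetical, and they rely on the GCC/Clang `__atomic` builtins (an assumption; the kernel uses per-architecture code). Bit nr lives in word nr / BITS_PER_WORD of the array, at position nr % BITS_PER_WORD within that word.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_WORD (CHAR_BIT * sizeof(unsigned long))
#define WORD_OF(nr)   ((nr) / BITS_PER_WORD)          /* which array element */
#define MASK_OF(nr)   (1UL << ((nr) % BITS_PER_WORD)) /* which bit inside it */

/* Atomically set bit nr to 1. */
static void demo_set_bit(unsigned nr, unsigned long *addr)
{
    __atomic_fetch_or(&addr[WORD_OF(nr)], MASK_OF(nr), __ATOMIC_SEQ_CST);
}

/* Atomically clear bit nr to 0. */
static void demo_clear_bit(unsigned nr, unsigned long *addr)
{
    __atomic_fetch_and(&addr[WORD_OF(nr)], ~MASK_OF(nr), __ATOMIC_SEQ_CST);
}

/* Atomically set bit nr and report whether it was already set. */
static bool demo_test_and_set_bit(unsigned nr, unsigned long *addr)
{
    unsigned long old = __atomic_fetch_or(&addr[WORD_OF(nr)], MASK_OF(nr), __ATOMIC_SEQ_CST);
    return (old & MASK_OF(nr)) != 0;
}

/* Read bit nr. */
static bool demo_test_bit(unsigned nr, const unsigned long *addr)
{
    return (__atomic_load_n(&addr[WORD_OF(nr)], __ATOMIC_SEQ_CST) & MASK_OF(nr)) != 0;
}
```

Like the kernel functions, these operate on ordinary unsigned long storage rather than a special wrapper type; only the individual read-modify-write of one word is atomic.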
For anything more complicated than trivial examples like those above, a more complete locking solution is needed. The most common locking primitive in the kernel is the spinlock, defined in include/asm/spinlock.h and include/linux/spinlock.h. The spinlock is a very simple single-holder lock. If a process attempts to acquire a spinlock and it is unavailable, the process will keep trying (spinning) until it can acquire the lock. This simplicity creates a small and fast lock. The basic use of the spinlock is:
spinlock_t mr_lock = SPIN_LOCK_UNLOCKED;
unsigned long flags;
spin_lock_irqsave(&mr_lock, flags);
/* critical section ... */
spin_unlock_irqrestore(&mr_lock, flags);
The use of spin_lock_irqsave() will disable interrupts locally and acquire the spinlock on SMP. This covers both interrupt and SMP concurrency issues. With a call to spin_unlock_irqrestore(), interrupts are restored to the state they were in when the lock was acquired. With a UP kernel, the above code compiles to the same as:
unsigned long flags;
save_flags(flags);
cli();
/* critical section ... */
restore_flags(flags);
This provides the needed interrupt concurrency protection without unneeded SMP protection. Another variant of the spinlock is spin_lock_irq(). This variant disables and re-enables interrupts unconditionally, in the same manner as cli() and sti(). For example:
spinlock_t mr_lock = SPIN_LOCK_UNLOCKED;
spin_lock_irq(&mr_lock);
/* critical section ... */
spin_unlock_irq(&mr_lock);
This code is only safe when you know that interrupts were not already disabled before the acquisition of the lock. As the kernel grows in size and kernel code paths become increasingly hard to predict, it is suggested you not use this version unless you really know what you are doing.
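The difference between the irqsave and irq variants comes down to what happens at unlock time. The toy model below rests on an assumption: a single global flag stands in for the local CPU's interrupt-enable state, and the demo_ names are hypothetical. It shows a caller that already has interrupts disabled entering a nested critical section each way.

```c
#include <stdbool.h>

/* Stand-in for the local CPU's interrupt-enable bit (no real hardware involved). */
static bool irqs_enabled = true;

/* Like spin_lock_irqsave(): remember the current state, then disable. */
static void demo_irqsave(bool *flags)   { *flags = irqs_enabled; irqs_enabled = false; }
/* Like spin_unlock_irqrestore(): put back whatever state we found. */
static void demo_irqrestore(bool flags) { irqs_enabled = flags; }

/* Like spin_lock_irq()/spin_unlock_irq(): disable and re-enable unconditionally. */
static void demo_irq_disable(void) { irqs_enabled = false; }
static void demo_irq_enable(void)  { irqs_enabled = true; }

/* Outer code has already disabled interrupts, then takes the lock with irqsave. */
bool nested_with_irqsave(void)
{
    bool flags;
    irqs_enabled = false;          /* outer code disabled interrupts */
    demo_irqsave(&flags);
    /* ... critical section ... */
    demo_irqrestore(flags);
    return irqs_enabled;           /* false: the outer code's state is preserved */
}

/* Same situation, but using the unconditional irq variant. */
bool nested_with_irq(void)
{
    irqs_enabled = false;          /* outer code disabled interrupts */
    demo_irq_disable();
    /* ... critical section ... */
    demo_irq_enable();
    return irqs_enabled;           /* true: interrupts re-enabled behind the caller's back */
}
```

The second function is the hazard the text warns about: the outer code believes interrupts are still off when they are not.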
All of the above spinlocks assume the data you are protecting is accessed in both interrupt handlers and normal kernel code. If you know your data is unique to user-context kernel code (e.g., a system call), you can use the basic spin_lock() and spin_unlock() methods that acquire and release the specified lock without any interaction with interrupts.
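The spinning behavior itself is easy to see in user space. Below is a minimal test-and-set spinlock built on C11 atomic_flag. It is roughly analogous to plain spin_lock() and spin_unlock() in that it has no interaction with interrupts, but it is not the kernel's implementation; the demo_ names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void demo_spin_lock(demo_spinlock_t *lock)
{
    /* test_and_set returns the old value: true means someone else holds it, so keep trying. */
    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire))
        ; /* busy-wait (spin) */
}

static void demo_spin_unlock(demo_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

static demo_spinlock_t counter_lock = DEMO_SPINLOCK_INIT;
static long plain_counter; /* ordinary long: safe only because counter_lock guards it */

static void *locked_worker(void *arg)
{
    (void)arg;
    for (int n = 0; n < 100000; n++) {
        demo_spin_lock(&counter_lock);
        plain_counter++;                /* critical section: short, never sleeps */
        demo_spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Two threads increment the shared counter under the lock; returns the total. */
long run_spinlock_demo(void)
{
    pthread_t a, b;

    plain_counter = 0;
    pthread_create(&a, NULL, locked_worker, NULL);
    pthread_create(&b, NULL, locked_worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return plain_counter;
}
```

run_spinlock_demo() returns 200000: the lock turns the non-atomic plain_counter++ into a critical section that only one thread executes at a time.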
A final variation of the spinlock is spin_lock_bh(), which acquires the lock and also disables softirqs. This is needed when the same lock (and the data it protects) is used both inside a softirq and in code outside of one. The corresponding unlock function is naturally spin_unlock_bh().
Note that spinlocks in Linux are not recursive as they may be in other operating systems. Most consider this a sane design decision as recursive spinlocks encourage poor code. This does imply, however, that you must be careful not to re-acquire a spinlock you already hold, or you will deadlock.
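Non-recursion follows directly from what a simple spinlock records: only whether it is held, not who holds it. The sketch below (hypothetical demo_ names, C11 atomic_flag) uses a try-lock, similar in spirit to the kernel's spin_trylock(), so that a second acquisition by the same caller fails visibly instead of spinning forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Records only "held or not held"; there is no owner field. */
typedef struct { atomic_flag locked; } demo_spinlock_t;

/* Returns true if the lock was acquired, false if it was already held. */
static bool demo_spin_trylock(demo_spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire);
}

static void demo_spin_unlock(demo_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* The same caller tries to take the lock twice. With a blocking spin_lock()
 * the second attempt would spin forever: a self-deadlock. */
bool second_acquire_succeeds(void)
{
    demo_spinlock_t lock = { ATOMIC_FLAG_INIT };
    bool first = demo_spin_trylock(&lock);
    bool second = demo_spin_trylock(&lock);   /* fails: the lock is not recursive */

    if (first)
        demo_spin_unlock(&lock);
    return second;
}
```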
Spinlocks should be used to lock data in situations where the lock is not held for a long time—recall that a waiting process will spin, doing nothing, waiting for the lock. (See the “Rules” Sidebar for guidelines on what is considered a long time.) Thankfully, spinlocks can be used anywhere. You cannot, however, do anything that will sleep while holding a spinlock. For example, never call any function that touches user memory, kmalloc() with the GFP_KERNEL flag, any semaphore functions or any of the schedule functions while holding a spinlock. You have been warned.
If you need a lock that is safe to hold for longer periods of time, safe to sleep with, or capable of allowing more than one process to hold it at a time, Linux provides the semaphore.
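As a user-space point of comparison, the sketch below uses a POSIX unnamed semaphore (sem_init(), sem_trywait(), sem_post()) rather than the kernel's own semaphore API, which differs. With a count of two it admits two holders and turns a third away. sem_trywait() is used so the demo reports the refusal instead of sleeping the way sem_wait() would. count_admitted() is a hypothetical helper.

```c
#include <errno.h>
#include <semaphore.h>

/* Creates a semaphore with `slots` units, tries to take it `attempts` times
 * without blocking, releases everything it took, and returns how many succeeded. */
int count_admitted(unsigned int slots, int attempts)
{
    sem_t sem;
    int admitted = 0;

    if (sem_init(&sem, 0, slots) != 0)
        return -1;
    for (int i = 0; i < attempts; i++) {
        /* Decrements the count if it is above zero; otherwise fails with EAGAIN. */
        if (sem_trywait(&sem) == 0)
            admitted++;
        else if (errno != EAGAIN)
            break;
    }
    for (int i = 0; i < admitted; i++)
        sem_post(&sem);   /* give back every unit we took */
    sem_destroy(&sem);
    return admitted;
}
```

count_admitted(2, 3) returns 2: unlike a spinlock, the semaphore's count lets more than one holder in at once.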
Comments
Linux question, I need help please
Build a bash command cgrep that searches the indicated files using color, ignoring case and showing line numbers.
Now if you perform the following command: man cgrep
You get the message: no manual entry for cgrep
You are requested to add a manual for this command.
Hints:
1- Read the manual of man (man man) to understand where manual files are stored.
2- You need to use gunzip and gzip
3- You need to be root to create a manual (sudo -i)
Why don't you ask your instructor instead of posting your homework!!!
Re: Kernel Korner: Kernel Locking Techniques
Good article on Locking mechanism in Linux
Thanks for your updates.
Regards,
Sathish.
tasklets and work queue
Hi,
I have a question regarding tasklets and work queues. As both are bottom-half handlers, on what basis should we decide to use a tasklet or a work queue?
I know that tasklets run at a much higher priority (we can say in interrupt context) than work queues (process context), because of which we should not do any blocking/sleeping operation inside tasklets, while the same can be done in a work queue.
If I want to do an I/O transaction in response to an interrupt, is it good to use tasklets here?
In a real scenario,
I get an interrupt from a touch screen controller. Now I have to read from the controller using the I2C interface. Is it safe to read the data from a tasklet here?
Very nicely explained the locking procedure. Very useful URL.
Recursive semaphore
According to the article, spinlocks are not recursive. What about semaphores?
kernel locking techniques
Hi,
The above information was excellent, and I would like to ask you a small question. Can the kernel be completely locked down for a short period of time, i.e., so that no other kernel threads run while my thread is running? I would like opinions on this matter.
I believe kernel_lock would help... If you are in user space and need the kernel for the time being, you can make a syscall that, when called with one parameter, calls kernel_lock and returns, and when called with another parameter, calls kernel_unlock...
Very good question. I need to do this but can't find out how. Has this question been answered somewhere? A small period would mean 5..90 usecs.
Some doubt about semaphore
Thank you for your article, I learned a lot from it.
But I have a question about semaphores.
In your article you mention that for the up() operation: "if the new value is greater than or equal to zero, one or more tasks on the wait queue will be woken up"
I think it's "less than or equal to" instead of "greater than".
Sorry, but you are wrong. "Greater than or equal to" is written because as soon as the semaphore count increases, it means some units of the resource are free to be allocated to processes. This makes a process pop out of the wait queue and become active.
Take the Spinlocks warning seriously!
"never call any function that touches user memory, kmalloc() with the GFP_KERNEL flag, any semaphore functions or any of the schedule functions while holding a spinlock."
I struggled with a kernel panic for a few days when I was calling the function "copy_to_user" while holding a lock. If I called the function a few times a second, it would work; anything more frequent than that would simply panic.
Just make sure every function called while holding a spinlock does not sleep. If it has to, use a semaphore.
Re: Kernel Korner: Kernel Locking Techniques
A spinlock works on the principle that it disables interrupts before entering the critical section and enables them after exiting. So, as it cannot disable the interrupts of another process, a spinlock is not a solution for an SMP system.
Of course it is !
spin_lock() doesn't disable interrupts; it is used to protect between user contexts,
while spin_lock_irq() does disable interrupts, so of course it can be used to protect between user context and interrupt context.
Note: spin_lock_irq() only disables interrupts on the _local_ CPU, so what guarantees safety when it is used on SMP?
The answer is the low-level assembly code inside: it takes advantage of a "bus locking scheme" to guarantee that other CPUs won't interfere with the access, thus it is SMP-safe!!
Re: Kernel Korner: Kernel Locking Techniques
atomic_t v;
atomic_set(&v, 5); /* v = 5 (atomically) */
atomic_add(3, &v); /* v = v + 3 (atomically) */
atomic_dec(&v); /* v = v - 1 (atomically) */
printf("This will print 7: %d ", atomic_read(&v));
How does Robert get this example to work in kernel-space? (Did he mean 'printk' instead of 'printf'?)
atomic_t v;
atomic_set(&v, 5); /* v = 5 (atomically) */
atomic_add(3, &v); /* v = v + 3 (atomically) */
atomic_dec(&v); /* v = v - 1 (atomically) */
printf("This will print 7: %d ", atomic_read(&v));
As far as I know, atomic operations are atomic only for a single function call like
atomic_add(3,&v);
but not when executed in sequence, i.e.,
atomic_add(3,&v); and
{
atomic_add(1,&v);
atomic_add(2,&v);
}
are not the same.
Re: Kernel Korner: Kernel Locking Techniques
Good article. I have a few questions and would appreciate it
if you could give me the answers.
1. Can printks exist between spinlock and spinunlock?
2. I understand that it is not possible to have
copy_from_user and copy_to_user calls between
spinlock and spinunlock. Can these functions be
called between semaphore lock and unlock functions
(up and down)?
3. Can the down_trylock function be called between spinlock
and spinunlock?
Thanks in advance
Ravi Kumar
Rendezvous On Chip Ltd
Hyderabad
Re: Kernel Korner: Kernel Locking Techniques
See
http://kernelnewbies.org/documents/kdoc/kernel-locking/sleeping-things.html
Anything can be called inside a semaphore
Re: Kernel Korner: Kernel Locking Techniques
--SJLC
Re: Kernel Korner: Kernel Locking Techniques
Thank you very much.
Ravi Kumar
Re: Kernel Korner: Kernel Locking Techniques
If a process attempts to acquire a spinlock and it is unavailable, the process will keep trying (spinning) until it can acquire the lock.
Does this involve task switching? If so, what is the difference
between a spinlock and a semaphore, except that the spinlock wastes more CPU
time?
Re: Kernel Korner: Kernel Locking Techniques
Yes, you point this out rightly.
Actually, no context switch takes place; that is why it is faster than a semaphore. As it does not put the process in the wait state, no switching takes place.
Re: Kernel Korner: Kernel Locking Techniques
Does this involve task switching? If so, what is the difference
between a spinlock and a semaphore, except that the spinlock wastes more CPU
time?
---------------------------------------------
No. It just spins until it gets the lock.
If the critical region is short enough that the time spent on
spinning around is shorter than that taken to execute the
semaphore up/down codes, the spinlock wins.
If not, you may choose the semaphore.
Re: Kernel Korner: Kernel Locking Techniques
I am puzzled...
If spinlocks are not supposed to be held where processing of data takes a long time, e.g., around copy_to_user(), which might block, how can I safely move data from the interrupt handler to the user?
Scenario:
Hardware that gives an interrupt when there is data to read.
The interrupt handler intercepts the interrupt, reads the data from the hardware and places it in an "interrupt buffer". The interrupt buffer is not allocated dynamically, but rather statically, to ensure that it is never swapped out.
An application reads the data through the device driver's read method. It should not read from the interrupt buffer directly, because the interrupt handler might be adding to the buffer.
First invalid solution that comes to mind: Place a spinlock around the interrupt buffer so that we guarantee that only one of the interrupt handler or the device driver read method is accessing the buffer at any given time. This is forbidden, since a spinlock should never be held around data processing that might block, e.g., copy_to_user().
Second invalid solution that comes to mind: Place a spinlock around the interrupt buffer and a semaphore around a user application buffer, which is dynamically allocated and a lot bigger than the interrupt buffer. Then we have the problem of interlocking: in order to move data from the interrupt buffer to the application buffer, we have to acquire both the semaphore and the spinlock, and the spinlock is now basically protecting the application buffer, which again might be swapped out, i.e., we have the possibility of blocking while holding a spinlock.
I have a hard time seeing how you can break this "deadlock" in scenario two, because you always end up needing to move the data from interrupt context to application context.
KDD
Re: Kernel Korner: Kernel Locking Techniques
Hardware that gives an interrupt when there is data to read. The interrupt handler intercepts the interrupt, reads the data from the hardware and places it in an "interrupt buffer". The interrupt buffer is not allocated dynamically, but rather statically, to ensure that it is never swapped out. An application reads the data through the device driver's read method. It should not read from the interrupt buffer directly because the interrupt handler might be adding to the buffer.
In the read() method:
Grab spin_lock_irq
Copy from interrupt buffer to local buffer
spin_unlock_irq
copy_to_user from local buffer
In the interrupt handler:
Grab spin_lock
place data in buffer
spin_unlock
You can also avoid having an interrupt buffer and a different storage buffer - for various reasons. First, why? It is not efficient. Second, all kernel memory is unpagable so you never have to worry... just have a dynamic buffer and have a way for the syscall to read it. Have your spinlock protect the buffer and everyone is happy.
An even better solution might be a double buffer...
Robert Love
questions
Why do spin_lock/spin_unlock need to be used in the interrupt handler? Is the read syscall able to preempt the ISR?
I believe the spinlock protects the interrupt buffer from
the same or another ISR executing on other CPUs.
Re: Kernel Korner: Kernel Locking Techniques
Thank you very much for your writing. ^_^
Re: Kernel Korner: Kernel Locking Techniques
Excellent article! Very informative and well-written. Robert Love obviously has hands-on experience of the subject and knows how to share it in a very readable article.
I hope to read more articles from him.
Re: Kernel Korner: Kernel Locking Techniques
Indeed! This was one of the better articles/papers on locking and races I have read for any OS. It makes sense and is very applicable.
I hope to see (many) more articles, too.