Kernel Locking Techniques

Robert explains the various locking primitives in the Linux kernel, why you need them and how kernel developers can use them to write safe code.
Semaphores

Semaphores in Linux are sleeping locks. Because they cause a task to sleep on contention, instead of spin, they are used in situations where the lock-held time may be long. Conversely, since they have the overhead of putting a task to sleep and subsequently waking it up, they should not be used where the lock-held time is short. Since they sleep, however, they can be used to synchronize user contexts whereas spinlocks cannot. In other words, it is safe to block while holding a semaphore.

In Linux, semaphores are represented by a structure, struct semaphore, which is defined in include/asm/semaphore.h. The structure contains a pointer to a wait queue and a usage count. The wait queue is a list of processes blocking on the semaphore. The usage count is the number of concurrently allowed holders. If it is negative, the semaphore is unavailable and the absolute value of the usage count is the number of processes blocked on the wait queue. The usage count is initialized at runtime via sema_init(), typically to 1 (in which case the semaphore is called a mutex).

Semaphores are manipulated via two methods: down (historically P) and up (historically V). The former attempts to acquire the semaphore and blocks if it fails. The later releases the semaphore, waking up any tasks blocked along the way.

Semaphore use is simple in Linux. To attempt to acquire a semaphore, call the down_interruptible() function. This function decrements the usage count of the semaphore. If the new value is less than zero, the calling process is added to the wait queue and blocked. If the new value is zero or greater, the process obtains the semaphore and the call returns 0. If a signal is received while blocking, the call returns -EINTR and the semaphore is not acquired.

The up() function, used to release a semaphore, increments the usage count. If the new value is greater than or equal to zero, one or more tasks on the wait queue will be woken up:

struct semaphore mr_sem;
sema_init(&mr_sem, 1);      /* usage count is 1 */
if (down_interruptible(&mr_sem))
  /* semaphore not acquired; received a signal ... */
/* critical region (semaphore acquired) ... */
up(&mr_sem);

The Linux kernel also provides the down() function, which differs in that it puts the calling task into an uninterruptible sleep. A signal received by a process blocked in uninterruptible sleep is ignored. Typically, developers want to use down_interruptible(). Finally, Linux provides the down_trylock() function, which attempts to acquire the given semaphore. If the call fails, down_trylock() will return nonzero instead of blocking.

Reader/Writer Locks

In addition to the standard spinlock and semaphore implementations, the Linux kernel provides reader/writer variants that divide lock usage into two groups: reading and writing. Since it is typically safe for multiple threads to read data concurrently, so long as nothing modifies the data, reader/writer locks allow multiple concurrent readers but only a single writer (with no concurrent readers). If your data access naturally divides into clear reading and writing patterns, especially with a greater amount of reading than writing, the reader/writer locks are often preferred.

The reader/writer spinlock is called an rwlock and is used similarly to the standard spinlock, with the exception of separate reader/writer locking:

rwlock_t mr_rwlock = RW_LOCK_UNLOCKED;
read_lock(&mr_rwlock);
/* critical section (read only) ... */
read_unlock(&mr_rwlock);
write_lock(&mr_rwlock);
/* critical section (read and write) ... */
write_unlock(&mr_rwlock);

Likewise, the reader/writer semaphore is called an rw_semaphore and use is identical to the standard semaphore, plus the explicit reader/writer locking:

struct rw_semaphore mr_rwsem;
init_rwsem(&mr_rwsem);
down_read(&mr_rwsem);
/* critical region (read only) ... */
up_read(&mr_rwsem);
down_write(&mr_rwsem);
/* critical region (read and write) ... */
up_write(&mr_rwsem);
Use of reader/writer locks, where appropriate, is an appreciable optimization. Note, however, that unlike other implementations reader locks cannot be automatically upgraded to the writer variant. Therefore, attempting to acquire exclusive access while holding reader access will deadlock. Typically, if you know you will need to write eventually, obtain the writer variant of the lock from the beginning. Otherwise, you will need to release the reader lock and re-acquire the lock as a writer. If the distinction between code that writes and reads is muddled such as this, it may be indicative that reader/writer locks are not the best choice.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

linux question i need help plz

Anonymous's picture

Build a bash command cgrep that search the indicated files using color, ignoring cases and showing the line number.
Now if you perform the following command: man cgrep
You get the message: no manual entry for cgrep

You are requested to add a manual for this command.

Hints:
1- Read the manual of man (man man) to understand where manual files are stored.
2- You need to use gunzip and gzip
3- You need to be root to create a manual (sudo -i)

Why don't you ask your

Anonymous's picture

Why don't you ask your instructor instead of posting your homework!!!

Re: Kernel Korner: Kernel Locking Techniques

Sathish Kumar's picture

Good article on Locking mechanism in Linux
Thanks for your updates.

Regards,
Sathish.

tasklets and work queue

Maulik Patel's picture

Hi,

I have a question regarding tasklet & work-queue. As both are bottomhalf handlers, on which basis we should decide to use tasklets or work queue?

I know that tasklets are running at very higher priority (we can say in interrupt context) than work queue (process context) becuase of which we should
not do any blocking/sleep operation inside tasklets while same can be done in workqueue.

If I want to do IO transcation in the response of interrupt, is it good to use tasklets here?

In real scenario,

I got intterupt from touch screen controller, Now I have to read using I2C interface from controller. Is it safe to read data from tasklets here?

Very nicely explained the

Anonymous's picture

Very nicely explained the locking procedure. Very useful URL.

Recursive semaphore

Maitre Bart's picture

According to the article, spinlocks are not recursive. What about semaphores?

kernel locking techniques

jinu joy's picture

hi
The above information was excellent and i would like to know from you a small information. Can the kernel be completely locked down for a small period of time.i.e none of the kernel threads should run as my thread is running. i would like opinions in this matter

I believe kernel_lock would

Anonymous's picture

I believe kernel_lock would help... If you are in a user space and need kernel for a time being you can make syscall which when called with some parameter calls kernel_lock and returns and when called with some other parameter calls kernel_unlock...

Very good question. I need

Anonymous's picture

Very good question. I need to do this but can't find out how. Has this question been answered somewhere ? A small period would meen 5..90usecs.

some doubt about sempahore

Anonymous's picture

Thank you for you article,I do learn a lot from that.
But a have a question about semphore.
in you article you mention that up() operation:"if the new value is greater than or equal to zero, one or more tasks on the wait queue will be woken up"
i think it's less than or equal to instead of greater than.

sorry but you are wrong

nishant's picture

sorry but you are wrong greater than and equal to is written since, as soon as semaphore count increases it means some objects of resource are free to be allocated to some processes.this makes a process pop out of wait queue and become active.

Take the Spinlocks warning seriously!

Padam J Singh's picture

"never call any function that touches user memory, kmalloc() with the GFP_KERNEL flag, any semaphore functions or any of the schedule functions while holding a spinlock."

I struggled with a kernel panic for a few days when I was calling the function "copy_to_user" while holding a lock. Call the function a few times a second, and it would work, anything higher than that would simply panic.

Just make sure every function called while holding does not sleep. If it has to, use a semaphore.

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

spin lock works on the beauty that it disables the interupt before entering critical section and enble after exititng.So as it cannot disable interrupt of another process , spinlock is not a solution for SMP system.

Of course it is !

Tom J.P. Sun's picture

spin_lock() won't disable interrupt, it is used to protect between user contexts.
while spin_lock_irq() will disable interrupt, of course it can be used to protect between user context and interrupt context.
Note: the spin_lock_irq() only disable interrupt on _local_ CPU, what can be guaranteed when they use in SMP ?
The answer is the low level assembly code inside, it takes advantage of "BUS locking scheme" to guarantee other CPU won't intervene the access, thus SMP() safe !!

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

atomic_t v;

atomic_set(&v, 5); /* v = 5 (atomically) */
atomic_add(3, &v); /* v = v + 3 (atomically) */
atomic_dec(&v); /* v = v - 1 (atomically) */
printf("This will print 7: %d ", atomic_read(&v));

How does Robert get this example to work in kernel-space? (Did he mean 'printk' instead of 'printf'?)

atomic_t v; atomic_set(&v,

tushar@mwti.net's picture

atomic_t v;

atomic_set(&v, 5); /* v = 5 (atomically) */
atomic_add(3, &v); /* v = v + 3 (atomically) */
atomic_dec(&v); /* v = v - 1 (atomically) */
printf("This will print 7: %d ", atomic_read(&v));

As per my knowledge, atomic operations are atomic only for single function like
atomic_add(3,&v);
but not when executed in sequence. i.e.
atomic_add(3,&v); and
{
atomic_add(1,&v);
atomic_add1(2,&v);
}
are not same.

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

Good Article. I have few questions and I appreciate

if you could give me the answers.

1. Can printks exist between spinlock and spinunlock?

2. I understand that it is not possible to have

copy_from_user and copy_to_user calls between

spinlock and spinunlock. Can these functions be

called between semaphore lock and unlock functions

(up and down)?

3. Can down_trylock function be called between spinlock

and spinunlock.

Thanks in advance

Ravi Kumar

Rendezvous On Chip Ltd

Hyderabad

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture
  1. yes, printk just copies the data into a kernel ring buffer, no connection to userspace
  2. yes, the purpose of semaphores is to allow sleep instead of busy waiting for a lock, so they are safe in code that might sleep
  3. I don't know what down_trylock does, maybe someone else can answer

--SJLC

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

Thank you very much.

Ravi Kumar

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

If a process attempts to acquire a spinlock and it is unavailable, the process will keep trying (spinning) until it can acquire the lock.

Does this involve task switching? If so, what is the difference

of spinlock and semphere except spinlock wasting more CPU

time?

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

yes you point out rightly.
Actuallly there is no context switch takes place , that is why it is faster than semaphore. As it does not put the process in the wait state so no swithcing takes place.

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

Does this involve task switching? If so, what is the difference

of spinlock and semphere except spinlock wasting more CPU

time?

---------------------------------------------

No. It just spins until it gets the lock.

If the critical region is short enough that the time spent on

spinning around is shorter than that taken to execute the

semaphore up/down codes, the spinlock wins.

If not, you may choose the semaphore.

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

I am pussled........

If spin_locks are not supposed to be hold where processing of data takes long time, i.e. around copy_to_user() which might block. How can I safely move data from the interrupt handler to the user?

Schenario :

Hardware that gives an interrupt when there is data to read.

Interrupt handler intercepts the interrupt and reads the data from the hardware and places it in a "interrupt buffer". The interrupt buffer is not allocated dynamically, but rather

statistically to ensure that it is newer swaped out.

An application reads the data throught the device driver read method. It should not read from the interrupt buffer

directly because the interrupt handler might be adding to the buffer.

First invalid solution that comes to mind: Place a spin lock

around the interrupt buffer so that we guarantee that either the interupt handler or the device driver read method

are accessing the buffer at any given time. This is forbidden since a spin lock should newer be around data processing that migh block, e.g. copy _to_user().

Second invalid solution that comes to mind: Place a spin lock around the interrupt buffer and a semaphore around a user application buffer which is dynamically allocated and a lot bigger than the interrupt buffer. Then we have the problem of interlocking, i.e. in order to move data from the interupt buffer to the application buffer we have to aquire both semphore and spinlock which is now basically preotecting the application buffer which again might be swapped out, i.e. we have the possibility of blocking while holding a spin_lock.

I have a hard time seeing how you can break this "deadlock" in schenario two because you always end up needing to move the data from interrupt context to application context.

KDD

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

Hardware that gives an interrupt when there is data to read. Interrupt handler intercepts the interrupt and reads the data from the hardware and places it in a "interrupt buffer". The interrupt buffer is not allocated dynamically, but rather
statistically to ensure that it is newer swaped out. An application reads the data throught the device driver read method. It should not read from the interrupt buffer directly because the interrupt handler might be adding to the buffer.

In the read() method:
Grab spin_lock_irq
Copy from interrupt buffer to local buffer
spin_unlock_irq
copy_to_user from local buffer

In the interrupt handler:
Grab spin_lock
place data in buffer
spin_unlock

You can also avoid having an interrupt buffer and a different storage buffer - for various reasons. First, why? It is not efficient. Second, all kernel memory is unpagable so you never have to worry... just have a dynamic buffer and have a way for the syscall to read it. Have your spinlock protect the buffer and everyone is happy.

An even better solution might be a double buffer...

Robert Love

questions

hcey's picture

Why spin_lock/spin_unlock need to be used in the interrupt handler? Is the read syscall able to preempt the isr?

I believe the spinlock

Anonymous's picture

I believe the spinlock pretects the interrupt buffer from
the same/other ISR executing on other CPUS.

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

Thank you very much for your writing. ^_^

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

Excellent article! Very informative and well-written. Robert Love obviously has hands-on experience of the subject and knows how to share it in a very readable article.

I hope to read more articles from him.

Re: Kernel Korner: Kernel Locking Techniques

Anonymous's picture

Indeed! This was one of the better articles/papers on locking and races I have read for any OS. It makes sense and is very applicable.

I hope to see (many) more articles, too.

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix