Kernel Korner - Sleeping in the Kernel
In Linux kernel programming, there are numerous occasions when processes wait until something occurs or when sleeping processes need to be woken up to get some work done. There are different ways to achieve these things.
All of the discussion in this article refers to kernel mode execution. A reference to a process means execution in kernel space in the context of that process.
Some kernel code examples have been reformatted to fit this print format. Line numbers refer to lines in the original file.
In Linux, the ready-to-run processes are maintained on a run queue. A ready-to-run process has the state TASK_RUNNING. Once the timeslice of a running process is over, the Linux scheduler picks up another appropriate process from the run queue and allocates CPU power to that process.
A process also voluntarily can relinquish the CPU. The schedule() function could be used by a process to indicate voluntarily to the scheduler that it can schedule some other process on the processor.
Once the process is scheduled back again, execution begins from the point where the process had stopped—that is, execution begins from the call to the schedule() function.
At times, processes want to wait until a certain event occurs, such as a device to initialise, I/O to complete or a timer to expire. In such a case, the process is said to sleep on that event. A process can go to sleep using the schedule() function. The following code puts the executing process to sleep:
sleeping_task = current; set_current_state(TASK_INTERRUPTIBLE); schedule(); func1(); /* The rest of the code */
Now, let's take a look at what is happening in there. In the first statement, we store a reference to this process' task structure. current, which really is a macro, gives a pointer to the executing process' task_structure. set_current_state changes the state of the currently executing process from TASK_RUNNING to TASK_INTERRUPTIBLE. In this case, as mentioned above, the schedule() function simply should schedule another process. But that happens only if the state of the task is TASK_RUNNING. When the schedule() function is called with the state as TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE, an additional step is performed: the currently executing process is moved off the run queue before another process is scheduled. The effect of this is the executing process goes to sleep, as it no longer is on the run queue. Hence, it never is scheduled by the scheduler. And, that is how a process can sleep.
Now let's wake it up. Given a reference to a task structure, the process could be woken up by calling:
wake_up_process(sleeping_task);
As you might have guessed, this sets the task state to TASK_RUNNING and puts the task back on the run queue. Of course, the process runs only when the scheduler looks at it the next time around.
So now you know the simplest way of sleeping and waking in the kernel.
A process can sleep in two different modes, interruptible and uninterruptible. In an interruptible sleep, the process could be woken up for processing of signals. In an uninterruptible sleep, the process could not be woken up other than by issuing an explicit wake_up. Interruptible sleep is the preferred way of sleeping, unless there is a situation in which signals cannot be handled at all, such as device I/O.
Almost always, processes go to sleep after checking some condition. The lost wake-up problem arises out of a race condition that occurs while a process goes to conditional sleep. It is a classic problem in operating systems.
Consider two processes, A and B. Process A is processing from a list, consumer, while the process B is adding to this list, producer. When the list is empty, process A sleeps. Process B wakes A up when it appends anything to the list. The code looks like this:
Process A:
1 spin_lock(&list_lock);
2 if(list_empty(&list_head)) {
3 spin_unlock(&list_lock);
4 set_current_state(TASK_INTERRUPTIBLE);
5 schedule();
6 spin_lock(&list_lock);
7 }
8
9 /* Rest of the code ... */
10 spin_unlock(&list_lock);
Process B:
100 spin_lock(&list_lock);
101 list_add_tail(&list_head, new_node);
102 spin_unlock(&list_lock);
103 wake_up_process(processa_task);
There is one problem with this situation. It may happen that after process A executes line 3 but before it executes line 4, process B is scheduled on another processor. In this timeslice, process B executes all its instructions, 100 through 103. Thus, it performs a wake-up on process A, which has not yet gone to sleep. Now, process A, wrongly assuming that it safely has performed the check for list_empty, sets the state to TASK_INTERRUPTIBLE and goes to sleep.
Thus, a wake up from process B is lost. This is known as the lost wake-up problem. Process A sleeps, even though there are nodes available on the list.
This problem could be avoided by restructuring the code for process A in the following manner:
Process A:
1 set_current_state(TASK_INTERRUPTIBLE);
2 spin_lock(&list_lock);
3 if(list_empty(&list_head)) {
4 spin_unlock(&list_lock);
5 schedule();
6 spin_lock(&list_lock);
7 }
8 set_current_state(TASK_RUNNING);
9
10 /* Rest of the code ... */
11 spin_unlock(&list_lock);
This code avoids the lost wake-up problem. How? We have changed our current state to TASK_INTERRUPTIBLE, before we test the condition. So, what has changed? The change is that whenever a wake_up_process is called for a process whose state is TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE, and the process has not yet called schedule(), the state of the process is changed back to TASK_RUNNING.
Thus, in the above example, even if a wake-up is delivered by process B at any point after the check for list_empty is made, the state of A automatically is changed to TASK_RUNNING. Hence, the call to schedule() does not put process A to sleep; it merely schedules it out for a while, as discussed earlier. Thus, the wake-up no longer is lost.
Here is a code snippet of a real-life example from the Linux kernel (linux-2.6.11/kernel/sched.c: 4254):
4253 /* Wait for kthread_stop */
4254 set_current_state(TASK_INTERRUPTIBLE);
4255 while (!kthread_should_stop()) {
4256 schedule();
4257 set_current_state(TASK_INTERRUPTIBLE);
4258 }
4259 __set_current_state(TASK_RUNNING);
4260 return 0;
This code belongs to the migration_thread. The thread cannot exit until the kthread_should_stop() function returns 1. The thread sleeps while waiting for the function to return 0.
As can be seen from the code, the check for the kthread_should_stop condition is made only after the state is TASK_INTERRUPTIBLE. Hence, the wake-up received after the condition check but before the call to schedule() function is not lost.











This week 5 lucky Members will receive a Root Superhero T-shirt, as modeled by
Hack Editor Kyle Rankin. No entry necessary. Check back here early next week
to find out who the lucky Online Members are.




Comments
Waitqueues in bottom halve context
Can we use wait queue in bottom halve context?
For example, we have LOC in the following source
Kernel Version: Linux-2.6.18.5
Path: \net\xfrm\xfrm_policy.c
Ln: 919
Which calls schedule() function.
This function is called when packet is received and looks for
policy in softirq context.
Please correct me if i misunderstood.
Thanks in advance
Re: Waitqueues in bottom halve context
The function xfrm_lookup() is called from both user context
and kernel(softirq) context.
The process is put to sleep when arguemnt:flag of the above
function is -EAGAIN.
User Context:
If this function is called from user context it may have -EAGAIN
Kernel Context:
If this function is called from kernel context (softirq), it will be NULL.
So the softirq context process will not be put to wait queue.
Thanks,
Sathish Kumar
misprint
In the line:
"This call causes the smbiod to sleep only if the DATA_READY bit is set",
"set" should read "clear", and I think this is a misprint here.
SIGKILL arrival
how will wait_event_interruptible will behave if SIGKILL happens to the user space process that is sleeping using this function? My system gives kernel panic when i do 'kill -SIGKILL pid'. is it because of the reason that im sleeping in kernel and i delivered SIGKILL?
In the schedule function
In the schedule function section, you have mentioned that a way to wake up a process that has just called the schedule function is to call wake_up_process(sleeping_task);
But how does another process have a reference to the task structure of a process that has just called schedule.
In the code you have shown the example to get a reference to the task structure. But that is being done in the process that is calling schedule.
Please elaborate.
Thanks,
Vinay
store it somewhere
It is upto you where/how you want to store that reference to the sleeping process's task struct. You may embed it in the appropriate data structures in your code.
Waking up a sleeping process from user space
What is a good way to wake up a process that is asleep in kernel space from a process that is in user space?
Perhaps a system call which invokes 'wake_up_process'?
The 'kill' system call doesn't seem to work...
schedule_timeout_{,un}interruptible()
A patch that includes two new variants of schedule_timeout() has been included in the -mm kernel tree.
Almost all the calls to schedule_timeout(), set the state of the executing task to TASK_INTERRUPTIBLE / TASK_UNINTERRUPTIBLE. This patch embeds such functionality in the following calls,
schedule_timeout_interruptible() and,
schedule_timeout_uninterruptible()
Error in code samples
There is an error in the code samples. wait_event() and wait_event_interuptible() should not be passed the address of my_event, but my_event itself. That is because they are macros, and their implementations will wind up using the address-of operator (&) to take the address of the parameter they are passed.
Re : Error in code samples
Yes, thanks for that correction. In section "Wait Queues", the illustrations should be like this :
That is because they are
That is because they are macros, and their implementations will wind up using the address-of operator (&) to take the address of the parameter they are passed.mırç mırç Chat chat türkçe mirc türkçe mirc
Post new comment