The Linux Process Model

A look at the fundamental building blocks of the Linux kernel.
Threaded Kernel

Linux kernel threading has constantly improved. Let's look at the different versions again:

  • 2.0.x had no kernel threading.

  • 2.2.x has kernel threading added.

  • 2.3.x is very SMP-threaded.

In 2.2.x, many places are still single-threaded, but 2.2.x kernels actually scale well only on two-way SMPs. In 2.2.x, the IRQ/timer handling (for example) is completely SMP-threaded, and the IRQ load is distributed across multiple CPUs.

In 2.3.x, most worthwhile code sections within the kernel are being rewritten for SMP threading. For example, all of the VM (virtual memory) is SMP-threaded. The most interesting paths now have a much finer granularity and scale very well.

Performance Limitations

For the sake of system stability, a kernel has to react well in stress situations. It must, for instance, reduce priorities and resources to processes that misbehave.

How does the scheduler handle a poorly written program looping tightly and forking at each turn of the loop (thereby forking off thousands of processes in a few seconds)? Obviously, the scheduler can't limit the creation of processes time-wise, e.g., a process every 0.5 seconds or similar.

After a fork, however, the “runtime priority” of the process is divided between the parent and the child. This means the parent/child will be penalized compared to the other tasks, and the other tasks will continue to run fine up to the first recalculation of the priorities. This keeps the system from stalling during a fork flooding. The code for this is the concerned code section in linux/kernel/fork.c:

 "share" dynamic priority between parent
 * and child, thus the total amount of dynamic
 * priorities in the system doesn't change, more
 * scheduling fairness. This is only important
 * in the first time slice, in the long run the
 * scheduling behaviour is unchanged.
current->counter >>= 1;
p->counter = current->counter;

Additionally, there is a per-user limit of threads that can be set from init before spawning the first user process. It can be set with  ulimit -u in bash. You can tell it that user moshe can run a maximum ten concurrent tasks (the count includes the shell and every process run by the user).

In Linux, the root user always retains some spare tasks for himself. So, if a user spawns tasks in a loop, the administrator can just log in and use the killall command to remove all tasks of the offending user. Due to the fact that the “runtime priority” of the task is divided between the parent and the child, the kernel reacts smoothly enough to handle this type of situation.

If you wanted to amend the kernel to allow only one fork per processor tick (usually one every 1/100th second; however, this parameter is tunable), called a jiffie, you would have to patch the kernel like this:

--- 2.3.26/kernel/fork.c        Thu Oct 28 22:30:51 1999
+++ /tmp/fork.c Tue Nov  9 01:34:36 1999
@@ -591,6 +591,14 @@
        int retval = -ENOMEM;
        struct task_struct *p;
+       static long last_fork;
+       while (time_after(last_fork+1, jiffies))
+       {
+               __set_current_state(TASK_INTERRUPTIBLE);
+               schedule_timeout(1);
+       }
+       last_fork = jiffies;
        if (clone_flags & CLONE_PID) {
/* This is only allowed from the boot up thread */

This is the beauty of open source. If you don't like something, just change it!

Here ends the first part of our tour through the Linux kernel. In the next installment, we will take a more detailed look at how the scheduler works. I can promise you some surprising discoveries. Some of these discoveries caused me to revalue completely the probable impact of Linux on the corporate server market. Stay tuned.



Moshe Bar ( is an Israeli system administrator and OS researcher, who started learning UNIX on a PDP-11 with AT&T UNIX Release 6 back in 1981. He holds an M.Sc. in computer science. Visit Moshe's web site at