The Linux Process Model

A look at the fundamental building blocks of the Linux kernel.
Threaded Kernel

Linux kernel threading has constantly improved. Let's look at the different versions again:

  • 2.0.x had no kernel threading.

  • 2.2.x has kernel threading added.

  • 2.3.x is very SMP-threaded.

In 2.2.x, many places are still single-threaded, but 2.2.x kernels actually scale well only on two-way SMPs. In 2.2.x, the IRQ/timer handling (for example) is completely SMP-threaded, and the IRQ load is distributed across multiple CPUs.

In 2.3.x, most worthwhile code sections within the kernel are being rewritten for SMP threading. For example, all of the VM (virtual memory) is SMP-threaded. The most interesting paths now have a much finer granularity and scale very well.

Performance Limitations

For the sake of system stability, a kernel has to react well in stress situations. It must, for instance, reduce priorities and resources to processes that misbehave.

How does the scheduler handle a poorly written program looping tightly and forking at each turn of the loop (thereby forking off thousands of processes in a few seconds)? Obviously, the scheduler can't limit the creation of processes time-wise, e.g., a process every 0.5 seconds or similar.

After a fork, however, the “runtime priority” of the process is divided between the parent and the child. This means the parent/child will be penalized compared to the other tasks, and the other tasks will continue to run fine up to the first recalculation of the priorities. This keeps the system from stalling during a fork flooding. The code for this is the concerned code section in linux/kernel/fork.c:

/*
 "share" dynamic priority between parent
 * and child, thus the total amount of dynamic
 * priorities in the system doesn't change, more
 * scheduling fairness. This is only important
 * in the first time slice, in the long run the
 * scheduling behaviour is unchanged.
 */
current->counter >>= 1;
p->counter = current->counter;

Additionally, there is a per-user limit of threads that can be set from init before spawning the first user process. It can be set with  ulimit -u in bash. You can tell it that user moshe can run a maximum ten concurrent tasks (the count includes the shell and every process run by the user).

In Linux, the root user always retains some spare tasks for himself. So, if a user spawns tasks in a loop, the administrator can just log in and use the killall command to remove all tasks of the offending user. Due to the fact that the “runtime priority” of the task is divided between the parent and the child, the kernel reacts smoothly enough to handle this type of situation.

If you wanted to amend the kernel to allow only one fork per processor tick (usually one every 1/100th second; however, this parameter is tunable), called a jiffie, you would have to patch the kernel like this:

--- 2.3.26/kernel/fork.c        Thu Oct 28 22:30:51 1999
+++ /tmp/fork.c Tue Nov  9 01:34:36 1999
@@ -591,6 +591,14 @@
        int retval = -ENOMEM;
        struct task_struct *p;
        DECLARE_MUTEX_LOCKED(sem);
+       static long last_fork;
+
+       while (time_after(last_fork+1, jiffies))
+       {
+               __set_current_state(TASK_INTERRUPTIBLE);
+               schedule_timeout(1);
+       }
+       last_fork = jiffies;
        if (clone_flags & CLONE_PID) {
/* This is only allowed from the boot up thread */

This is the beauty of open source. If you don't like something, just change it!

Here ends the first part of our tour through the Linux kernel. In the next installment, we will take a more detailed look at how the scheduler works. I can promise you some surprising discoveries. Some of these discoveries caused me to revalue completely the probable impact of Linux on the corporate server market. Stay tuned.

Resources

email: moshe@moelabs.com

Moshe Bar (moshe@moelabs.com) is an Israeli system administrator and OS researcher, who started learning UNIX on a PDP-11 with AT&T UNIX Release 6 back in 1981. He holds an M.Sc. in computer science. Visit Moshe's web site at http://www.moelabs.com/.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix