The Linux Process Model

A look at the fundamental building blocks of the Linux kernel.

Threaded Kernel

Linux kernel threading has constantly improved. Let's look at the different versions again:

  • 2.0.x had no kernel threading.

  • 2.2.x added kernel threading.

  • 2.3.x is heavily SMP-threaded.

In 2.2.x, many places are still single-threaded, so 2.2.x kernels actually scale well only on two-way SMPs. Some paths are already completely SMP-threaded, however: IRQ/timer handling is one example, and the IRQ load is distributed across multiple CPUs.

In 2.3.x, most worthwhile code sections within the kernel are being rewritten for SMP threading. For example, all of the VM (virtual memory) is SMP-threaded. The most interesting paths now have a much finer granularity and scale very well.

Performance Limitations

For the sake of system stability, a kernel has to react well in stress situations. It must, for instance, reduce the priority and resources of processes that misbehave.

How does the scheduler handle a poorly written program looping tightly and forking at each turn of the loop (thereby forking off thousands of processes in a few seconds)? Obviously, the scheduler can't limit the creation of processes time-wise, e.g., a process every 0.5 seconds or similar.

After a fork, however, the “runtime priority” of the process is divided between the parent and the child. This means the parent and child are penalized compared to the other tasks, and the other tasks continue to run fine until the first recalculation of the priorities. This keeps the system from stalling during a fork flood. The relevant code section is in linux/kernel/fork.c:

/*
 * "share" dynamic priority between parent
 * and child, thus the total amount of dynamic
 * priorities in the system doesn't change, more
 * scheduling fairness. This is only important
 * in the first time slice, in the long run the
 * scheduling behaviour is unchanged.
 */
current->counter >>= 1;
p->counter = current->counter;
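
To see why this keeps a fork flood in check, the halving can be replayed in user space. The following is only a sketch under stated assumptions: struct task, share_counter and flood_total are hypothetical names invented for this illustration, and the starting value of 20 ticks used in the note below is just an example, not a claim about any particular kernel configuration.

```c
/* Hypothetical stand-in for the scheduling field of the kernel's task structure. */
struct task {
    long counter;   /* remaining time-slice ticks (the "runtime priority") */
};

/* Same two statements as the fork.c excerpt, applied to the sketch's struct. */
void share_counter(struct task *parent, struct task *child)
{
    parent->counter >>= 1;            /* the parent keeps half of its ticks */
    child->counter = parent->counter; /* the child starts with the same half */
}

/*
 * Simulate one task forking `forks` times in a tight loop (the children
 * do not fork again) and return the sum of the counters of the parent
 * and all of its children before the next priority recalculation.
 */
long flood_total(long initial, int forks)
{
    struct task parent = { initial };
    long children_total = 0;

    for (int i = 0; i < forks; i++) {
        struct task child;

        share_counter(&parent, &child);
        children_total += child.counter;
    }
    return parent.counter + children_total;
}
```

Starting from 20 ticks, the total for the whole family never exceeds 20 no matter how many forks are simulated, and it shrinks slightly through integer rounding. After a handful of forks, the looping parent and each new child have zero ticks left, so tasks outside the flood keep their share of the CPU until the next recalculation.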

Additionally, there is a per-user limit on the number of tasks, which init can set before spawning the first user process. In bash, it is set with ulimit -u. You can, for example, specify that user moshe may run a maximum of ten concurrent tasks (the count includes the shell and every process run by the user).
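
The shell's ulimit -u is a front end for the RLIMIT_NPROC resource limit, so a program that launches user sessions could apply the same cap itself. The following is a minimal sketch, assuming a Linux or BSD system where RLIMIT_NPROC is available; cap_process_limit and current_process_limit are hypothetical helper names, not part of any library.

```c
#include <sys/resource.h>

/*
 * Set the soft per-user process limit of the calling process to `wanted`,
 * clamped to the hard limit so that an unprivileged caller can succeed.
 * Processes created afterwards inherit the limit.
 * Returns 0 and stores the value actually applied in *applied,
 * or returns -1 if the limit could not be read or set.
 */
int cap_process_limit(rlim_t wanted, rlim_t *applied)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    rl.rlim_cur = (wanted < rl.rlim_max) ? wanted : rl.rlim_max;
    if (setrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    *applied = rl.rlim_cur;
    return 0;
}

/* Read back the current soft limit; returns 0 on success, -1 on error. */
int current_process_limit(rlim_t *soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    return 0;
}
```

A login-style program would call cap_process_limit(10, &applied) before switching to the user's identity and executing the shell, which gives the ten-task cap described above. Once the limit is reached, further fork calls by that user fail with EAGAIN.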

In Linux, the root user always retains some spare tasks. So, if a user spawns tasks in a loop, the administrator can just log in and use the killall command to remove all tasks of the offending user. Because the “runtime priority” of the task is divided between the parent and the child, the kernel reacts smoothly enough to handle this type of situation.

If you wanted to amend the kernel to allow only one fork per timer tick, called a jiffy (usually 1/100th of a second, although this parameter is tunable), you would have to patch the kernel like this:

--- 2.3.26/kernel/fork.c        Thu Oct 28 22:30:51 1999
+++ /tmp/fork.c Tue Nov  9 01:34:36 1999
@@ -591,6 +591,14 @@
        int retval = -ENOMEM;
        struct task_struct *p;
        DECLARE_MUTEX_LOCKED(sem);
+       static long last_fork;
+
+       while (time_after(last_fork+1, jiffies))
+       {
+               __set_current_state(TASK_INTERRUPTIBLE);
+               schedule_timeout(1);
+       }
+       last_fork = jiffies;
        if (clone_flags & CLONE_PID) {
                /* This is only allowed from the boot up thread */
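
The loop condition in the patch relies on a wraparound-safe tick comparison. The following user-space sketch re-creates that test to show when a fork would have to wait. The helper names tick_after and fork_must_wait are hypothetical, and tick_after is modelled on the kernel's time_after macro rather than copied from it.

```c
/*
 * Wraparound-safe "a is later than b" for free-running tick counters.
 * The unsigned subtraction wraps modulo 2^N, and the signed cast turns
 * "b is behind a" into a negative number. Converting an out-of-range
 * unsigned value to long is implementation-defined in C, but it wraps
 * on the usual two's-complement compilers (an assumption of this sketch).
 */
int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Same test as the patch's while condition: a new fork must wait as
 * long as the current tick is still the tick of the previous fork.
 */
int fork_must_wait(unsigned long last_fork, unsigned long now)
{
    return tick_after(last_fork + 1, now);
}
```

With last_fork equal to the current tick, the function returns 1 and the patched kernel would sleep for one tick. One tick later it returns 0 and the fork proceeds. The comparison also behaves correctly when the counter wraps from its maximum value back to zero.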

This is the beauty of open source. If you don't like something, just change it!

Here ends the first part of our tour through the Linux kernel. In the next installment, we will take a more detailed look at how the scheduler works. I can promise you some surprising discoveries. Some of these discoveries caused me to reevaluate completely the probable impact of Linux on the corporate server market. Stay tuned.

Moshe Bar (moshe@moelabs.com) is an Israeli system administrator and OS researcher, who started learning UNIX on a PDP-11 with AT&T UNIX Release 6 back in 1981. He holds an M.Sc. in computer science. Visit Moshe's web site at http://www.moelabs.com/.
