Breaking through the Maximum Process Number
Process management is the most import part of an operating system. Its design and implementation can greatly affect performance. In a multiprocess OS, many processes run simultaneously, thus increasing the CPU usage and system performance. By running processes concurrently, we provide multiple services and serve more clients at the same time, which is the main task of a modern OS.
In the Linux Intel i386 architecture, multiprocess is already supported. By choosing proper process scheduling algorithms, it has lower average response time and relatively high system performance. But unfortunately, there is a limitation in the Linux kernel 2.2.x that limits the number of running processes to 4090. This number may be enough for a desktop system but is inadequate for an enterprise server.
Consider the basic principle of a typical web server, which is based on multiprocess/multithread technology. When a client request comes, the web server creates a child process or thread to handle the request. So it is easy for a heavy load server to have thousands of processes running. In fact, most of such enterprise servers run operating systems like Solaris, AIX, HP-UX, etc., rather than Linux.
Many Linux developers have noticed this problem and have tried to solve it. In experimental version 2.3.x and prerelease 2.4, this limitation has been dealt with. But, it will still be a while before the official release of 2.4, and it may take even longer for it to be stable. Does this mean we have to choose another OS? Is it possible to find a solution that can break through that limitation for Linux 2.2.x? In order to answer this question, first we have to know how process management in 2.2.x works.
Process management is tightly bound with memory management. Since the implementation of memory management is based on hardware architecture, we have to have a look at the i386 architecture first. In modern operating systems, virtual memory technology is widely employed. Thanks to virtual memory technology, software can use more memory than is physically present. That is to say, the memory addresses used by software are virtual and are converted to real address by processor-provided mechanisms during access.
There are two basic memory management methods: segmentation and paging. Segmentation means dividing memory into several segments and accessing memory by both segment pointer and offset. This method is used in early systems like PDP-11, etc. Paging means dividing memory into several fixed-size pages and using pages as the basic memory management unit. When accessing memory, an address is converted to a physical address according to the page table.
Memory management in the i386 architecture is called segmentation with paging. The virtual address space is divided into segments first by using two tables: the Global Descriptor Table (GDT) and Local Descriptor Table (LDT). After this, the virtual address is converted to a linear address. Then the linear address is converted to a physical address using two-level page tables: the Page Directory Table and Page Table. Figure 1 shows how the virtual address has been converted to a real address.
In Linux, the kernel runs in ring 0. By setting GDT, the kernel puts its code and data into a separate address space. All other programs run in ring 3 with their data and code in the same address space. Creating different page tables protects those user programs. The GDT table in Linux 2.2.x is shown below in Figure 2. In practice, a user program can use other code/data segments by setting LDT.
A process is a running program with all resources allocated. It is a dynamic concept. In the i386 architecture, “task” is an alternative name for process. For convenience, here we will use process only. Process management is a concept concerned with system initialization, process creation and destruction, scheduling, interprocess communication, etc. In Linux, process is actually a group of data structures including the context of process, scheduling data, semaphores, process queue, process id, time, signals, etc. This group of data is called Process Control Block or PCB. In implementation, PCB is in the bottom of the process stack.
Process management in Linux relies greatly on the hardware architecture. We have just discussed the basis of page-with-segment memory management in i386, but in fact, segment plays a more important role than just a block of memory. For example, Task Status Segment is one of the most important segments in i386. It contains much data that is required by the system. Each process must have a TSS pointed by TR register. According to the definition of i386, the selector in TR must select a descriptor in GDT. Additionally, the selector in LDTR, which defines a process LDT, must have a corresponding entry in GDT as well.
In order to satisfy the above requirements, Linux 2.2.x GDT is allocated for all possible processes. The maximum concurrent process number is defined when booting the kernel. The kernel reserves 2 GDT entries for each process.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- SUSE LLC's SUSE Manager
- My +1 Sword of Productivity
- Non-Linux FOSS: Caffeine!
- Managing Linux Using Puppet
- Control Your Linux Desktop with D-Bus
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Doing for User Space What We Did for Kernel Space
- SuperTuxKart 0.9.2 Released
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- Google's SwiftShader Released
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide