Booting the Kernel
A computer system is a complex machine, and the operating system is an elaborate tool that orchestrates hardware complexities to present a simple and standardized environment to the end user. When the power is turned on, however, the system software must boot the kernel while working in a limited operating environment. I describe here the booting process of three platforms: the old-fashioned PC and the more fully featured Alpha and SPARC platforms. The PC is covered in more detail, since it is still in more widespread use than other platforms, and also because it's the trickiest platform to bring up. No kernel source will be shown, as the boot code is largely assembly language, which is unintelligible to most readers, and each platform has its own.
So that the computer can be used as soon as the power is turned on, the processor begins execution from the system's firmware. The firmware is “unmovable software” found in ROM; some manufacturers call it the Basic Input-Output System (BIOS) to underline its software role, some call it PROM or “flash” to stress its hardware implementation, while others call it “console” to focus on user interaction.
The firmware usually checks the hardware's functionality, retrieves part (or all) of the kernel from a storage medium and executes it. This first part of the kernel must load the rest of itself and initialize the whole system. I don't deal with firmware issues here, only with the kernel code, which is distributed with Linux.
When the x86 processor is turned on, it behaves as a 16-bit processor that sees only 1MB of address space. This environment is known as “real mode” and is dictated by compatibility with older processors of the same family. Everything that makes up a complete system must live within the available megabyte of address space, i.e., the firmware, video buffers, space for expansion boards and a little RAM (the infamous 640KB) must all be there.
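The arithmetic behind these figures can be sketched in a few lines of Python. The segment:offset scheme shown is standard real-mode x86 addressing, which the text above does not spell out; the helper name `linear` is made up for this illustration.

```python
# Real-mode x86 addressing: a 16-bit segment and a 16-bit offset
# combine into a linear address (segment * 16 + offset).
# `linear` is a hypothetical helper, not a kernel function.

def linear(segment, offset):
    """Return the linear address for a real-mode segment:offset pair."""
    return (segment << 4) + offset

ADDRESS_SPACE = 1 << 20        # the 1MB visible in real mode
CONVENTIONAL_RAM = 640 * 1024  # the "infamous 640KB"

print(hex(linear(0x07C0, 0x0000)))        # a segment:offset pair naming 0x7c00
print(hex(CONVENTIONAL_RAM))              # the address where the 640KB of RAM ends
print(ADDRESS_SPACE - CONVENTIONAL_RAM)   # bytes left for firmware, video and boards
```

The last figure is why so little is left for RAM: 384KB of the megabyte is reserved for everything else.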
To make things difficult, the PC firmware loads only half a kilobyte of code (the first sector) and establishes its own memory layout before loading it. Whatever the boot medium, the first sector of the boot partition is loaded into memory at the address 0x7c00, where execution begins. What happens at 0x7c00 depends on the boot loader being used; we examine three situations here: no boot loader, LILO and Loadlin.
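As a rough illustration of what the firmware does at this point, the following Python sketch models memory as a byte array and copies one sector into it. It is only a simulation: the function name is hypothetical, and the 0x55 0xAA signature check is a common PC BIOS convention that the text above does not describe.

```python
# Simulate the PC firmware loading the first sector (an illustration only).
SECTOR_SIZE = 512       # "half a kilobyte"
LOAD_ADDRESS = 0x7C00   # where the firmware places the sector

def load_boot_sector(memory, disk):
    """Copy the first sector of `disk` into `memory`; return the entry address."""
    sector = disk[:SECTOR_SIZE]
    if len(sector) != SECTOR_SIZE:
        raise ValueError("disk image is shorter than one sector")
    # Assumption: like most PC firmware, require the sector to end
    # with the bytes 0x55 0xAA before treating it as bootable.
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing boot signature")
    memory[LOAD_ADDRESS:LOAD_ADDRESS + SECTOR_SIZE] = sector
    return LOAD_ADDRESS

memory = bytearray(1 << 20)                    # the 1MB real-mode address space
disk = bytes(510) + b"\x55\xaa" + bytes(1024)  # fake disk: blank sector plus signature
entry = load_boot_sector(memory, disk)
print(hex(entry))                              # execution would begin here
```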
Even though it's rare to boot the system without a boot loader, it is still possible to do so by copying the raw kernel to a floppy disk. The command cat zImage > /dev/fd0 works perfectly on Linux, although some other Unix systems can do the task reliably only by using the dd command. Without going into detail, the raw floppy image created from zImage can then be configured using the rdev program.
The file called zImage is the compressed kernel image that resides in arch/i386/boot after either make zImage or make boot is executed—the latter invocation is the one I prefer, as it works unchanged on other platforms. If you built a “big zImage” instead, the kernel file created is called bzImage and resides in the same directory.
Booting an x86 kernel is a tricky task because of the limited amount of available memory. The Linux kernel tries to maximize usage of the low 640 kilobytes by moving itself around several times. Let's look at the steps performed by a zImage kernel in detail; all of the following path names are relative to the arch/i386/boot directory.
The first sector (executing at 0x7c00) moves itself to 0x90000 and loads subsequent sectors after itself, getting them from the boot device using the firmware's functions to access the disk. The rest of the kernel is then loaded to address 0x10000, allowing for a maximum size of half a megabyte of data—remember, this is the compressed image. The boot sector code lives in bootsect.S, a real-mode assembly file.
Then code at 0x90200 (defined in setup.S) takes care of some hardware initialization and allows the default text mode (video.S) to be changed. Text mode selection is a compile-time option from 2.1.9 onwards.
Later, the whole kernel is moved from 0x10000 (64K) to 0x1000 (4K). This move overwrites BIOS data stored in RAM, so BIOS calls can no longer be performed. The first physical page is not touched because it is the so-called “zero-page”, used in handling virtual memory.
At this point, setup.S enters protected mode and jumps to 0x1000, where the kernel lives. All the available memory can be accessed now, and the system can begin to run.
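The addresses quoted in these steps can be checked with some simple arithmetic. The sketch below uses illustrative constant names of my own choosing; only the numeric values come from the text.

```python
# Addresses quoted in the steps above; the names are illustrative labels.
BOOTSECT_LOAD = 0x7C00    # where the firmware loads the first sector
BOOTSECT_MOVED = 0x90000  # where the boot sector relocates itself
SETUP_START = 0x90200     # code defined in setup.S
KERNEL_LOAD = 0x10000     # compressed kernel as loaded from disk
KERNEL_FINAL = 0x1000     # compressed kernel after the move

# Room for the compressed image: from 0x10000 up to the relocated boot sector.
max_image = BOOTSECT_MOVED - KERNEL_LOAD
print(max_image // 1024, "KB")    # 512 KB, the half-megabyte limit

# The setup code starts right after the 512-byte boot sector.
print(SETUP_START - BOOTSECT_MOVED)

# The move to 0x1000 shifts the image down by this many kilobytes.
print((KERNEL_LOAD - KERNEL_FINAL) // 1024, "KB")
```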
The steps just described were once the whole story of booting, when the kernel was small enough to fit in half a megabyte of memory (the address range between 0x10000 and 0x90000). As features were added to the system, the kernel became larger than half a megabyte and could no longer be moved to 0x1000. Thus, the code at 0x1000 is no longer the Linux kernel itself; instead, the “gunzip” part of the gzip program resides at that address. The following additional steps are now needed to uncompress the kernel and execute it:
head.S in the compressed directory is at 0x1000 and is in charge of “gunzipping” the kernel; it calls the function decompress_kernel, defined in compressed/misc.c, which in turn calls inflate, which writes its output starting at address 0x100000 (1MB). High memory can now be accessed, because the processor is definitely out of its limited boot environment, the “real” mode.
After decompression, head.S jumps to the actual beginning of the kernel. The relevant code is in ../kernel/head.S, outside of the boot directory.
The boot process is now over, and head.S (i.e., the code found at 0x100000 that used to be at 0x1000 before introducing compressed boots) can complete processor initialization and call start_kernel(). Code for all functions after this step is written in C.
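The decompression step can be modeled with a toy example. The sketch below is not how the kernel does it: it uses Python's gzip module in place of the kernel's own inflate code, a repetitive byte string in place of a real kernel, and it places the compressed data directly at 0x1000 without the decompressor code that precedes it in a real image.

```python
import gzip

# A toy model of the decompression step; the payload stands in for the kernel.
LOW_ADDRESS = 0x1000      # where the compressed image sits after the move
HIGH_ADDRESS = 0x100000   # 1MB, where the uncompressed kernel is written

memory = bytearray(4 << 20)         # 4MB of pretend RAM
kernel = b"start_kernel " * 50000   # 650000 bytes, more than half a megabyte
compressed = gzip.compress(kernel)
assert len(compressed) < 0x80000    # the compressed image fits in low memory

memory[LOW_ADDRESS:LOW_ADDRESS + len(compressed)] = compressed

# "Gunzip" the image and write the result starting at 1MB.
image = bytes(memory[LOW_ADDRESS:LOW_ADDRESS + len(compressed)])
output = gzip.decompress(image)
memory[HIGH_ADDRESS:HIGH_ADDRESS + len(output)] = output

print(len(compressed), "->", len(output))
```

The point of the model is that the uncompressed image is larger than the half megabyte available in low memory, yet its compressed form still fits there.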
The various data movements performed at system boot are depicted in Figure 1.
The boot steps shown above rely on the assumption that the compressed kernel can fit in half a megabyte of space. While this is true most of the time, a kernel stuffed with device drivers might not fit into this space. For example, kernels used in installation disks can easily outgrow the available space. Some new method is needed to solve the problem; this new method is called bzImage and was introduced in kernel version 1.3.73.
A bzImage is generated by issuing make bzImage from the top level Linux source directory. This kind of kernel image boots similarly to zImage, with a few changes:
When the system is loaded to address 0x10000, a little helper routine is called after loading each 64K data block. The helper routine moves the data block to high memory by using a special BIOS call. Only the newer BIOS versions implement this functionality, so make boot still builds the conventional zImage, though this may change in the near future.
setup.S doesn't move the system back to 0x1000 (4K) but, after entering protected mode, jumps instead directly to address 0x100000 (1MB) where data has been moved by the BIOS in the previous step.
The decompressor found at 1MB writes the uncompressed kernel image into low memory until it is exhausted, and then into high memory after the compressed image. The two pieces are then reassembled at the address 0x100000 (1MB). Several memory moves are needed to perform the task correctly.
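The first of these changes, loading through a 64K low-memory buffer, can be sketched as a simulation. The function and constant names below are hypothetical, and the plain slice copy merely stands in for the BIOS call that the real helper routine uses.

```python
BLOCK = 64 * 1024   # size of each data block read from disk
BOUNCE = 0x10000    # low-memory address each block is loaded to
HIGH = 0x100000     # 1MB, destination of the first block

def load_big_image(memory, image):
    """Read `image` in 64K blocks through low memory, moving each one high."""
    dest = HIGH
    for start in range(0, len(image), BLOCK):
        block = image[start:start + BLOCK]
        memory[BOUNCE:BOUNCE + len(block)] = block   # disk read into low memory
        # Stand-in for the helper routine that asks the BIOS to move the block.
        memory[dest:dest + len(block)] = memory[BOUNCE:BOUNCE + len(block)]
        dest += len(block)
    return dest - HIGH   # number of bytes now in high memory

memory = bytearray(4 << 20)        # 4MB of pretend RAM
image = bytes(range(256)) * 3000   # 768000 bytes, too big for a zImage
loaded = load_big_image(memory, image)
print(loaded)
```

Because each block passes through the same low buffer, the size of the image is limited by high memory rather than by the half megabyte below 0x90000.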
The rule for building the big compressed image can be read from Makefile; it affects several files in arch/i386/boot. One good point of bzImage is that when kernel/head.S is called, it doesn't notice the extra work, and everything goes forward as usual.