Device Drivers Concluded
Though only a few drivers implement the memory mapping technique, it gives an interesting insight into the Linux system. I introduce memory management and its features, enabling us to play with the console, include memory mapping in drivers, and crash systems...
Since the days of the 80386, the Intel world has supported a technique called virtual addressing. Coming from the Z80 and 68000 world, my first thought about this was: “You can allocate more memory than you have as physical RAM, as some addresses will be associated with portions of your hard disk”.
To be more academic: Every address used by the program to access memory (no matter whether data or program code) will be translated--either into a physical address in the physical RAM or an exception, which is dealt with by the OS in order to give you the memory you required. Sometimes, however, the access to that location in virtual memory reveals that the program is out of order—in this case, the OS should cause a “real” exception (usually SIGSEGV, signal 11).
The smallest unit of address translation is the page, which is 4 kB on Intel architectures and 8 kB on Alpha (defined in asm/page.h).
When trying to understand the process of address resolution, you will enter a whole zoo of page table descriptors, segment descriptors, page table flags and different address spaces. For now, let's just say the linear address is what the program uses; using a segment descriptor, it is turned into a logical address, which is then resolved to a physical address (or a fault) using the paging mechanism. The Linux Kernel Hacker's Guide spends 20 pages on a rather short description of all these beasties, and I see no chance of making a more succinct explanation.
For any understanding of the building, administration, and scope of pages when using Linux, and how the underlying technique—especially of the Intel family—works, you have to read the Linux Kernel Hacker's Guide. It is freely available by ftp from tsx-11.mit.edu in the /pub/linux/docs/LDP/ directory. Though the book is slightly old [that's a gentle understatement—ED], nothing has changed in the internals of the i386, and other processors looks similar (in particular, the Pentium is exactly like a 386).
If you want to learn about page management, either you start reading the nice guide now, or you believe this short and abstract overview:
Every process has a virtual address space implemented by several CPU registers which are changed during context switches (this is the zoo of selectors and page description pointers). By these registers, the CPU accesses all the memory segments it needs.
Multiple levels of translation tables are used to translate the linear address given by the process to the physical address in RAM. The translation tables all reside in memory. They are automatically looked up by the CPU hardware, but they are built and updated by the OS. They are called page descriptor tables. In these tables there is one entry (i.e., a “page descriptor”) for every page in the process's address space—we're talking of the logical addresses, also called virtual addresses.
We concentrate now on a few main aspects of pages as seen by the CPU:
A page may be “present” or not—depending on whether it is present in physical memory or not (if it has been swapped-out, or it is a page which has not yet been loaded). A flag in the page descriptor is used to indicate this status. Access to a non-present page is called a “major” page fault. The fault is handled in Linux by the function do_no_page(), in mm/memory.c. Linux counts page faults for each process in the field maj_flt in struct task_struct.
A page may be write-protected—any attempt to write on the page will cause a fault (called “minor page fault”, handled in do_wp_page() and counted in the min_flt field of struct task_struct).
A page belongs to the address space of one task or several of them; each such task holds a descriptor for the page. “Task” is what microprocessor technicians call a process.
Other important features of pages, as seen by the OS, are:
If multiple processes use the same page of physical memory, they are said to “share” it. The processes hold separate page descriptors for shared page, and the entries can differ—for example, one process can have write permission on the page and another process may not.
A page may be marked as copy on write (grep for COW in kernel sources). If, for example, a process forks, the child will share the data segments with the parent, but both will be write-protected: the pages are shared for reading. As soon as one program writes onto a page, the page is doubled and the writing program gets a new page; the other keeps its old one, with a decremented “share count”. If the share count is already one, no copy is performed when a minor fault happens, and the page is just marked as writable. Copy-on-write minimizes memory usage.
A page may be locked against being swapped out. All kernel modules and the kernel itself reside in locked pages. As you might remember from the last installment, pages which are used for DMA-transfers have to be protected against being swapped out.
Page descriptors may also point to addresses not located in physical RAM, but rather the ROM of certain peripherals, RAM buffers for video cards etc., or PCI buffers. Traditionally, on Intel architectures, the range for the first two groups is from 640 kB to 1024 kB, and the range for the PCI buffers is above high_memory (the top of physical RAM, defined in asm/pgtable.h). The range from 640 KB to 1024 kB not used by Linux, and is tagged as reserved in the mem_map structure. They are the “384k reserved”, appearing in the first kernel message after BogoMips calculation.
Virtual memory allows quite beautiful things like:
Demand-loading a program instead of loading it totally into memory at startup: whenever you start a program, it gets its own virtual address space, which is just associated with some blocks on your filesystem and some space for variables, but the memory is allocated and loading is performed only when you really access the different portions of the program.
Swapping, in case your memory gets tight. This means whenever Linux needs memory for itself or a program and unused memory gets tight, it will try to shrink the buffers for the file systems, try to “forget” pages already allocated for program code that is executed (they can be reloaded from disk at any time anyway), or swap some pages containing user data to the swap partition of the hard disk.
Memory Protection. Each process has its own address space and can't look at memory belonging to other processes.
Memory Mapping: Just declare a portion or the whole of a file you have opened as a part of your memory, by means of a simple function call.
Practical books for the most technical people on the planet. Newly available books include:
- Agile Product Development by Ted Schmidt
- Improve Business Processes with an Enterprise Job Scheduler by Mike Diehl
- Finding Your Way: Mapping Your Network to Improve Manageability by Bill Childers
- DIY Commerce Site by Reven Lerner
Plus many more.
- Happy GPL Birthday VLC!
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Handheld Emulation: Achievement Unlocked!
- Unikernels, Docker, and Why You Should Care
- Giving Silos Their Due
- Controversy at the Linux Foundation
- Don't Burn Your Android Yet
- New Products
- Non-Linux FOSS: Snk
- Firefox OS