Device Drivers Concluded
The guy who has to care for this beautiful stuff is your poor device driver writer. While support for mmap() on files is done by the kernel (by each file system type, indeed), the mapping method for devices has to be directly supported by the drivers, by providing a suitable entry in the fops structure, which we first introduced in the March issue of LJ.
First, we look at one of the few “real” implementations of such support, basing the discussion on the /dev/mem driver. Next, we go on to a particular implementation useful for frame grabbers, lab devices with DMA support and probably other peripherals.
To begin with, whenever the user calls mmap(), the call will reach do_mmap(), defined in the mm/mmap.c file. do_mmap() does two important things:
It checks the read and write permissions of the file handle against what was requested from mmap(). It also tests for crossing the 4GB limit on Intel machines and applies other knock-out criteria.
If these checks pass, a struct vm_area_struct variable is created for the new piece of virtual memory. Each task can own several of these structures, called “virtual memory areas” (VMAs).
VMAs require some explanation: they represent the addresses, methods, permissions and flags of portions of the user address space. Your mmaped region will keep its own vm_area_struct entry among the task's memory areas. VMA structures are maintained by the kernel and ordered in balanced tree structures to achieve fast access.
The fields of VMAs are defined in linux/mm.h. The number and content of a process's VMAs can be explored by looking at /proc/pid/maps for any running process, where pid is the process ID of that process. Let's do so for our small nasty program, compiled with gcc-ELF. While the program runs, its /proc/pid/maps table will look somewhat like this (without the comments):
    # /dev/sdb2: nasty css
    08000000-08001000 rwxp 00000000 08:12 36890
    # /dev/sdb2: nasty dss
    08001000-08002000 rw-p 00000000 08:12 36890
    # bss for nasty
    08002000-08008000 rwxp 00000000 00:00 0
    # /dev/sda2: /lib/ld-linux.so.1.7.3 css
    40000000-40005000 r-xp 00000000 08:02 38908
    # /dev/sda2: /lib/ld-linux.so.1.7.3 dss
    40005000-40006000 rw-p 00004000 08:02 38908
    # bss for ld-linux.so
    40006000-40007000 rw-p 00000000 00:00 0
    # /dev/sda2: /lib/libc.so.5.2.18 css
    40009000-4007f000 rwxp 00000000 08:02 38778
    # /dev/sda2: /lib/libc.so.5.2.18 dss
    4007f000-40084000 rw-p 00075000 08:02 38778
    # bss for libc.so
    40084000-400b6000 rw-p 00000000 00:00 0
    # /dev/sda2: /dev/mem (our mmap)
    400b6000-400c6000 rw-s 000b8000 08:02 32767
    # the user stack
    bfffe000-c0000000 rwxp fffff000 00:00 0
The first two fields on each line, separated by a dash, give the address range the data is mmaped to. The next field shows the permissions for those pages (r is for read, w is for write, x is for execute, p is for private, and s is for shared). The offset in the file mmaped from is given next, followed by the device and the inode number of the file. The device number, written as major:minor in hexadecimal, represents a mounted (hard) disk (e.g., 03:01 is /dev/hda1, 08:01 is /dev/sda1). The easiest (but slow) way to figure out the file name for the given inode number is:
    cd /mount/point
    find . -inum inode-number -print
If you try to understand the lines and their comments, please notice that Linux separates data into “code storage segments” or css, sometimes called “text” segments; “data storage segments” or dss, containing initialized data structures; and “block storage segments” or bss, areas for variables that are allocated at execution time and initialized to zero. As no initial values for the variables in the bss have to be loaded from disk, the bss items in the list show no file device (“0” as a major number is NODEV). This shows another usage of mmap: you can pass the MAP_ANONYMOUS flag, in which case no file handle is needed, to request portions of free memory for your program. (In fact, some versions of malloc get their memory this way.)