Device Drivers Concluded

This is the last of five articles about character device drivers. In this final section, Georg deals with memory mapping devices, beginning with an overall descriptoin of Linux memory management concepts.
Your Turn

When your device driver gets the call from do_mmap(), a VMA has already been created for the new mapping, but not yet inserted into the task's memory structure.

The device driver function should comply to this prototype:

int skel_mmap (struct inode *inode,
               struct file *file,
               struct vm_area_struct *vma)

vma->vm_start will contain the address in user space to be mapped to. vma->vm_end contains its end, the difference between these two elements represents the length argument in the original users call to mmap(). vma->vm_offset is the offset on the mmaped file, identical to the offset argument passed to the system call.

Let's explore how the /dev/mem driver performs the mapping. You find the code lines in drivers/char/mem.c in the function mmap_mem(). If you look for something complicated, you will be disappointed: it calls only remap_page_range(). If you want to understand what happens here, you really should read the 20 pages from the Kernel Hacker's Guide. In short, the page descriptors for the given process address space are generated and filled with links to the physical memory. Note, the VMA structure is for Linux memory management, whereas the page descriptors are directly interpreted by the CPU for address resolution.

If remap_page_range() returns zero, saying that no error has occurred, your mmap-handler should do the same. In this case, do_mmap() will return the address to which the pages were mapped . Any other return value is treated as an error.

A Concrete Driver

It will be difficult to give code lines for all possible applications of the mmap technique in the different character drivers. Our concrete example is of a laboratory device with its own RAM, CPU and, of course, analog to digital converters, digital to analog converters, digital inputs and outputs, and clocks (and bells and whistles).

The lab device we dealt with is able to sample steadily into its memory and report the status of its work when asked via the character channel, which is an ASCII stream-like channel. The command-based interaction is done via the character device driver we implemented and its read and write calls.

The actual mass transfer of data is done independently from that: by sending a command like TOHOST interface address, length, host address, the lab device will raise an interrupt and tell the PC it wants to send a certain amount of data to a given address at the host by DMA. But where should we put that data? We decided not to mix up the clear character communication with the mass data transfer. Moreover, as the user could even upload its own commands to the device, we could make no assumptions about the ordering and the meaning of the data.

So we decided to hand full control to the user and allow him to request DMA-able memory portions mapped to the user address space, and check every DMA request coming from the lab device against the list of those areas. In other words, we implemented something like a skel_malloc and skel_free by means of ioctl() commands and disallowed any transfer to any other region in order to keep the whole thing safe.

You might wonder why we did not use the direct mmap(). Mostly, because there is no equivalent munmap. Your driver will not be notified when the mapping to the open file is destroyed. Linux does it all by itself: it removes the vma structure, destroys the page descriptor tables and decreases the reference count for the shared pages.

As we have to allocate the DMA-able buffer by kmalloc(), we have to free it by kfree(). Linux won't allows us to do so when automatically unmapping the user reference, but without the user reference, we don't need the buffer any more. Therefore, we implemented a skel_malloc() which actually allocates the driver buffer and remaps it to the user space as well, and skel_free() which release that space and unmaps it (after checking if a DMA-transfer is running).

We could implement the remapping in the user library released with our device driver by the same means used by the nasty program above. But, for good reason, /dev/mem can be read and written only by root, and programs accessing the device driver should be able to run as normal user, too.

Two tricks are used in our driver. First, we modify the mem_map array telling Linux about the usage and permissions of our pages of physical memory. mem_map is an array of mem_map_t structures, and is used to keep information about all the physical memory.

For all allocated pages we set the reserved flag. This is a quick and dirty method, but it reaches its aim under all Linux versions (starting at least at 1.2.x): Linux keeps its hands off our pages! It considers them as if they were a video buffer, a ROM, or anything else it can't swap or release into free memory. The mem_map array itself uses this trick to protect itself from processes hungry for memory.

The second trick we use is quickly generating a pseudo file which looks something like an opened /dev/mem. We rebuild the mmap_mem() call from the /dev/mem driver, especially because it is not exported in the kernel symbol table and simply apply the same small call to remap_page_range().

Moreover, DMA-buffers allocated by our skel_malloc() call are registered in lists in order to check whether a request for a DMA transfer is going to a valid memory area. The lists are also used to free the allocated buffers when the program closes the device without calling skel_free() beforehand. dma_desc is the type of those lists in the following lines, which show the code for the ioctl-wrapped skel_malloc() and skel_free():

/* =============================================
 * The user desires a(nother) dma-buffer, that
 * is allocated by kmalloc (GFP_DMA) (continuous
 * and in lower 16 MB).
 * The allocated buffer is mapped into
 * user-space by
 *  a) a pseudo-file as you would get it when
 *     opening /dev/mem
 *  b) the buffer-pages tagged as "reserved"
 *     in memmap
 *  c) calling the normal entry point for
 *     mmap-calls "do_mmap" with our pseudo-file
 * 0 or <0 means an error occurred, otherwise
 * the user space address is returned.
 * This is the main basis of the Skel_Malloc
 * Library-Call
 * ------------------------------
 * Ma's little helper replaces the mmap
 * file_operation for /dev/mem which is declared
 * static in Linux and has to be rebuilt by us.
 * But ain't that much work; we better drop more
 * comments before they exceed the code in length.
static int skel_mmap_mem (struct inode * inode,
                   struct file * file,
                   struct vm_area_struct *vma) {
    if (remap_page_range(vma->vm_start,
                   vma->vm_end - vma->vm_start,
        return -EAGAIN;
    vma->vm_inode = NULL;
    return 0;
static unsigned long skel_malloc (struct file *file,
                            unsigned long size) {
    unsigned long    pAdr, uAdr;
    dma_desc         *dpi;
    skel_file_info   *fip;
    struct file_operations  fops;
    struct file      memfile;
    /* Our helpful pseudo-file only ... */
    fops.mmap = skel_mmap_mem;
    /* ... support mmap */
    memfile.f_op = &fops;
    /* and is read'n write */
    memfile.f_mode = 0x3;
    fip = (skel_file_info*)(file->private_data);
    if (!fip) return 0;
    dpi = kmalloc (sizeof(dma_desc), GFP_KERNEL);
    if (!dpi) return 0;
    PDEBUG ("skel: Size requested: %ld\n", size);
    if (size <= PAGE_SIZE/2)
        size = PAGE_SIZE-0x10;
    if (size > 0x1FFF0) return 0;
    pAdr = (unsigned long) kmalloc (size,
                           GFP_DMA | GFP_BUFFER);
    if (!pAdr) {
        printk ("skel: Trying to get %ld bytes"
                "buffer failed - no mem\n", size);
        kfree_s (dpi, sizeof (dma_desc));
        return 0;
    for (uAdr = pAdr & PAGE_MASK;
         uAdr < pAdr+size;
         uAdr += PAGE_SIZE)
        /* before 1.3.29 */
        mem_map [MAP_NR (uAdr)].reserved |=
#elseif LINUX_VERSION_CODE < 0x01033A
        /* before 1.3.58 */
        mem_map [MAP_NR (uAdr)].reserved = 1;
        /* most recent versions */
        mem_map_reserve (MAP_NR (uAdr));
    uAdr = do_mmap (&memfile, 0,
            (size + ~PAGE_MASK) & PAGE_MASK,
            MAP_SHARED, pAdr & PAGE_MASK);
    if ((signed long) uAdr <= 0) {
        printk ("skel: A pity - "
                "do_mmap returned %lX\n", uAdr);
        kfree_s (dpi, sizeof (dma_desc));
        kfree_s ((void*)pAdr, size);
        return uAdr;
    PDEBUG ("skel: Mapped physical %lX to %lX\n",
            pAdr, uAdr);
    uAdr |= pAdr & ~PAGE_MASK;
    dpi->dma_adr = pAdr;
    dpi->user_adr = uAdr;
    dpi->dma_size= size;
    dpi->next = fip->dmabuf_info;
    fip->dmabuf_info = dpi;
    return uAdr;
/* =============================================
 * Releases memory previously allocated by
 * skel_malloc
static int skel_free (struct file *file,
                      unsigned long ptr) {
    dma_desc    *dpi, **dpil;
    skel_file_info  *fip;
    fip = (skel_file_info*)(file->private_data);
    if (!fip) return 0;
    dpil = &(fip-).>dmabuf_info);
    for (dpi = fip->dmabuf_info;
         dpi; dpi=dpi->next) {
        if (dpi->user_adr==ptr) break;
        dpil = &(dpi->next);
    if (!dpi) return -EINVAL;
    PDEBUG ("skel: Releasing %lX bytes at %lX\n",
            dpi->dma_size, dpi->dma_adr);
    do_munmap (ptr & PAGE_MASK,
        (dpi->dma_size+(~PAGE_MASK)) & PAGE_MASK);
    ptr = dpi->dma_adr;
    do {
        /* before 1.3.29 */
        mem_map [MAP_NR(ptr)] &= ~MAP_PAGE_RESERVED;
#elseif LINUX_VERSION_CODE < 0x01033A
        /* before 1.3.58 */
        mem_map [MAP_NR(ptr)].reserved = 0;
        mem_map_unreserve (MAP_NR  (ptr));
        ptr += PAGE_SIZE;
    } while (ptr < dpi->dma_adr+dpi->dma_size);
    *dpil = dpi->next;
    kfree_s ((void*)dpi->dma_adr, dpi->dma_size);
    kfree_s (dpi, sizeof (dma_desc));
    return 0;


Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: Kernel Korner: Device Drivers Concluded

Anonymous's picture

I want to mmap the high pci memory . The physical address

i can get using pci_resource_start function. Exactly how can i do this??


Re: Kernel Korner: Device Drivers Concluded

Anonymous's picture


If I want to do two different mmap in my driver.

How to differentiate these to mmap calls in the driver?


differentiating things

Anonymous's picture

To distinguish your two areas either:
a) register two char devices.
b) use distinct offsets to determine which part to map.

One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix