Dynamic Kernels: Discovery

SysAdmin

by Alessandro Rubini

on April 1, 1996

The last issues Kernel Korner introduced problems related to loading and unloading a custom module, but didn't uncover the code to actually perform these tasks. This time, we are going to look at some finer details of module-writing, in order to begin showing the actual code for our character device driver.

Load-time Configuration

Although a smart driver should be able to autodetect the hardware it looks for, autodetection is not always the most sensible implementation, because it may be tricky to design. It is wise to provide a way to specify as many details as possible at load time, in order to test your driver before scratching your head and deciding to implement autodetection. Moreover, autodetection may fail if “similar” hardware is installed in the computer. A minor project can simply avoid autodetection altogether.

To configure the driver at load time, we'll exploit insmod's capability to assign integer values to arbitrary variables in the driver. We'll thus declare a (public) integer variable for each configurable item, and we'll make sure that the default value will trigger autodetection.

Configuring multiple boards at load time is left as an exercise to the reader (after reading the manpage for insmod); this implementation allows specification of a single board: for the sake of simplicity additional boards are only reachable through autodetection.

Choosing Names

The kernel is a complex application, and it is vital to keep its namespace as tidy as possible. This means both to use private symbols wherever it is possible, and to use a common prefix for all the symbols you define.

A production environment will only declare init_module() and cleanup_module(), which are used to load and unload the driver, and any load-time configuration variables. Nothing else needs to be public, because the module is accessed through pointers, not by name.

However, when you are developing and testing your code, you'll need your functions and data structures in the public symbol table in order to access them with your favorite debugging tool.

The easiest way to accomplish this dual need is to always use your own prefix in names, declare all of your symbols Static (note the capital `S'), and include the following five lines at the top of your driver:

#ifdef DEBUG_modulename
# define Static /* nothing */
#else
# define Static static
#endif

Real static symbols (such as persistent local variables) may thus be declared static, while debuggable symbols are declared Static

The init_module() Function

In this page, the whole code for the initialization function is uncovered. This is skeletal code, as the skel name suggests: a real-world device usually has slightly more than two I/O ports.

The most important thing to remember here is to release all the resources you already asked for whenever you find an error condition. This kind of task is well handled by the (otherwise unloved) goto statement: code duplication is avoided by jumping to the resource-release part of the function in case of error.

The fragment of code shown accepts load-time configuration for the major number, for the base address of the board's I/O ports, and for the IRQ number. For each “possible” board (in the I/O space), the autodetection function is called. If no boards are detected, init_module() returns -ENODEV to tell insmod that no devices are there.

Sometimes it is wise to allow the driver to be loaded even if its hardware is not installed in the computer. I implement such code in order to develop most of my driver at home. The trick is to have a configuration variable (skel_fake) which allows you to fake a nonexistent board. You can look at the implementation in my own drivers. “Faking boards” is a powerful way to start writing code before you get the hardware, or to test support for two boards even if you only own one of them.

The role of cleanup_module() is to shut down the device and release any resources allocated by init_module(). Our sample code cycles through the array of boards and releases I/O ports and the IRQ, if any. Finally, the major number is released. The initial check for MOD_IN_USE is redundant if you're running a recent kernel, but a wise thing to put in production code, because your customers or users may be running old Linux kernels.

The sample code for init_module() and cleanup_module() is shown in Listing 1. The prefix skel_ is used for all non-local names. The code here is quite simplified, in that it lacks some error-checking, which is vital in production-quality source code.

Autodetecting the Device

init_module() calls the function skel_find() to perform the dirty task of detecting if a board is there. The function is very device specific, because each device must be probed for its peculiar features; thus, I won't try to show code to perform the actual probing, but only IRQ autodetection.

Unfortunately, some peripherals can't tell which IRQ line they're configured to use, thus forcing the user to write the IRQ number on the command line of insmod, or to hardcode the number in the software itself. Both these approaches are bad practice, because you just can't plug the board (after setting the jumpers) and load the driver. The only way to autodetect the IRQ line for these devices is a trial-and-error technique, which is, of course, only viable if the hardware can be instructed to generate interrupts.

The code in Listing 2 shows skel_find(), complete with IRQ autodetection. Some details of IRQ handling may appear obscure to some readers, but they will be clarified in the next article. To summarize, this code cycles through each of the possible IRQ lines, asking to install a handler, and looks to see if interrupts are actually generated by the board.

The field hwirq in the hardware structure represents the useable interrupt line, while the field irq is only valid when the line is active (after request_irq()). As explained in the last issue, it makes no sense to keep hold of an IRQ line when the device is not in use; that's why two fields are used.

Please note that I wrote this code as a work-around for the limitations of one of my hardware boards; if your hardware is able to report the IRQ line it's going to use, it's much better to use that information instead. The code is quite stable, anyway, if you are able to tailor it to your actual hardware. Fortunately, most good hardware is able to report its own configuration.

fops and filp

After the module has been loaded and the hardware has been detected, we must see how the device is acted upon. This means introducing the role of fops and filp: these little beasts are the most important data structures—actually, variable names—used in interfacing the device driver with the kernel.

fops is the name usually devoted to a struct file_operations. The structure is a jump table (structure of pointers to functions), and each field refers to one of the different operations performed on a filesystem node (open(), read(), ioctl(), etc.).

A pointer to your own fops is passed to the kernel by means of register_chrdev(), so that your functions will be called whenever one of your nodes is acted upon. We already wrote that line of code, but didn't show the actual fops. Here it is:

struct file_operations skel_fops {
  skel_lseek,
  skel_read,
  skel_write,
  NULL,       /* skel_readdir */
  skel_select,
  skel_ioctl,
  skel_mmap,
  skel_open,
  skel_close
};

Each NULL entry in your fops means that you're not going to offer that functionality for your device (select is special, in this respect, but I won't expand on it), each non-NULL entry must be a pointer to a function implementing the operation for your device. Actually, there exist a few more fields in the structure, but our example will live with the default NULL value (the C compiler fills up an incomplete structure with zero bytes without issuing any warning). If you are really interested in them, you can look at the structure's definition in <linux/fs.h>. filp is the name usually devoted to one of the arguments passed by the kernel to any function in your fops, namely a struct file *. The file structure is used to keep all the available status information about an “open file”, beginning with a call to open() and up to a call to close(). If the device is opened multiple times, different filps will be used for each instance: this means that you'll need to use your own data structure to keep hardware information about your devices. The code fragments within this installment already use an array of Skel_Hw, to hold information about several boards installed on the same computer. What is missing, then, is a way to embed hardware information in the file structure, in order to instruct the driver to operate on the right device. The field private_data exists in struct file just for that task, and is a pointer to void. You'll make private_data point to your hardware information structure when skel_open() gets invoked. If you need to keep some extra information private to each filp (for example, if two device nodes access the same hardware in two different ways), then you'll need a specific structure for private_data, which must be kmalloc()ed on open and kfree()ed on close. The implementations of open() and close() that we'll see later, work in this way.

Using Minor Numbers

In the last article I introduced the idea of minor device numbers, and it is now high time to expand on the topic.

If your driver manages multiple devices, or a single device but in different ways, you'll create several nodes in the /dev directory, each with a different minor number. When your open function gets invoked, then, you can examine the minor number of the node being opened, and take appropriate actions.

The prototypes of your open and close functions are

int skel_open (struct inode *inode,
               struct file *filp);
void skel_close (struct inode *inode,
                 struct file *filp);

and the minor number (an unsigned value, currently 8 bits) is available as MINOR(inode->i_rdev). The MINOR macro and the relevant structures are defined within <linux/fs.h>, which in turn is included in <linux/sched.h>.

Our skel code (Listing 3) will split the minor number in order to manage both multiple boards (using four bits of the minor), and multiple modes (using the remaining four bits). To keep things simple we'll only write code for two boards and two modes. The following macros are used:

#define SKEL_BOARD(dev) (MINOR(dev)&0x0F)
#define SKEL_MODE(dev)  ((MINOR(dev)>>4)&0x0F)

The nodes will be created with the following commands (within the skel_load script, see last month's article):

mknod skel0    c $major  0
mknod skel0raw c $major  1
mknod skel1    c $major 16
mknod skel1raw c $major 17

But let's turn back to the code. This skel_open() sorts out the minor number and folds any relevant information inside the filp, in order to avoid further overhead when read() or write() will be invoked. This goal is achieved by using a Skel_Clientdata structure embedding any filp-specific information, and by changing the pointer to your fops within the filp; namely, filp->f_op.

Changing values within filp may appear a bad practice, and it often is; it is, however, a smart idea when the file operations are concerned. The f_op field points to a static object anyways, so you can modify it lightheartedly, as long as it points to a valid structure; any subsequent operation on the file will be dispatched using the new jump table, thus avoiding a lot of conditionals. This technique is used within the kernel proper to implement the different memory-oriented devices using a single major device number.

The complete skeletal code for open() and close() is shown in Listing 3; the flags field in the clientdata will be used when ioctl() is introduced.

Note that the close() function shown here should be referred to by both fopss. If different close() implementations are needed, this code must be duplicated.

Multiple- or Single-open?

A device driver should be a policy-free program, because policy choices are best suited to the application. Actually, the habit of separating policy and mechanism is one of the strong points of Unix. Unfortunately, the implementation of skel_open() leads itself to policy issues: is it correct to allow multiple concurrent opens? If yes, how can I handle concurrent access in the driver?

Both single-open and multiple-open have sound advantages. The code shown for skel_open() implements a third solution, somewhat in-between.

If you choose to implement a single-open device, you'll greatly simplify your code. There's no need for dynamic structures because a static one will suffice; thus, there's no risk to have memory leakage because of your driver. In addition, you can simplify your select() and data-gathering implementation because you're always sure that a single process is collecting your data. A single-open device uses a boolean variable to know if it is busy, and returns -EBUSY when open is called the second time. You can see this simplified code in the busmouse drivers and lp driver within the kernel proper.

A multiple-open device, on the other hand, is slightly more difficult to implement, but much more powerful to use for the application writer. For example, debugging your applications is simplified by the possibility of keeping a monitor constantly running on the device, without the need to fold it in the application proper. Similarly, you can modify the behaviour of your device while the application is running, and use several simple scripts as your development tools, instead of a complex catch-all program. Since distributed computation is common nowadays, if you allow your device to be opened several times, you are ready to support a cluster of cooperating processes using your device as an input or output peripheral.

The disadvantages of using a conventional multiple-open implementation are mainly in the increased complexity of the code. In addition to the need for dynamic structures (like the private_data already shown), you'll face the tricky points of a true stream-like implementation, together with buffer management and blocking and non-blocking read and write; but those topics will be delayed until next month's column.

At the user level, a disadvantage of multiple-open is the possibility of interference between two non-cooperating processes: this is similar to cat-ing a tty from another tty—input may be delivered to the shell or to cat, and you can't tell in advance. [For a demonstration of this, try this: start two xterms or log into two virtual consoles. On one (A), run the tty command, which tells you which tty is in use. On the other (B), type cat /dev/tty_of_A. Now go to A and type normally. Depending on several things, including which shell you use, it may work fine. However, if you run vi, you will probably see what you type coming out on B, and you will have to type ^C on B to be able to recover your session on A—ED]

A multiple-open device can be accessed by several different users, but often you won't want to allow different users to access the device concurrently. A solution to this problem is to look at the uid of the first process opening the device, and allow further opens only to the same user or to root. This is not implemented in the skel code, but it's as simple as checking current->euid, and returning -EBUSY in case of mismatch. As you see, this policy is similar to the one used for ttys: login changes the owner of ttys to the user that has just logged in.

The skel implementation shown here is a multiple-open one, with a minor addition: it assures that the device is “brand new” when it is first opened, and it shuts the device down when it is last closed.

This implementation is particularly useful for those devices which are accessed quite rarely: if the frame grabber is used once a day, I don't want to inherit strange setting from the last time it was used. Similarly, I don't want to wear it out by continuously grabbing frames that nobody is going to use. On the other hand, startup and shutdown are lengthy tasks, especially if the IRQ has to be detected, so you might not choose this policy for your own driver. The field usecount within the hardware structure is used to turn on the device at the first open, and to turn it off on the last close. The same policy is devoted to the IRQ line: when the device is not being used, the interrupt is available to other devices (if they share this friendly behaviour).

The disadvantages of this implementation are the overhead of the power cycles on the device (which may be lengthy) and the inability to configure the device with one program in order to use it with another program. If you need a persistent state in the device, or want to avoid the power cycles, you can simply keep the device open by means of a command as silly as this:

sleep 1000000 < /dev/skel0 &

As it should be clear from the above discussion, each possible implementation of the open() and close() semantics has its own peculiarities, and the choice of the optimum one depends on your particular device and the main use it is devoted to. Development time may be considered as well, unless the project is a major one. The skel implementation here may not be the best for your driver: it is only meant as a sample case, one amongst several different possibilities. Additional Information

Alessandro Rubini (rubini@ipvvis.unipv.it) Programmer by chance and Linuxer by choice, Alessandro is taking his PhD course in computer science and is breeding two small Linux boxes at home. Wild by his very nature, he loves trekking, canoeing, and riding his bike.

Load Disqus comments