The Devil's in the Details

This article, the third of five on writing character device drivers, introduces concepts of reading, writing, and using ioctl-calls.

Handling select()

The last important I/O function to be shown is select(), one of the most interesting parts of Unix, in our opinion.

The select() call is used to wait for a device to become ready, and it is one of the scariest functions for the novice C programmer. This article concentrates on the driver-specific part of the system call rather than on its use from within an application; the most impressive feature of that driver code is its compactness.

Here's the full code:

static int skel_select(struct inode *inode,
                       struct file *file,
                       int sel_type,
                       select_table *wait) {
    Skel_Clientdata *data=file->private_data;
    Skel_Board *board=data->board;
    if (sel_type==SEL_IN) {
        if (! SKEL_IBUF_EMPTY (board))
            /* readable */
            return 1;
        /* register on the input queue; the kernel does the waiting */
        select_wait(&(board->skel_wait_iq), wait);
        /* not readable */
        return 0;
    }
    if (sel_type==SEL_OUT) {
        if (! SKEL_OBUF_FULL (board))
            return 1;  /* writable */
        /* hw knows */
        select_wait (&(board->skel_wait_oq), wait);
        return 0;
    }
    /* exception condition: cannot happen */
    return 0;
}

As you can see, the kernel takes care of the hassle of managing wait queues, and you have only to check for readiness.

When we first wrote a select() call for a driver, we didn't understand the wait_queue implementation, and you don't need to either. You only have to know that the code works. wait_queues are challenging, and usually when you write a driver you have no time to accept the challenge.

Actually, select is best understood through its relationship to read and write: if select() says that the file is readable, the next read must not block (regardless of O_NONBLOCK), and this means you have to tell the hardware to return data. The interrupt will collect the data and awaken the queue. If the user is selecting for writing, the situation is similar: the driver must tell whether write() will block or not. If the buffer is full, write() will block, but you don't need to tell the hardware about it, since write() has already told it (when it filled the buffer). If the buffer is not full, the write won't block, so you return 1.
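Seen from the application side, the contract described above looks roughly like the following sketch. It uses an ordinary pipe as a stand-in for a device such as /dev/skel0, which is an assumption made only so the code runs without the skel driver; fd_is_readable() and select_then_read_demo() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Poll fd for readability without waiting.
 * Returns 1 if readable, 0 if not, -1 on error. */
static int fd_is_readable(int fd)
{
    fd_set readfds;
    struct timeval no_wait = { 0, 0 };

    assert(fd >= 0 && fd < FD_SETSIZE);  /* FD_SET() precondition */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, &no_wait);
}

/* Demonstration on a pipe (stand-in for a real device).
 * Returns the number of bytes read once select() has reported the
 * read end readable, or -1 if any step behaves unexpectedly. */
static int select_then_read_demo(void)
{
    int fds[2];
    char byte = 'x';
    int result = -1;

    if (pipe(fds) < 0)
        return -1;
    if (fd_is_readable(fds[0]) == 0 &&      /* empty: not readable   */
        write(fds[1], &byte, 1) == 1 &&     /* put one byte in       */
        fd_is_readable(fds[0]) == 1)        /* now reported readable */
        result = (int)read(fds[0], &byte, 1); /* must not block      */
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

With a real driver, the two fd_is_readable() answers correspond to the "return 0" and "return 1" paths of the SEL_IN branch in skel_select().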

This way of thinking about selecting for write may appear strange, as there are times when you need to write synchronously, and you might expect a device to be writable only once it has fully processed the data it already accepted. Unfortunately, doing things that way would break the blocking/nonblocking machinery, and thus an extra call is provided: if you need to write synchronously, the driver must offer (within its fops) the fsync() call. The application invokes fops->fsync through the fsync() system call, and if the driver doesn't support it, -EINVAL is returned.
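From user space, a synchronous write is then a write() followed by fsync(). The sketch below is only an illustration: write_synchronously() and fsync_on_pipe_demo() are hypothetical names, and the pipe is used merely as an example of a file descriptor with no fsync support behind it (on Linux the error reported in that case is EINVAL).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write len bytes, then wait until the driver reports them flushed.
 * Returns 0 on success or a negative errno value on failure. */
static int write_synchronously(int fd, const void *buf, size_t len)
{
    ssize_t written;

    assert(fd >= 0);
    written = write(fd, buf, len);
    if (written < 0)
        return -errno;
    if ((size_t)written != len)
        return -EIO;     /* short write: treated as an error here */
    if (fsync(fd) < 0)
        return -errno;   /* e.g. -EINVAL: no fsync behind this fd */
    return 0;
}

/* Try the helper on a pipe, which offers no fsync operation.
 * Returns the helper's result (expected to be negative). */
static int fsync_on_pipe_demo(void)
{
    int fds[2];
    int result;

    if (pipe(fds) < 0)
        return 0;        /* could not even set up the demo */
    result = write_synchronously(fds[1], "x", 1);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A driver that does implement fops->fsync would make the same helper return 0 once its output buffer has drained.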

ioctl()--Passing Control Information

Imagine that you want to change the baud rate of a serial multiport card you have built. Or tell your frame grabber to change the resolution of an image. Or whatever else... You could wrap these instructions into a series of escape sequences, as is done, for example, for screen positioning in ANSI emulation. But the normal method for this is to make an ioctl() call.

ioctl() calls as defined in <sys/ioctl.h> have the form

ioctl (int file_handle, int command, ...)

where ... is considered to be one argument of the type char * (according to the ioctl man page). Strange as it may be, the kernel receives these arguments in fs/ioctl.c in the form:

int sys_ioctl (unsigned int fd, unsigned int cmd,
               unsigned long arg);

To add to the confusion, <linux/ioctl.h> gives detailed rules for how the commands in the second parameter should be built, but none of the drivers actually follows these rules yet.

In any case, rather than cleaning up the Linux source tree, let's concentrate on the general idea of ioctl() calls. As the user, you pass the file handle and a command in the first two arguments and pass as the third parameter a pointer to a data structure the driver should read and/or write.

A few commands are interpreted by the kernel itself, for example FIONBIO, which changes the blocking/non-blocking flag of the file. The rest are passed to our own driver-specific ioctl() call, and arrive in the form:
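As an illustration of a command the kernel handles itself, the following user-space sketch toggles the non-blocking flag with FIONBIO and reads it back with fcntl(). The helper names are hypothetical, and a pipe is used instead of a skel device so that the example does not depend on the driver.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Switch fd to non-blocking (on != 0) or blocking (on == 0) mode.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd, int on)
{
    assert(fd >= 0);
    return ioctl(fd, FIONBIO, &on);
}

/* Returns 1 if O_NONBLOCK is set on fd, 0 if not, -1 on error. */
static int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags < 0 ? -1 : (flags & O_NONBLOCK) != 0;
}

/* Returns 1 if FIONBIO turned a blocking pipe end into a
 * non-blocking one, 0 otherwise. */
static int fionbio_demo(void)
{
    int fds[2];
    int before, after = -1;

    if (pipe(fds) < 0)
        return 0;
    before = is_nonblocking(fds[0]);
    if (set_nonblocking(fds[0], 1) == 0)
        after = is_nonblocking(fds[0]);
    close(fds[0]);
    close(fds[1]);
    return before == 0 && after == 1;
}
```

The driver never sees this request; it only notices the effect later, when read() and write() find O_NONBLOCK set in the file flags.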

int skel_ioctl (struct inode *inode,
                struct file *file,
                unsigned int cmd,
                unsigned long arg)

Before we show a small example of a skel_ioctl() implementation, note that the commands you define should obey the following rules:

  1. Pick a free MAGIC number from /usr/src/linux/MAGIC and make this number the upper eight bits of the 16-bit command word.

  2. Enumerate commands in the lower eight bits.

Why this? Imagine “Silly Billy” starts his favorite terminal program minicom to connect to his mailbox. “Silly Billy” accidentally changed the serial line minicom uses from /dev/ttyS0 to /dev/skel0 (he is quite silly). The next thing minicom does is initialize the “serial line” with an ioctl() using TCGETA as command. Unfortunately, your device driver, hidden behind /dev/skel0, uses that number to control the voltage for a long-term experiment in the lab...

If the upper eight bits in the commands for ioctl() differ from driver to driver, every ioctl() to an inappropriate device will result in an -EINVAL return, protecting us from extremely unexpected results.
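The two rules and the resulting protection can be sketched in plain C. The helper names below (skel_make_cmd, skel_cmd_magic, skel_check_cmd) are hypothetical and not part of the driver; the magic '4' is the one this article chooses for its example commands, and 0x5405 is the value of TCGETA on Linux/x86.

```c
#include <assert.h>
#include <errno.h>

#define SKEL_MAGIC '4'   /* ASCII 0x34, the magic used by this article */

/* Build a 16-bit command word: magic in the upper eight bits,
 * command number in the lower eight bits. */
static unsigned int skel_make_cmd(unsigned int magic, unsigned int nr)
{
    assert(magic <= 0xff && nr <= 0xff);
    return (magic << 8) | nr;
}

/* Extract the magic (upper eight bits) from a command word. */
static unsigned int skel_cmd_magic(unsigned int cmd)
{
    return (cmd >> 8) & 0xff;
}

/* Reject commands that were meant for some other driver.
 * Returns 0 if the magic matches, -EINVAL otherwise. */
static int skel_check_cmd(unsigned int cmd)
{
    if (skel_cmd_magic(cmd) != SKEL_MAGIC)
        return -EINVAL;
    return 0;
}
```

Here skel_make_cmd(SKEL_MAGIC, 1) yields 0x3401, while Silly Billy's TCGETA (0x5405, magic 'T') fails the check and gets -EINVAL instead of changing the lab voltage.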

Now, to finish this section, we will implement an ioctl() call reading or changing the timeout delay in our driver. If you want to use it, you have to introduce a new variable

unsigned long skel_timeout = SKEL_TIMEOUT;

right after the definition of SKEL_TIMEOUT and replace every later occurrence of SKEL_TIMEOUT with skel_timeout.

We choose the MAGIC '4' (the ASCII character 4, that is, 0x34) and define two commands:

#define SKEL_GET_TIMEOUT 0x3401
#define SKEL_SET_TIMEOUT 0x3402

In our user process, these lines will double the time-out value:

/* ... */
unsigned long timeout;
if (ioctl (skel_hd, SKEL_GET_TIMEOUT,
           &timeout) < 0) {
    /* an error occurred (Silly Billy?) */
    /* ... */
}
timeout *= 2;
if (ioctl (skel_hd, SKEL_SET_TIMEOUT,
           &timeout) < 0) {
    /* another error */
    /* ... */
}

And in our driver, these lines will do the work:

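int skel_ioctl (struct inode *inode,
                struct file *file,
                unsigned int cmd,
                unsigned long arg) {
    switch (cmd) {
    case SKEL_GET_TIMEOUT:
        /* copy the current value out to user space */
        put_user_long(skel_timeout, (long*) arg);
        return 0;
    case SKEL_SET_TIMEOUT:
        /* fetch the new value from user space */
        skel_timeout = get_user_long((long*) arg);
        return 0;
    default:
        return -EINVAL; /* for Silly Billy */
    }
}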

Georg V. Zezschwitz is a 27-year-old Linuxer with a taste for the practical side of Computer Science and a tendency to avoid sleep.

XXXXXXXXXXXXXXXX (XXXXXXXXXXXXXX), like Georg, is also 27 years old and has the same interest in the practical side of Computer Science and the same tendency to avoid sleep.




Re: Kernel Korner: The Devil's in the Details


When you call wake_up_interruptible() from the handler, does control switch to the sleeping task, let it finish the read system call, and then return to continue the handler after the wake_up call? Or do the wake_up and the awakened task proceed in parallel?
