Block Device Drivers: Interrupts

Last month, we introduced block device drivers. This month, we look at some tricks that are useful when writing them. We start with the most basic “trick”, using hardware interrupts where available, and then describe some neat infrastructure that a block device driver can take advantage of by adding five lines of code and one function.

Automatic Timeouts

blk.h provides a mechanism for timing out when hardware doesn't respond. If the foo device has not responded to a request within 5 seconds, something is clearly wrong. We will update blk.h again:

#elif (MAJOR_NR == FOO_MAJOR)
#define DEVICE_NAME "foobar"
#define DEVICE_REQUEST do_foo_request
#define DEVICE_INTR do_foo
#define DEVICE_TIMEOUT FOO_TIMER
#define TIMEOUT_VALUE 500
/* 500 == 5 seconds */
#define DEVICE_NR(device) (MINOR(device) >> 6)
#define DEVICE_ON(device)
#define DEVICE_OFF(device)
#endif
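
The comment "500 == 5 seconds" only holds because the timeout is counted in clock ticks. As a small sketch of that arithmetic, assuming the 100 Hz tick rate (HZ == 100) of i386 Linux, the conversion looks like this. SKETCH_HZ and seconds_to_ticks() are names invented for this illustration and are not part of blk.h:

```c
/* Assumption: the kernel clock ticks 100 times per second (HZ == 100).
   On a platform with a different tick rate, 500 would not mean 5 seconds. */
#define SKETCH_HZ 100UL

/* Hypothetical helper: convert whole seconds into clock ticks. */
static unsigned long seconds_to_ticks(unsigned long seconds)
{
    return seconds * SKETCH_HZ;
}
```

With these assumptions, seconds_to_ticks(5) gives the 500 used for TIMEOUT_VALUE above.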

This is where using SET_INTR() and CLEAR_INTR() becomes helpful. Simply defining DEVICE_TIMEOUT changes SET_INTR() so that it automatically sets a “watchdog timer” that goes off if the foo device has not responded after 5 seconds. A SET_TIMER macro is provided to set the watchdog timer manually, and a CLEAR_TIMER macro is provided to turn it off. Only three other things need to be done:

  1. Add a timer, FOO_TIMER, to linux/timer.h. This must be a #define'd value that is not already used and must be less than 32 (there are only 32 static timers).

  2. Add a line to the foo_init() function, which is called at boot time to detect and initialize the hardware:

    timer_table[FOO_TIMER].fn = foo_times_out;
    
  3. Write a function, foo_times_out() (as you may have guessed from step 2), that tries to restart requests or otherwise handles the timeout condition.
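
To make the moving parts concrete, here is a user-space model of the watchdog mechanism. It is a sketch, not kernel code. It assumes that each of the 32 static timer slots holds an expiry time in ticks plus a handler, that a bitmask records which timers are armed, and that a timer disarms itself when it fires. The slot number 20 for FOO_TIMER, the lowercase helpers set_timer(), clear_timer(), set_intr() and tick(), and the timeouts_seen counter are hypothetical stand-ins for the real macros and the kernel's timer tick; check blk.h and the scheduler in your kernel source for the real definitions.

```c
#include <stddef.h>

#define FOO_TIMER     20    /* hypothetical unused slot; must be < 32 */
#define TIMEOUT_VALUE 500   /* ticks */

struct timer_struct {
    unsigned long expires;  /* tick at which the timer goes off */
    void (*fn)(void);       /* function to call when it does */
};

static struct timer_struct timer_table[32];
static unsigned long timer_active;  /* bit n set means timer n is armed */
static unsigned long jiffies;       /* simulated tick counter */
static void (*do_foo)(void);        /* stands in for DEVICE_INTR */
static int timeouts_seen;           /* counts calls to foo_times_out() */

/* Model of SET_TIMER: arm the watchdog TIMEOUT_VALUE ticks from now. */
static void set_timer(void)
{
    timer_table[FOO_TIMER].expires = jiffies + TIMEOUT_VALUE;
    timer_active |= 1UL << FOO_TIMER;
}

/* Model of CLEAR_TIMER: disarm the watchdog. */
static void clear_timer(void)
{
    timer_active &= ~(1UL << FOO_TIMER);
}

/* Model of SET_INTR with DEVICE_TIMEOUT defined: installing a handler
   arms the watchdog, installing NULL disarms it. */
static void set_intr(void (*handler)(void))
{
    do_foo = handler;
    if (handler != NULL)
        set_timer();
    else
        clear_timer();
}

/* Placeholder interrupt handler and timeout handler for the model. */
static void foo_interrupt(void) { }
static void foo_times_out(void) { timeouts_seen++; }

/* One simulated clock tick: run every armed timer that has expired. */
static void tick(void)
{
    int n;

    jiffies++;
    for (n = 0; n < 32; n++) {
        unsigned long mask = 1UL << n;

        if (!(timer_active & mask) || timer_table[n].expires > jiffies)
            continue;
        timer_active &= ~mask;      /* one-shot: disarm before calling */
        if (timer_table[n].fn != NULL)
            timer_table[n].fn();
    }
}
```

In this model, a driver would set timer_table[FOO_TIMER].fn = foo_times_out once at initialization, call set_intr(foo_interrupt) each time it starts the hardware, and call set_intr(NULL) when the hardware answers. If the answer never comes, foo_times_out() runs on the 500th tick.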

The foo_times_out() function should probably reset the device and try to restart the request if appropriate. It should use the CURRENT->errors variable to keep track of how many errors have occurred on that request, and if too many have occurred, it should call end_request(0) and go on to the next request.

Exactly what steps are required depends on how the hardware device behaves. Both the hd and the floppy drivers provide this functionality, and by comparing and contrasting them, you should be able to determine how to write such a function for your device. Here is a sample, loosely based on the hd_times_out() function in hd.c:

static void foo_times_out(void)
{
   unsigned int dev;
   SET_INTR(NULL);
   if (!CURRENT)
      /* completely spurious interrupt-
         pretend it didn't happen. */
      return;
   dev = DEVICE_NR(CURRENT->dev);
#ifdef DEBUG
   printk("foo%c: timeout\n", dev+'a');
#endif
   if (++CURRENT->errors >= FOO_MAX_ERRORS) {
#ifdef DEBUG
      printk("foo%c: too many errors\n", dev+'a');
#endif
      /* Tell buffer cache: couldn't fulfill request */
      end_request(0);
      INIT_REQUEST;
   }
   /* Now try the request again */
   foo_initialize_io();
}

SET_INTR(NULL) keeps this function from being called recursively. The next two lines ignore interrupts that occur when no requests have been issued. Then we check for excessive errors, and if there have been too many errors on this request, we abort it and go on to the next request, if any; if there are no requests, we return. (Remember that the INIT_REQUEST macro causes a return if there are no requests left.)

At the end, we are either retrying the current request or have given up and gone on to the next one. In either case, we need to restart the request.

We could reset the foo device right before calling foo_initialize_io(), if the device maintains some state and needs a reset. Again, this depends on the details of the device for which you are writing the driver.
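
The retry-or-abort logic can be tried outside the kernel with a small model. The sketch below is a user-space simulation under stated assumptions, not driver code: the request queue is a plain linked list, current_req stands in for CURRENT, end_request_failed() stands in for end_request(0), the explicit empty-queue check stands in for INIT_REQUEST, and the limit of 3 for FOO_MAX_ERRORS is an arbitrary value chosen for the example.

```c
#include <stddef.h>

#define FOO_MAX_ERRORS 3    /* hypothetical limit for this sketch */

struct request {
    int errors;             /* timeouts seen so far on this request */
    int failed;             /* set to 1 when the request is abandoned */
    struct request *next;
};

static struct request *current_req; /* stands in for CURRENT */
static int io_starts;               /* how many times I/O was (re)started */

/* Stand-in for end_request(0): mark failure, advance to the next request. */
static void end_request_failed(void)
{
    current_req->failed = 1;
    current_req = current_req->next;
}

/* Stand-in for foo_initialize_io(): just record that I/O was started. */
static void foo_initialize_io(void)
{
    io_starts++;
}

/* Model of the timeout handler: count the error, give up on the request
   once the limit is reached, then start I/O on whatever is current. */
static void foo_times_out_model(void)
{
    if (current_req == NULL)
        return;                     /* spurious timeout: nothing queued */
    if (++current_req->errors >= FOO_MAX_ERRORS) {
        end_request_failed();
        if (current_req == NULL)
            return;                 /* what INIT_REQUEST does: queue empty */
    }
    foo_initialize_io();
}
```

With two queued requests and a device that never answers, each request is started, retried twice, and then failed on its third timeout, after which the handler returns without starting any more I/O.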

Stay Tuned...

Next month, we will discuss optimizing block device drivers.

Michael K. Johnson is the editor of Linux Journal, and is also the author of the Linux Kernel Hackers' Guide (the KHG). He is using this column to develop and expand on the KHG.