Block Device Drivers: Optimization
Last month, I ran out of space for this column just as I was about to start a long discussion on optimizations. In the interest of usefulness, I will mention only the most useful optimizations here before going on to discuss initialization.
I won't have any sample code implementations of the optimizations, because these are complex issues that need to be handled in device-specific ways, and just as the code was much more vague when I discussed interrupt-driven drivers than when I introduced the basic, initial device driver, it would either be so large as to take over the entire magazine or be so vague as to be completely useless if I were to try to write some here. I hope that my explanations are useful to you even without code.
Also, I should warn you that the optimizations I talk about, while representative of common optimizations, are not necessarily representative of anything you will find in the Linux source code, except where I explicitly state otherwise. I'm writing on a slightly more theoretical level—about what can be done, rather than what has been done. Things conceptually similar to what I write about have been done, but the details are my own and may not be the best way to go about things. As usual, use this column as an introduction to the kernel source code and read the actual source code for far more insight than I can give you here.
If you are code-starved and don't care about optimizations, jump right to the second part of this article, where I talk about initialization.
One common optimization is coalescing adjacent requests. This means that when the driver is notified or notices that a request has been added to the request queue, it looks through the request list to see if there is a request for the next block (and possibly more blocks beyond that). If so, it sends a request to the hardware to read more than one block with one command, and when the data comes in from the hardware (presumably in an interrupt service routine), it fills both requests before calling end_request(1) (actually, some similar function designed especially for that driver) on either request. After satisfying (or failing to satisfy) the requests, the equivalent of end_request() is called for each request, but without waking up processes waiting on either request until the interrupt has been satisfied.
This will require that you write your own version of end_request(). Although this probably sounds daunting, it isn't as hard as it sounds, because you can use almost all of it as-is. For example, you could copy it verbatim, except instead of doing wake_up(&wait_for_request) at the end, you could add wait_for_request to a list of events to wake up when you are ready. Then you would call this new almost_end_request() function as soon as you have finished processing each request. When you are done handling the entire interrupt and are ready to wake up processes, iterate over the list of events, calling wake_up() on each in turn, from first satisfied to last satisfied.
Note that wake_up() will not cause a context switch directly. The driver will not give up control while running wake_up() to a process being woken up. Instead, wake_up() makes all the processes being woken up “runnable”, and sets the need_resched flag. This flag says that the scheduler ought to be called at the next convenient time, such as when returning from a “slow” interrupt handling routine (including the clock handling routine) or when returning from a system call. This means that the driver will not be pre-empted by calling wake_up(), and so it will be able to wake up all the necessary processes without being pre-empted.
This will likely take several tries to get right. All I can say to help is “Make sure you have backups. Really.”
The only driver in the Linux kernel that I have noticed doing anything like this is the floppy driver; the track buffer works in a similar way, where more than one request may be satisfied by a single read command sent to the hardware. If you are interested in investigating how it works, read drivers/block/floppy.c and search for floppy_track_buffer and read the entire function make_raw_rw_request().
Sounds like a “boondoggle”, doesn't it? Scatter-gather is perhaps a little bit similar in concept to coalescing adjacent requests, but is used with more intelligent hardware, and is perhaps a bit easier to implement. The “scatter” part means that when there are multiple blocks to be written all over a disk (for example), one command is sent out to initiate writing to all those different sectors, reducing the overhead involved in negotiation from O(n) to O(1), where n is the number of blocks or sectors to write.
Similarly, the “gather” part means that when there are multiple blocks to be read, one command is sent out to initiate reading all the blocks, and as the disk sends in each block, the corresponding request is marked as satisfied with end_request(1) or equivalent device-specific code. You will only be able to easily use end_request() unmodified with scatter-gather if each block read or written results in a separate interrupt being generated, and perhaps not even then. The SCSI driver does its own, which is probably the best way to go.
If you want to increase throughput, at the slight expense of response time, you could use timers to help: when your request() is notified that there is a request, and sees that there is only one request outstanding, it could set a timer to go off soon (one or two tenths of a second, perhaps), assuming that while waiting, more requests will spill in to be dealt with, and that when a certain number of requests have been made, or the timer has gone off, whichever comes first, scatter-gather will be applied to the blocks. If the request() routine is called and notices that “enough” (however many that is...) requests have accumulated, it would un-install the timer and process the requests. If the timer were to go off, all requests would be processed.
Note that the timer used should not be the same static timer used for the hardware timeout. Instead, it should be a dynamically allocated timer. See <linux/timer.h> for details on the dynamic timer routines.
I will repeat my standard disclaimer: this is simplified (at least, I'm trying to simplify it...) and if you want more detailed and correct information, study a real driver. The SCSI high-level drivers (scsi.c, sd.c, sr.c) are definitely the place to start. (I don't mention st.c and sg.c because they are character, not block, devices.)
|Happy Birthday Linux||Aug 25, 2016|
|ContainerCon Vendors Offer Flexible Solutions for Managing All Your New Micro-VMs||Aug 24, 2016|
|Updates from LinuxCon and ContainerCon, Toronto, August 2016||Aug 23, 2016|
|NVMe over Fabrics Support Coming to the Linux 4.8 Kernel||Aug 22, 2016|
|What I Wish I’d Known When I Was an Embedded Linux Newbie||Aug 18, 2016|
|Pandas||Aug 17, 2016|
- Happy Birthday Linux
- ContainerCon Vendors Offer Flexible Solutions for Managing All Your New Micro-VMs
- Updates from LinuxCon and ContainerCon, Toronto, August 2016
- What I Wish I’d Known When I Was an Embedded Linux Newbie
- New Version of GParted
- NVMe over Fabrics Support Coming to the Linux 4.8 Kernel
- Tor 0.2.8.6 Is Released
- All about printf
- Blender for Visual Effects
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide