Kernel Korner - I/O Schedulers

Here's how I/O schedulers contribute to disk performance in Linux and the improvements you can get from the new I/O schedulers in the 2.6 kernel.

Although most Linux users are familiar with the role of process schedulers, such as the new O(1) scheduler, many users are not so familiar with the role of I/O schedulers. I/O schedulers are similar in some aspects to process schedulers; for instance, both schedule some resource among multiple users. A process scheduler virtualizes the resource of processor time among multiple executing processes on the system. So, what does an I/O scheduler schedule?

A naïve system would not even include an I/O scheduler. Unlike the process scheduler, the I/O scheduler is not a mandatory component of the operating system. Instead, performance is the I/O scheduler's sole raison d'être.

To understand the role of an I/O scheduler, let's go over some background information and then look at how a system behaves without an I/O scheduler. Hard disks address their data using the familiar geometry-based addressing of cylinders, heads and sectors. A hard drive is composed of multiple platters mounted on a common spindle, with a read/write head for each platter surface. Each platter is divided further into circular ring-like tracks, similar to a CD or record. Finally, each track is composed of some integer number of sectors.

To locate a specific unit of data in a hard drive, the drive's logic requires three pieces of information: the cylinder, the head and the sector. The cylinder specifies the track on which the data resides. If you lay the platters on top of one another (as they are in a hard disk), a given track forms a cylinder through each platter. The head then identifies the exact read/write head (and thus the exact platter) in question. The search now is narrowed down to a single track on a single platter. Finally, the sector value denotes the exact sector on the track. The search is complete: the hard disk knows what sector, on what track, on what platter the data resides. It can position the read/write head of the correct platter over the correct track and read the proper sector.

Thankfully, modern hard disks do not force computers to communicate with them in terms of cylinders, heads and sectors. Instead, modern hard drives map a unique block number over each cylinder/head/sector triplet. The unique number identifies a specific cylinder/head/sector value. Modern operating systems then can address hard drives using this block number—called logical block addressing—and the hard drive translates the block number into the correct cylinder/head/sector value.
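To make the translation concrete, here is a minimal sketch of the classic cylinder/head/sector-to-LBA formula. The geometry constants and example values are hypothetical; a real drive reports its own geometry and performs this translation in its firmware, so treat this purely as an illustration.

#include <stdio.h>

/* Hypothetical drive geometry; a real drive reports its own. */
#define HEADS_PER_CYLINDER  16
#define SECTORS_PER_TRACK   63

/* CHS numbers sectors from 1; logical block addressing counts from 0. */
static unsigned long chs_to_lba(unsigned long cylinder,
                                unsigned long head,
                                unsigned long sector)
{
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK
           + (sector - 1);
}

int main(void)
{
    printf("C/H/S 0/0/1  -> block %lu\n", chs_to_lba(0, 0, 1));
    printf("C/H/S 0/0/2  -> block %lu\n", chs_to_lba(0, 0, 2));
    printf("C/H/S 3/2/17 -> block %lu\n", chs_to_lba(3, 2, 17));
    return 0;
}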

One thing of note about this block number: although nothing guarantees it, the physical mapping tends to be sequential. That is, logical block n tends to be physically adjacent to logical block n+1. We discuss why that is important later on.

Now, let's consider the typical UNIX system. Applications as varied as databases, e-mail clients, Web servers and text editors issue I/O requests to the disk, such as read this block and write to that block. The blocks tend to be located physically all over the disk. The e-mail spool may be located in an entirely different region of the disk from the Web server's HTML data or the text editor's configuration file. Indeed, even a single file can be strewn all over the disk if the file is fragmented, that is, not laid out in sequential blocks. Because the files are broken down into individual blocks, and hard drives are addressed by block and not the much more abstract concepts of files, reading or writing file data is broken down into a stream of many individual I/O requests, each to a different block. With luck, the blocks are sequential or at least physically close together. If the blocks are not near one another, the disk head must move to another location on the disk. Moving the disk head is called seeking, and it is one of the most expensive operations in a computer. The seek time on modern hard drives is measured in the tens of milliseconds. This is one reason why defragmented files are a good thing.
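To put rough numbers on how expensive seeking is, here is a back-of-the-envelope sketch. The seek time, request size and streaming rate are illustrative assumptions, not measurements of any particular drive.

#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions, not measurements. */
    const double seek_ms        = 10.0;  /* average seek time            */
    const double request_kb     = 4.0;   /* size of one random request   */
    const double streaming_mb_s = 50.0;  /* rough sequential throughput  */

    double requests_per_sec = 1000.0 / seek_ms;
    double random_mb_s      = requests_per_sec * request_kb / 1024.0;

    printf("seek-bound workload: ~%.0f requests/s, ~%.2f MB/s\n",
           requests_per_sec, random_mb_s);
    printf("sequential workload: ~%.0f MB/s\n", streaming_mb_s);
    return 0;
}

Even with these generous assumptions, a workload that pays a full seek for every request delivers only a small fraction of the throughput the same drive can sustain when reading sequentially.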

Unfortunately, it does not matter if the files are defragmented because the system is generating I/O requests for multiple files, all over the disk. The e-mail client wants a little from here and the Web server wants a little from there—but wait, now the text editor wants to read a file. The net effect is that the disk head is made to jump around the disk. In a worst-case scenario, with interleaved I/O requests to multiple files, the head can spend all of its time jumping around from one location to another—not a good thing for overall system performance.

This is where the I/O scheduler comes in. The I/O scheduler schedules the pending I/O requests in order to minimize the time spent moving the disk head. This, in turn, minimizes disk seek time and maximizes hard disk throughput.

This magic is accomplished through two main actions, sorting and merging. First, the I/O scheduler keeps the list of pending I/O requests sorted by block number. When a new I/O request is issued, it is inserted, block-wise, into the list of pending requests. This prevents the drive head from seeking all around the disk to service I/O requests. Instead, by keeping the list sorted, the disk head moves in a straight line around the disk. If the hard drive is busy servicing a request at one part of the disk, and a new request comes in to the same part of the disk, that request can be serviced before moving off to other parts of the disk.

Merging occurs when an I/O request is issued to an identical or adjacent region of the disk. Instead of issuing the new request on its own, it is merged into the identical or adjacent request. This minimizes the number of outstanding requests.

Let's look at an example. Consider the case where two applications issue requests to the following block numbers, such that they arrive in the kernel in this order: 10, 500, 12, 502, 14, 504 and 12. The I/O scheduler-less approach would service these blocks in the given order. That is six long seeks, bouncing back and forth between two regions of the disk. What a waste! If the kernel sorted and merged these requests, however, and serviced them in that order, the result would be much different: 10, 12, 14, 500, 502 and 504. Only a single far-off seek and one fewer request overall.
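As a toy illustration of the sorting and merging just described (not the kernel's actual elevator code; the data structure and names here are invented), the following sketch feeds the example's request stream through a sorted, merging queue.

#include <stdio.h>
#include <stdlib.h>

/* Each pending request covers the block range [start, start + count). */
struct request {
    unsigned long start;
    unsigned long count;
    struct request *next;
};

static struct request *queue;   /* kept sorted by starting block */

/* Queue a one-block request, merging it with an identical or adjacent one. */
static void enqueue(unsigned long block)
{
    struct request **p = &queue;
    struct request *rq;

    /* Walk to the first pending request that ends at or beyond this block. */
    while (*p && (*p)->start + (*p)->count < block)
        p = &(*p)->next;

    if (*p && (*p)->start <= block + 1) {
        /* Identical or adjacent request found: merge instead of queuing. */
        rq = *p;
        if (block < rq->start) {
            rq->start = block;
            rq->count++;
        } else if (block >= rq->start + rq->count) {
            rq->count = block - rq->start + 1;
        }
        return;
    }

    /* No match: insert a new request at its sorted position. */
    rq = malloc(sizeof(*rq));
    if (!rq)
        exit(1);
    rq->start = block;
    rq->count = 1;
    rq->next = *p;
    *p = rq;
}

int main(void)
{
    unsigned long incoming[] = { 10, 500, 12, 502, 14, 504, 12 };
    struct request *rq;
    size_t i;

    for (i = 0; i < sizeof(incoming) / sizeof(incoming[0]); i++)
        enqueue(incoming[i]);

    for (rq = queue; rq; rq = rq->next)
        printf("service blocks %lu-%lu\n",
               rq->start, rq->start + rq->count - 1);
    return 0;
}

Fed the stream 10, 500, 12, 502, 14, 504 and 12, it prints six requests in sorted order, the duplicate request for block 12 having been absorbed by merging.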

In this manner, an I/O scheduler virtualizes the resources of disk I/O among multiple I/O requests in order to maximize global throughput. Because I/O throughput is so crucial to system performance and because seek time is so horribly slow, the job of an I/O scheduler is important.

______________________

Comments

Very informative read.

Binary Soldier

Thanks for taking the time to publish this article.

I'm currently studying operating system fundamentals at university, and it was great to read about its use in the Linux operating system. I had learned how Linux treats all hardware items as 'files', but reading this has given me a much more in-depth understanding.

*Bookmarked*

great article

procfs

Hi, this is great. Are there any other documents like this, plain and simple?

Thanks

Best regards

Nice article

Michael R. Hines

I like this. It describes the problem to the point and nothing else. Thanks for writing it.

Re: Kernel Korner: I/O Schedulers

Anonymous

Elementary, my dear Watson.

That's a mere 122-fold increase in test two.

Now for your next trick...

Mick.

What about driver improvements?

Tom Callahan

You're discounting the fact that there are many other performance-enhancing improvements in the 2.6 kernel that have greatly boosted read/write performance, especially in the form of better-supported IDE/SCSI devices and better drivers and feature sets.

I agree that the anticipatory scheduler is MUCH better than the older schedulers, but recognize the hardware/software improvements as well.

Anticipatory is also not the best choice for everyone; I recommend that any person or company run many, many tests that emulate their environment to determine which scheduler is best for them...

-Tom

Nice article. However, I

Anonymous

Nice article. However, I agree that schedulers need to be tested in a particular environment to be sure which choice is best.

For example, AS is a poor choice for RAID and/or virtual machines. Virtual hosts with RAID disks appear to work best with the deadline elevator. VM guests should use the noop elevator so as not to disguise physical access with virtual characteristics, not to mention the ineffective and unnecessary overhead of using AS in a VM guest.
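On 2.6 kernels that expose the per-device sysfs file /sys/block/<dev>/queue/scheduler, the elevator can be switched at run time (echo deadline > /sys/block/sda/queue/scheduler as root) or selected at boot with the elevator= kernel parameter; the device name here is only an example. A minimal C sketch of the same sysfs write, assuming that interface is present:

#include <stdio.h>

/*
 * A minimal sketch, assuming the kernel exposes the per-device
 * scheduler file in sysfs. "sda" and "deadline" are examples only;
 * valid names on 2.6 include noop, anticipatory, deadline and cfq.
 * Requires root privileges.
 */
int main(void)
{
    const char *path = "/sys/block/sda/queue/scheduler";
    FILE *f = fopen(path, "w");

    if (!f) {
        perror(path);
        return 1;
    }
    fprintf(f, "deadline\n");
    fclose(f);
    return 0;
}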

-shubes
