Kernel Korner - I/O Schedulers
Although most Linux users are familiar with the role of process schedulers, such as the new O(1) scheduler, many users are not so familiar with the role of I/O schedulers. I/O schedulers are similar in some aspects to process schedulers; for instance, both schedule some resource among multiple users. A process scheduler virtualizes the resource of processor time among multiple executing processes on the system. So, what does an I/O scheduler schedule?
A naïve system would not even include an I/O scheduler. Unlike the process scheduler, the I/O scheduler is not a mandatory component of the operating system. Instead, performance is the I/O scheduler's sole raison d'être.
To understand the role of an I/O scheduler, let's go over some background information and then look at how a system behaves without an I/O scheduler. Hard disks address their data using the familiar geometry-based addressing of cylinders, heads and sectors. A hard drive is composed of multiple platters, each consisting of a single disk, spindle and read/write head. Each platter is divided further into circular ring-like tracks, similar to a CD or record. Finally, each track is composed of some integer number of sectors.
To locate a specific unit of data in a hard drive, the drive's logic requires three pieces of information: the cylinder, the head and the sector. The cylinder specifies the track on which the data resides. If you lay the platters on top of one another (as they are in a hard disk), a given track forms a cylinder through each platter. The head then identifies the exact read/write head (and thus the exact platter) in question. The search now is narrowed down to a single track on a single platter. Finally, the sector value denotes the exact sector on the track. The search is complete: the hard disk knows what sector, on what track, on what platter the data resides. It can position the read/write head of the correct platter over the correct track and read the proper sector.
Thankfully, modern hard disks do not force computers to communicate with them in terms of cylinders, heads and sectors. Instead, modern hard drives map a unique block number over each cylinder/head/sector triplet. The unique number identifies a specific cylinder/head/sector value. Modern operating systems then can address hard drives using this block number—called logical block addressing—and the hard drive translates the block number into the correct cylinder/head/sector value.
One thing of note about this block number: although nothing guarantees it, the physical mapping tends to be sequential. That is, logical block n tends to be physically adjacent to logical block n+1. We discuss why that is important later on.
Now, let's consider the typical UNIX system. Applications as varied as databases, e-mail clients, Web servers and text editors issue I/O requests to the disk, such as read this block and write to that block. The blocks tend to be located physically all over the disk. The e-mail spool may be located in an entirely different region of the disk from the Web server's HTML data or the text editor's configuration file. Indeed, even a single file can be strewn all over the disk if the file is fragmented, that is, not laid out in sequential blocks. Because the files are broken down into individual blocks, and hard drives are addressed by block and not the much more abstract concepts of files, reading or writing file data is broken down into a stream of many individual I/O requests, each to a different block. With luck, the blocks are sequential or at least physically close together. If the blocks are not near one another, the disk head must move to another location on the disk. Moving the disk head is called seeking, and it is one of the most expensive operations in a computer. The seek time on modern hard drives is measured in the tens of milliseconds. This is one reason why defragmented files are a good thing.
Unfortunately, it does not matter if the files are defragmented because the system is generating I/O requests for multiple files, all over the disk. The e-mail client wants a little from here and the Web server wants a little from there—but wait, now the text editor wants to read a file. The net effect is that the disk head is made to jump around the disk. In a worst-case scenario, with interleaved I/O requests to multiple files, the head can spend all of its time jumping around from one location to another—not a good thing for overall system performance.
This is where the I/O scheduler comes in. The I/O scheduler schedules the pending I/O requests in order to minimize the time spent moving the disk head. This, in turn, minimizes disk seek time and maximizes hard disk throughput.
This magic is accomplished through two main actions, sorting and merging. First, the I/O scheduler keeps the list of pending I/O requests sorted by block number. When a new I/O request is issued, it is inserted, block-wise, into the list of pending requests. This prevents the drive head from seeking all around the disk to service I/O requests. Instead, by keeping the list sorted, the disk head moves in a straight line around the disk. If the hard drive is busy servicing a request at one part of the disk, and a new request comes in to the same part of the disk, that request can be serviced before moving off to other parts of the disk.
Merging occurs when an I/O request is issued to an identical or adjacent region of the disk. Instead of issuing the new request on its own, it is merged into the identical or adjacent request. This minimizes the number of outstanding requests.
Let's look at an example. Consider the case where two applications issue requests to the following block numbers, such that they arrived in the kernel in this order: 10, 500, 12, 502, 14, 504 and 12. The I/O scheduler-less approach would service these blocks in the given order. That is seven long seeks, back and forth between two parts of the disk. What a waste! If the kernel sorted and merged these requests, however, and serviced them in that order, the results would be much different: 10, 12, 14, 500, 502 and 504. Only a single far-off seek and one less request overall.
In this manner, an I/O scheduler virtualizes the resources of disk I/O among multiple I/O requests in order to maximize global throughput. Because I/O throughput is so crucial to system performance and because seek time is so horribly slow, the job of an I/O scheduler is important.
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
|Non-Linux FOSS: Seashore||May 10, 2013|
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Validate an E-Mail Address with PHP, the Right Way
- A Topic for Discussion - Open Source Feature-Richness?
- New Products
- Download the Free Red Hat White Paper "Using an Open Source Framework to Catch the Bad Guy"
- The Secret Password Is...
3 hours 34 min ago
- Keeping track of IP address
5 hours 25 min ago
- Roll your own dynamic dns
10 hours 38 min ago
- Please correct the URL for Salt Stack's web site
13 hours 50 min ago
- Android is Linux -- why no better inter-operation
16 hours 5 min ago
- Connecting Android device to desktop Linux via USB
16 hours 34 min ago
- Find new cell phone and tablet pc
17 hours 32 min ago
19 hours 1 min ago
- Automatically updating Guest Additions
20 hours 9 min ago
- I like your topic on android
20 hours 56 min ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?