Kernel Korner - ATA Over Ethernet: Putting Hard Drives on the LAN
Everybody runs out of disk space at some time. Fortunately, hard drives keep getting larger and cheaper. Even so, the more disk space there is, the more we use, and soon we run out again.
Some kinds of data are huge by nature. Video, for example, always takes up a lot of space. Businesses often need to store video data, especially with digital surveillance becoming more common. Even at home, we enjoy watching and making movies on our computers.
Backup and data redundancy are essential to any business using computers. It seems no matter how much storage capacity there is, it always would be nice to have more. Even e-mail can overgrow any container we put it in, as Internet service providers know too well.
Unlimited storage becomes possible when the disks come out of the box, decoupling the storage from the computer that's using it. The principle of decoupling related components to achieve greater flexibility shows up in many domains, not only data storage. Modular source code can be used more flexibly to meet unforeseen needs, and a stereo system made from components can be used in more interesting configurations than an all-in-one stereo box can be.
The most familiar example of out-of-the-box storage probably is the storage area network (SAN). I remember when SANs started to create a buzz; it was difficult to work past the hype and find out what they really were. When I finally did, I was somewhat disappointed to find that SANs were complex, proprietary and expensive.
In supporting these SANs, though, the Linux community has made helpful changes to the kernel. The enterprise versions of 2.4 kernel releases informed the development of new features of the 2.6 kernel, and today's stable kernel has many abilities we lacked only a few years ago. It can use huge block devices, well over the old limit of two terabytes. It can support many more simultaneously connected disks. There's also support for sophisticated storage volume management. In addition, filesystems now can grow to huge sizes, even while mounted and in use.
This article describes a new way to leverage these new kernel features, taking disks out of the computer and overcoming previous limits on storage use and capacity. You can think of ATA over Ethernet (AoE) as a way to replace your IDE cable with an Ethernet network. With the storage decoupled from the computer and the flexibility of Ethernet between the two, the possibilities are limited only by your imagination and willingness to learn new things.
ATA over Ethernet is a network protocol registered with the IEEE as Ethernet protocol 0x88a2. AoE is low level, much simpler than TCP/IP or even IP. TCP/IP and IP are necessary for the reliable transmission of data over the Internet, but the computer has to work harder to handle the complexity they introduce.
Users of iSCSI have noticed this issue with TCP/IP. iSCSI is a way to send I/O over TCP/IP, so that inexpensive Ethernet equipment may be used instead of Fibre Channel equipment. Many iSCSI users have started buying TCP offload engines (TOE). These TOE cards are expensive, but they remove the burden of doing TCP/IP from the machines using iSCSI.
An interesting observation is that most of the time, iSCSI isn't actually used over the Internet. If the packets simply need to go to a machine in the rack next door, the heavyweight TCP/IP protocol seems like overkill.
So instead of offloading TCP/IP, why not dispense with it altogether? The ATA over Ethernet protocol does exactly that, taking advantage of today's smart Ethernet switches. A modern switch has flow control, maximizing throughput and limiting packet collisions. On the local area network (LAN), packet order is preserved, and each packet is checksummed for integrity by the networking hardware.
Each AoE packet carries a command for an ATA drive or the response from the ATA drive. The AoE Linux kernel driver performs AoE and makes the remote disks available as normal block devices, such as /dev/etherd/e0.0—just as the IDE driver makes the local drive at the end of your IDE cable available as /dev/hda. The driver retransmits packets when necessary, so the AoE devices look like any other disks to the rest of the kernel.
In addition to ATA commands, AoE has a simple facility for identifying available AoE devices using query config packets. That's all there is to it: ATA command packets and query config packets.
Anyone who has worked with or learned about SANs likely wonders at this point, “If all the disks are on the LAN, then how can I limit access to the disks?” That is, how can I make sure that if machine A is compromised, machine B's disks remain safe?
The answer is that AoE is not routable. You easily can determine what computers see what disks by setting up ad hoc Ethernet networks. Because AoE devices don't have IP addresses, it is trivial to create isolated Ethernet networks. Simply power up a switch and start plugging in things. In addition, many switches these days have a port-based VLAN feature that allows a switch effectively to be partitioned into separate, isolated broadcast domains.
The AoE protocol is so lightweight that even inexpensive hardware can use it. At this time, Coraid is the only vendor of AoE hardware, but other hardware and software developers should be pleased to find that the AoE specification is only eight pages in length. This simplicity is in stark contrast to iSCSI, which is specified in hundreds of pages, including the specification of encryption features, routability, user-based access and more. Complexity comes at a price, and now we can choose whether we need the complexity or would prefer to avoid its cost.
Simple primitives can be powerful tools. It may not come as a surprise to Linux users to learn that even with the simplicity of AoE, a bewildering array of possibilities present themselves once the storage can reside on the network. Let's start with a concrete example and then discuss some of the possibilities.
|Speed Up Your Web Site with Varnish||Jun 19, 2013|
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
- Speed Up Your Web Site with Varnish
- Containers—Not Virtual Machines—Are the Future Cloud
- Linux Systems Administrator
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Non-Linux FOSS: libnotify, OS X Style
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- RSS Feeds
- Reply to comment | Linux Journal
9 min 7 sec ago
- Reply to comment | Linux Journal
4 hours 8 min ago
- Yeah, user namespaces are
5 hours 25 min ago
- Cari Uang
8 hours 56 min ago
- user namespaces
11 hours 49 min ago
12 hours 15 min ago
- One advantage with VMs
14 hours 44 min ago
- about info
15 hours 17 min ago
15 hours 18 min ago
15 hours 19 min ago
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?