Kernel Korner - Storage Improvements for 2.6 and 2.7
and to scan only one particular target, enter:
echo "0 7 -" > /sys/class/scsi_host/host3/scan
Although this design is not particularly user-friendly, it works fine for automated tools, which can make use of the libsys library and the systool utility.
One of FC's strengths is it permits redundant paths between servers and RAID arrays, which can allow failing FC devices to be removed and replaced without server applications even noticing that anything happened. However, this is possible only if the server has a robust implementation of multipath I/O.
One certainly cannot complain about a shortage of multipath I/O implementations for the Linux 2.4 kernel. The reality is quite the opposite, as there are implementations in the SCSI mid-layer, in device drivers, in the md driver and in the LVM layer.
In fact, too many I/O implementations in 2.6 can make it difficult or even impossible to attach different types of RAID arrays to the same server. The Linux kernel needs a single multipath I/O implementation that accommodates all multipath-capable devices. Ideally, such an implementation continuously would monitor all possible paths and determine automatically when a failed piece of FC equipment had been repaired. Hopefully, the ongoing work on device-mapper (dm) multipath target will solve these problems.
Some RAID arrays allow extremely large LUNs to be created from the concatenation of many disks. The Linux 2.6 kernel includes a CONFIG_LBD parameter that accommodates multiterabyte LUNs.
In order to run large databases and associated applications on Linux, large numbers of LUNs are required. Theoretically, one could use a smaller number of large LUNs, but there are a number of problems with this approach:
Many storage devices place limits on LUN size.
Disk-failure recovery takes longer on larger LUNs, making it more likely that a second disk will fail before recovery completes. This secondary failure would mean unrecoverable data loss.
Storage administration is much easier if most of the LUNs are of a fixed size and thus interchangeable. Overly large LUNs waste storage.
Large LUNs can require longer backup windows, and the added downtime may be more than users of mission-critical applications are willing to put up with.
The size of the kernel's dev_t increased from 16 bits to 32 bits, which permits i386 builds of the 2.6 kernel to support 4,096 LUNs, though at the time of this writing, one additional patch still is waiting to move from James Bottomley's tree into the main tree. Once this patch is integrated, 64-bit CPUs will be able to support up to 32,768 LUNs, as should i386 kernels built with a 4G/4G split and/or Maneesh Soni's sysfs/dcache patches. Of course, 64-bit x86 processors, such as AMD64 and the 64-bit ia32e from Intel, should help put 32-bit limitations out of their misery.
Easy access to large RAID arrays from multiple servers over high-speed storage area networks (SANs) makes distributed filesystems much more interesting and useful. Perhaps not coincidentally, a number of open-source distributed filesystem are under development, including Lustre, OpenGFS and the client portion of IBM's SAN Filesystem. In addition, a number of proprietary distributed filesystems are available, including SGI's CXFS and IBM's GPFS. All of these distributed filesystems offer local filesystem semantics.
In contrast, older distributed filesystems, such as NFS, AFS and DFS, offer restricted semantics in order to conserve network bandwidth. For example, if a pair of AFS clients both write to the same file at the same time, the last client to close the file wins—the other client's changes are lost. This difference is illustrated in the following sequence of events:
Client A opens a file.
Client B opens the same file.
Client A writes all As to the first block of the file.
Client B writes all Bs to the first block of the file.
Client B writes all Bs to the second block of the file.
Client A writes all As to the second block of the file.
Client A closes the file.
Client B closes the file.
With local-filesystem semantics, the first block of the file contain all Bs and the second block all As. With last-close semantics, both blocks contain all Bs.
This difference in semantics might surprise applications designed to run on local filesystems, but it greatly reduces the amount of communication required between the two clients. With AFS last-close semantics, the two clients need to communicate only when opening and closing. With strict local semantics, however, they may need to communicate on each write.
It turns out that a surprisingly large fraction of existing applications tolerate the difference in semantics. As local networks become faster and cheaper, however, there is less reason to stray from local filesystem semantics. After all, a distributed filesystem offering the exact same semantics as a local filesystem can run any application that runs on the local filesystem. Distributed filesystems that stray from local filesystem semantics, on the other hand, may or may not do so. So, unless you are distributing your filesystem across a wide-area network, the extra bandwidth seems a small price to pay for full compatibility.
The Linux 2.4 kernel was not entirely friendly to distributed filesystems. Among other things, it lacked an API for invalidating pages from an mmap()ed file and an efficient way of protecting processes from oom_kill(). It also lacked correct handling for NFS lock requests made to two different servers exporting the same distributed filesystem.
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
|Non-Linux FOSS: Seashore||May 10, 2013|
|Trying to Tame the Tablet||May 08, 2013|
|Dart: a New Web Programming Experience||May 07, 2013|
- RSS Feeds
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- New Products
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- Validate an E-Mail Address with PHP, the Right Way
- New Products
- Trying to Tame the Tablet
- Tech Tip: Really Simple HTTP Server with Python
- Agreed on AirDroid. With my
3 min 31 sec ago
- I just learned this
7 min 41 sec ago
37 min 45 sec ago
- not living upto the mobile revolution
3 hours 29 min ago
- Deceptive Advertising and
4 hours 4 min ago
- Let\'s declare that you have
4 hours 5 min ago
- Alterations in Contest Due
4 hours 6 min ago
- At a numbers mindset, your
4 hours 7 min ago
- Do not get Just Almost any
4 hours 11 min ago
- A fantastic rule-of-thumb to
4 hours 12 min ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.