One Box. Sixteen Trillion Bytes.
I know I could have used SAS drives with an SAS controller for better performance, but SAS disks are not yet available in the capacities offered by SATA, and they would have been much more expensive for less disk space.
For this project, I settled on a 16-drive system with a 16-port RAID controller. I did find a Supermicro 24-drive chassis (SC486) and a 3ware 24-port RAID controller (9650SE-24M8) that should work together. It would be interesting to see whether there is any performance downside to such a large system, but this would be overkill for my needs at the moment.
There are still plenty of options and choices with the existing configuration that may yield better performance than the default settings. I did not pursue all of these, as I needed to get this particular machine into production quickly. I would be interested in exploring performance improvements in the future, especially if the system was going to be used interactively by humans (and not just for automated backups late at night).
Possible areas for performance tuning include the following:
1) RAID schemes: I could have used a different scheme for better performance, but I felt RAID 5 was sufficient for my needs. I think RAID 6 also would have worked, and I would have ended up with the same amount of disk space (assuming two parity drives and no hot spare), but my understanding is that it would be slower than RAID 5.
2) ext3/XFS filesystem creation and mount options: I had a hard time finding any authoritative or definitive information on how to make XFS as fast as possible for a given situation. In my case, this was a relatively small number of large (multi-gigabyte) files. The mount and mkfs options that I used came from examples I found on various discussion groups, but I did not try to verify their performance claims. For example, some articles said that the mount options of noatime, nodiratime and osyncisdsync would improve performance. 3ware has a whitepaper covering optimizing XFS and 2.6 kernels with an older RAID controller model, but I have not tried those suggestions on the controller I used.
3) Drive jumpers: one surprise (for me at least) was finding that the Seagate drives come from the factory with the 1.5Gbps rate-limit jumper installed. As far as I can tell, the drive documentation does not say that this is the factory default setting, only that it “can be used”. Removing this jumper enables the drive to run at 3.0Gbps with controllers that support this speed (such as the 3ware 9560 used for this project). I was able to confirm the speed setting by using the 3ware 3dm Web interface (Information→Drive), but when I tried using tw_cli to view the same information, it did not display the speed currently in use:
# tw_cli /c0/p0 show lspeed /c0/p0 Link Speed Supported = 1.5 Gbps and 3.0 Gbps /c0/p0 Link Speed = unknown
The rate-limiting jumper is tiny and recessed into the back of the drive. I ended up either destroying or losing most of the jumpers in the process of prying them off the pins (before buying an extremely long and fine-tipped pair of needle-nose pliers for this task).
4) RAID card settings: Native Command Queuing (NCQ) is supposed to offer better performance by letting the drive electronics reorder commands for optimized disk access. I have found that NCQ is not always enabled by default on the 3ware controllers. It can be turned on manually using the queuing check box in the Controller Settings page of 3dm or via tw_cli:
# tw_cli /c0/u0 set qpolicy = on
The current setting can be verified on a per-drive basis via 3dm or by using tw_cli:
# tw_cli /c0/p5 show ncq /c0/p5 NCQ Supported = Yes /c0/p5 NCQ Enabled = Yes
5) Linux kernel settings: 3ware's knowledge base has articles that mention several kernel settings that are supposed to improve performance over the defaults, but I have not tried any of those myself.
6) Operational issues: despite all 16 disks being the same type and firmware version, some of them failed to display their model number properly in the various 3ware interfaces. For example, most of the disk model numbers are displayed correctly—for example, ST31000340AS. But, several show “ST3 INVALID PFM” in the model field. You can see this in the tw_cli interface. For instance, port 4 displays the model number properly, but port 5 does not:
# tw_cli /c0/p4 show model /c0/p4 Model = ST31000340AS # tw_cli /c0/p5 show model /c0/p5 Model = ST3_INVALID_PFM
This situation would be intolerable in a system with a mix of drive types, as it would be difficult to determine which drive type was plugged in to which port. I was able to determine that the problem was with the drive firmware version and upgraded all the drives that exhibited this behavior.
As the system already was in active use before I determined that the firmware was the issue, I needed a way to upgrade each drive while keeping the system running. I could not simply upgrade the drives while they were part of an active disk array, as Seagate claims the upgrade could destroy data. I used the 3ware interface to remove the problem drive, which then forced the hot spare to replace it. The RAID controller automatically started to rebuild the RAID 5 array using the hot spare. I then physically removed the drive from the chassis and upgraded the drive firmware using another computer. After the upgrade, I re-inserted the drive and designated it as the new hot spare. The array rebuild operation took something like six hours to complete, and as I could remove and upgrade only one drive at a time, I was limited to one drive upgrade a day.
Practical books for the most technical people on the planet. Newly available books include:
- Agile Product Development by Ted Schmidt
- Improve Business Processes with an Enterprise Job Scheduler by Mike Diehl
- Finding Your Way: Mapping Your Network to Improve Manageability by Bill Childers
- DIY Commerce Site by Reven Lerner
Plus many more.