One Box. Sixteen Trillion Bytes.

Build your own 16-Terabyte file server with hardware RAID.
Performance

I know I could have used SAS drives with an SAS controller for better performance, but SAS disks are not yet available in the capacities offered by SATA, and they would have been much more expensive for less disk space.

For this project, I settled on a 16-drive system with a 16-port RAID controller. I did find a Supermicro 24-drive chassis (SC846) and a 3ware 24-port RAID controller (9650SE-24M8) that should work together. It would be interesting to see whether there is any performance downside to such a large system, but this would be overkill for my needs at the moment.

There are still plenty of options and choices with the existing configuration that may yield better performance than the default settings. I did not pursue all of these, as I needed to get this particular machine into production quickly. I would be interested in exploring performance improvements in the future, especially if the system was going to be used interactively by humans (and not just for automated backups late at night).

Possible areas for performance tuning include the following:

1) RAID schemes: I could have used a different scheme for better performance, but I felt RAID 5 was sufficient for my needs. I think RAID 6 also would have worked, and I would have ended up with the same amount of disk space (assuming two parity drives and no hot spare), but my understanding is that it would be slower than RAID 5.
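
For reference, a RAID 5 unit plus hot spare can be created from the command line along these lines. This is only a sketch: the port range (0-14), spare port (15) and 64KB stripe size are assumptions, and the syntax should be checked against the 3ware CLI guide for your controller:

# tw_cli /c0 add type=raid5 disk=0-14 stripe=64
# tw_cli /c0 add type=spare disk=15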

2) ext3/XFS filesystem creation and mount options: I had a hard time finding any authoritative or definitive information on how to make XFS as fast as possible for a given situation. In my case, this was a relatively small number of large (multi-gigabyte) files. The mount and mkfs options that I used came from examples I found on various discussion groups, but I did not try to verify their performance claims. For example, some articles said that the mount options of noatime, nodiratime and osyncisdsync would improve performance. 3ware has a whitepaper covering optimizing XFS and 2.6 kernels with an older RAID controller model, but I have not tried those suggestions on the controller I used.
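
As a rough illustration of the kind of commands involved (the /dev/sdb device name, 64KB stripe unit, 14-data-drive stripe width and /backup mount point here are assumptions, not values I verified on this system), the filesystem might be created and mounted like this:

# mkfs.xfs -d su=64k,sw=14 /dev/sdb
# mount -o noatime,nodiratime,osyncisdsync /dev/sdb /backup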

3) Drive jumpers: one surprise (for me at least) was finding that the Seagate drives come from the factory with the 1.5Gbps rate-limit jumper installed. As far as I can tell, the drive documentation does not say that this is the factory default setting, only that it “can be used”. Removing this jumper enables the drive to run at 3.0Gbps with controllers that support this speed (such as the 3ware 9650 used for this project). I was able to confirm the speed setting by using the 3ware 3dm Web interface (Information→Drive), but when I tried using tw_cli to view the same information, it did not display the speed currently in use:

# tw_cli /c0/p0 show lspeed
/c0/p0 Link Speed Supported = 1.5 Gbps and 3.0 Gbps
/c0/p0 Link Speed = unknown

The rate-limiting jumper is tiny and recessed into the back of the drive. I ended up either destroying or losing most of the jumpers in the process of prying them off the pins (before buying an extremely long and fine-tipped pair of needle-nose pliers for this task).

4) RAID card settings: Native Command Queuing (NCQ) is supposed to offer better performance by letting the drive electronics reorder commands for optimized disk access. I have found that NCQ is not always enabled by default on the 3ware controllers. It can be turned on manually using the queuing check box in the Controller Settings page of 3dm or via tw_cli:

# tw_cli /c0/u0 set qpolicy = on

The current setting can be verified on a per-drive basis via 3dm or by using tw_cli:

# tw_cli /c0/p5 show ncq
/c0/p5 NCQ Supported = Yes
/c0/p5 NCQ Enabled = Yes
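
Because the setting is reported per drive, it is worth checking every port. A quick shell loop (assuming ports p0 through p15 on controller c0, as on this system) does the job:

# for p in $(seq 0 15); do tw_cli /c0/p$p show ncq; done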

5) Linux kernel settings: 3ware's knowledge base has articles that mention several kernel settings that are supposed to improve performance over the defaults, but I have not tried any of those myself.
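
The settings typically mentioned in this context are block-device read-ahead, request queue depth and the I/O scheduler. Purely as an example of where those knobs live (the values and the /dev/sdb device name are placeholders, not tested recommendations):

# blockdev --setra 16384 /dev/sdb
# echo 512 > /sys/block/sdb/queue/nr_requests
# echo deadline > /sys/block/sdb/queue/scheduler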

6) Operational issues: despite all 16 disks being the same type and firmware version, some of them failed to display their model number properly in the various 3ware interfaces. Most of the disk model numbers are displayed correctly (for example, ST31000340AS), but several show “ST3 INVALID PFM” in the model field. You can see this in the tw_cli interface; for instance, port 4 displays the model number properly, but port 5 does not:

# tw_cli /c0/p4 show model
/c0/p4 Model = ST31000340AS

# tw_cli /c0/p5 show model
/c0/p5 Model = ST3_INVALID_PFM

This situation would be intolerable in a system with a mix of drive types, as it would be difficult to determine which drive type was plugged in to which port. I was able to determine that the problem was with the drive firmware version and upgraded all the drives that exhibited this behavior.

As the system already was in active use before I determined that the firmware was the issue, I needed a way to upgrade each drive while keeping the system running. I could not simply upgrade the drives while they were part of an active disk array, as Seagate claims the upgrade could destroy data. I used the 3ware interface to remove the problem drive, which then forced the hot spare to replace it. The RAID controller automatically started to rebuild the RAID 5 array using the hot spare. I then physically removed the drive from the chassis and upgraded the drive firmware using another computer. After the upgrade, I re-inserted the drive and designated it as the new hot spare. The array rebuild operation took something like six hours to complete, and as I could remove and upgrade only one drive at a time, I was limited to one drive upgrade a day.
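
The same cycle can be driven from tw_cli; the following is only a sketch (port 5 is an example, and the exact syntax should be verified against the 3ware CLI guide before using it on a live array). The first command drops the suspect drive so the hot spare takes over; after the firmware upgrade and re-insertion, the rescan makes the controller see the drive again, and the last command designates it as the new hot spare:

# tw_cli /c0/p5 remove
# tw_cli /c0 rescan
# tw_cli /c0 add type=spare disk=5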
