One Box. Sixteen Trillion Bytes.
I planned to create a conventional set of partitions for the operating system and then a single giant partition for the storage of system backup files (named backup).
Some RAID vendors allow you to create an arbitrary number of virtual disks from a given physical array of disks. The 3ware interface allows only two virtual disks (or volumes) per physical array. When you are creating a physical array, you typically will end up with a single virtual disk using the entire capacity of the array (for example, creating a virtual disk from a RAID 1 array of two 1TB disks gives you a 1TB /dev/sda disk). If you want two virtual disks, specify a nonzero size for the boot partition. The first virtual disk will be created in the size you specify for the boot partition, and the second will be the physical array capacity minus the size of the boot partition (for example, using 1TB disks, specifying a 150GB boot partition yields a 150GB /dev/sda disk and an 850GB /dev/sdb disk).
You can perform the entire RAID configuration from the 3ware controller 3dbm BIOS interface before the OS boots (press Alt-3), or use the tw_cli command line or 3dm Web interface (or a combination of these) after the OS is running. For example, you could use the BIOS interface to set up just the minimal array you need for the OS installation and then use the 3dm Web interface to set up additional arrays and hot sparing after the OS is running.
For my 16-drive system, I decided to use 15 drives in a RAID 5 array. The remaining 16th drive is a hot spare. With this scheme, three disks would have to fail before any data was lost (the system could tolerate the loss of one array member disk and then the loss of the hot spare that would replace it). I used the 3ware BIOS interface to create a 100GB boot partition, which gave me a virtual sda disk and a virtual sdb disk of about 12.64TB.
I found the system to be very slow during the RAID array initialization. I did not record the time, but the initialization of the RAID 5 array seemed to take at least a day, maybe longer. I suggest starting the process and working on something else until it finishes, or you will be frustrated with the poor interactive performance.
I knew I had to use something other than ext3 for the giant data partition, and XFS looked like the best solution according the information I could find on the Web. Most of my Linux experience involves Red Hat's Linux Enterprise distribution, but I had trouble finding information on adding XFS support. I specifically wanted to avoid anything difficult or complicated to reproduce. CentOS seemed like the best OS choice, as it leveraged my Red Hat experience and had a trivial process for adding XFS support.
For the project system, I installed the OS using Kickstart. I created a kickstart file that automatically created a 6GB /, 150MB /boot and a 64GB swap partition on the /dev/sda virtual disk using a conventional msdos disk label and ext3 filesystems. (I typically would allocate less swap than this, but I've found through experience that the xfs_check utility required something like 26GB of memory to function—anything less and it would die with “out of memory” errors). The Kickstart installation ignored the /dev/sdb disk for the time being. I could have automated all the disk partitioning and XFS configuration via Kickstart, but I specifically wanted to play around with the creation of the large partition manually. Once the Kickstart OS install was finished, I manually added XFS support with the following yum commands:
yum install kmod-xfs xfs-utils
At this time, I downloaded and installed the 3ware tw_cli command line and 3dm Web interface package from the 3ware Web site. I used the 3dm Web interface to create the hot spare.
Next, I used parted to create a gpt-labeled disk with a single XFS filesystem on the 14TB virtual disk /dev/sdb. Another argument for using something other than ext3 is filesystem creation time. For example, when I first experimented with a 3TB test partition under both ext3 and XFS, an mkfs took 3.5 hours under ext3 and less than 15 seconds for XFS. The XFS mkfs operation was extremely fast, even with the RAID array initialization in progress.
I used the following commands to set up the large partition named /backup for storing the disk-to-disk backups:
# parted /dev/sdb (parted) mklabel gpt (parted) mkpartfs primary xfs 0% 100% (parted) print Model: AMCC 9650SE-16M DISK (scsi) Disk /dev/sdb: 13.9TB Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 17.4kB 13.9TB 13.9TB xfs primary (parted) quit # mkfs.xfs /dev/sdb1 # mount /dev/sdb1 /backup
Next, I made the mount permanent by adding it to /etc/fstab.
I now considered the system to be pretty much functional, and the rest of the configuration effort was specifically related to the system's role as a disk-to-disk backup server.
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
|Introduction to MapReduce with Hadoop on Linux||Jun 05, 2013|
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Linux Systems Administrator
- Introduction to MapReduce with Hadoop on Linux
- RSS Feeds
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Weechat, Irssi's Little Brother
- Tech Tip: Really Simple HTTP Server with Python
- Poul-Henning Kamp: welcome to
11 min 19 sec ago
- This has already been done
12 min 19 sec ago
- Reply to comment | Linux Journal
57 min 33 sec ago
- Welcome to 1998
1 hour 46 min ago
- notifier shortcomings
2 hours 9 min ago
3 hours 46 min ago
- Android User
3 hours 48 min ago
- Reply to comment | Linux Journal
5 hours 41 min ago
8 hours 30 min ago
- This is a good post. This
13 hours 43 min ago
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?