Build a Home Terabyte Backup System Using Linux
During the past few years, I have built backup servers using Red Hat Linux 9, but you can use any flavor of Linux. I use Red Hat 9 because it is stable, free, currently maintained (Fedora Legacy Project) and simple to install and configure. If you buy a new computer, you may have to use a more current version of Linux. I generally do not use RAID for low-budget systems where cost is paramount, but it is worth considering.
Software requirements for a Linux backup server are minimal. Basic network administration utilities (including the secure shell, SSH, and secure shell daemon, sshd) and rsync are required. rsync is a fast, incremental duplication/synchronization utility that comes with most Linux distributions. With SSH and rsync, you can carry out virtually all basic backup tasks. It is advantageous for a backup server also to be a fileserver, so I install Samba, the SMB fileserver, as well. I use Samba because it is the default fileserver for MS Windows clients, and it also is readily accessible by any UNIX system (including Mac OS X) using a Samba client. If you have a homogeneous UNIX network, you can use NFS, which I will not discuss here.
If you need to attach additional disks to your server, begin by making sure you have enough data (IDE/SATA/SCSI) cables and power lines to accommodate the expansion. Ensure that your drive is Linux-compatible (although most are). Turn off the power to your computer and disconnect the power cable. Physically attach the disk(s) to your computer. Linux should recognize the new disk(s) on boot. If your drive is not recognized, your disk is incompatible or you need to locate and install a driver for it. Check boot messages for new drives using the dmesg command. The boot message for an IDE drive may look like this:
hdb: ST3400832A, ATA DISK drive
All IDE/ATA (and some SATA) drives have the designation hdx, where the x is replaced with a letter of the alphabet (b in this case). Similarly, adding new USB or SCSI (and some SATA) disks gives boot messages indicating a new drive designation sdx, where the x is replaced by the appropriate letter.
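As a rough illustration of the naming scheme above, the following shell sketch pulls drive designations out of boot messages. The function name list_drives is made up for this example, and the pattern assumes the old-style message format shown above (a line that starts with hdx: or sdx:); dmesg output on other kernels may look different and need a different pattern.

```shell
#!/bin/sh
# list_drives: read kernel boot messages on stdin and print each
# hdX/sdX designation once. Hypothetical helper for illustration only.
list_drives() {
    # Keep only lines that begin with a drive designation and a colon,
    # such as "hdb: ST3400832A, ATA DISK drive", and print the name.
    sed -n 's/^\([hs]d[a-z]\):.*/\1/p' | sort -u
}

# Demo on the sample message from the article plus a second,
# illustrative line for the same drive.
sample='hdb: ST3400832A, ATA DISK drive
hdb: max request size: 1024KiB'
printf '%s\n' "$sample" | list_drives

# On a real system (reading dmesg may require root):
#   dmesg | list_drives
```

Feeding the function from a variable keeps the example runnable on a machine without a newly attached disk.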
Most Linux distributions come with a GUI disk manager. These disk managers let you define and format partitions (I generally use one partition per backup disk), assign mountpoints (for example, /data1, /data2) and mount the partition. The process also can be done from the command line using fdisk to create partitions.
Creating New Partitions
To create new partitions on hdb (above), type:
fdisk /dev/hdb
Type m at the fdisk prompt for a help summary. Typing n at the prompt asks about the new partition we are creating:
Command action
   e   extended
   p   primary partition (1-4)
p
For a single primary partition, type in p:
Partition number (1-4): 1
You are then prompted for a partition number (type 1 for a single partition). Next, set the partition size by determining the first and last cylinder. Because we are using the whole disk, you should be able to select the default values (the first and last cylinders):
First cylinder (1-48641, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-48641, default 48641):
Using default value 48641
Type w to write the partition table. You now have a partition, /dev/hdb1, that occupies the whole disk.
Next, format the partition with the filesystem of your choice (I use ext3) using the mkfs command:
mkfs -t ext3 /dev/hdb1
Create a mountpoint for the new partition of your new disk (I'll call it /data1):
mkdir /data1
Mount the newly created ext3 partition:
mount -t ext3 /dev/hdb1 /data1
Then test reading from and writing to the new filesystem. Finally, add a line to /etc/fstab, the mount table, so that the partition is mounted automatically during the boot process:
# Device   mountpoint  fstype  options   freq  pass_no
/dev/hdb1  /data1      ext3    defaults  1     2
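The format and mount steps above can be collected into a small script. The sketch below is a dry run by default: it only prints the commands, because formatting destroys whatever is on the target partition. The run helper, the fstab_line helper and the DRY_RUN variable are assumptions made for this illustration; the device, mountpoint and filesystem are the ones used in this article. Partitioning is left to the interactive fdisk session described earlier.

```shell
#!/bin/sh
# Dry-run sketch of the format/mount steps for one new backup disk.
# DRY_RUN, run() and fstab_line() are illustrative names, not part of any tool.
DRY_RUN=${DRY_RUN:-1}
PART=/dev/hdb1      # partition created with fdisk
MNT=/data1          # mountpoint used in the article
FSTYPE=ext3

run() {
    # Always print the command; execute it only when DRY_RUN=0 (needs root).
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

fstab_line() {
    # Fields: device, mountpoint, fstype, options, freq, pass_no
    printf '%s %s %s defaults 1 2\n' "$1" "$2" "$3"
}

run mkfs -t "$FSTYPE" "$PART"
run mkdir "$MNT"
run mount -t "$FSTYPE" "$PART" "$MNT"

echo "Line to append to /etc/fstab:"
fstab_line "$PART" "$MNT" "$FSTYPE"
```

Running it with DRY_RUN=0 as root would execute the commands for real, so check the device name twice before doing that.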
rsync is included in most Linux distributions. You need rsync and SSH on both your backup client and server. Check to see whether rsync is installed by typing rsync at the command prompt, or check your list of installed packages. If you cannot find a binary package for your distribution, you can download the source code for rsync by following links on the rsync home page (see the on-line Resources).
The simplest way to run rsync over a network is as a standalone application using SSH for authentication. You can run rsync as a daemon with more features, but you won't need to in this case. I illustrate this here with a backup client named foo and a server named bar.
To replicate the directory /home on Linux machine foo with directory /data1/foo of backup server bar from client foo using rsync and SSH, type:
rsync -az /home -e ssh bob@bar:/data1/foo
You will be prompted for user bob's password, and then the foo /home directories are replicated to /data1/foo/home on bar (bob needs an account on the server and write permission for /data1/foo).
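The reason the files end up in /data1/foo/home rather than directly in /data1/foo is that the source path has no trailing slash. The sketch below shows both behaviors locally, using throwaway directories in place of the real /home and the server bar; the directory and file names are invented for the demonstration, and -z is left out because compression does not matter for a local copy.

```shell
#!/bin/sh
# Local demonstration of how rsync treats the source path.
# Throwaway directories stand in for /home on foo and /data1 on bar.
work=$(mktemp -d)
mkdir -p "$work/home/bob" "$work/data1/foo" "$work/data1/flat"
echo "hello" > "$work/home/bob/notes.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory itself is copied,
    # so the file lands in data1/foo/home/bob/notes.txt.
    rsync -a "$work/home" "$work/data1/foo"

    # Trailing slash: only the contents are copied,
    # so the file lands in data1/flat/bob/notes.txt.
    rsync -a "$work/home/" "$work/data1/flat"

    find "$work/data1" -type f | sort
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

Adding -n (dry run) and -v to the real command is a cheap way to preview what would be transferred before committing to it.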
To avoid having to type bob's password each time, create a private/public key pair for SSH authentication without a password. This allows you to automate the login process.
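A minimal sketch of the key generation step follows, assuming OpenSSH's ssh-keygen is available. The key file name id_backup and the scratch directory are made up for the example; in practice the key would normally live under ~/.ssh on the client foo. Anyone who can read a passphrase-less private key can log in as bob, so keep its permissions tight.

```shell
#!/bin/sh
# Sketch: create a passphrase-less key pair for unattended backups.
# The key goes into a scratch directory here; id_backup is a made-up name.
keydir=$(mktemp -d)
keyfile="$keydir/id_backup"

if command -v ssh-keygen >/dev/null 2>&1; then
    # -N "" sets an empty passphrase, -f names the output file,
    # -q suppresses the progress output.
    ssh-keygen -q -t rsa -b 3072 -N "" -f "$keyfile"
    ls "$keydir"
else
    echo "ssh-keygen is not installed; skipping"
fi

# Next steps, run by hand on foo (bob's password is needed once):
#   ssh-copy-id -i "$keyfile.pub" bob@bar
# or append the .pub file to ~bob/.ssh/authorized_keys on bar.
# After that, rsync can use the key without prompting:
#   rsync -az -e "ssh -i $keyfile" /home bob@bar:/data1/foo
```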
Comments
Backup Project Versus Backup Solution
The core concepts espoused here are dead on; the trouble tends to be not when you're trying to back up one or two Linux servers but when you want a backup solution for an enterprise (even a small or medium-sized one) - and you want to be able to put it out there and then forget about it. That's where the software that operates on top of Linux and the underlying hardware platform gets critical.
I think these days with higher density drives I'd tend to use RAID-6 to avoid MTTDL issues potentially occurring during RAID rebuilds.
Take rsync for example - it's great. And if you have a pristine WAN out there that stays pristine in terms of quality, then you can set and forget it. But when you're using it for disaster recovery and take into account lower quality WAN then you're either going to make an investment in increasing its fault resiliency over lower quality lines or you're going to spend a lot of time handling failure.
Simple live Linux backup using SSH
I've construed a nice way to back up the stuff from my server directly into an ssh connection to my home machine. Works well, costs nothing and it does not clutter the disk space on my server. Check it out: Backing up Linux web server live via SSH.
Hey
For my Linux machine I am using free backup software called Dmailer http://www.dmailer.com/dmailer-backup.html . It is a lot easier for me to save all my backups online on their servers than to keep them on another hard disk or somewhere else, and I also find it more secure.
Awesome Solution
We are actually starting to use this kind of solution for some of our offsite backups and it is working great so far. I highly recommend using rsync in an offsite backup solution.
Backup Server
Very helpful for me.
of interest:New ‘SEP sesam’ V 3.4 Data Recovery: SAP Certified a
This is a new Linux Back up option: 'SEP sesam' version 4.3
Salt Lake City, UT [BrainShare] - March 17, 2007 – SEP Software, the technology leader in cross-platform data backup, data restore and disaster recovery, today announced the release of ‘SEP sesam’ Version 3.4, with full certifications from SAP and VMware. SAP has certified ‘SEP sesam’ 3.4 for their enterprise level database software products, i.e. SAP R/3, SAP DB/MAXDB. This latest release features many user-friendly enhancements and important technological advancements including: Encryption; VMware and VMware VCB; Disaster Recovery down to Bare Metal; Added Groupware Modules; Disk-to-Disk-Tape: Data Storage; Off-Site Storage; Automated Data Restores; Secure Data Transfer and more. The new, highly scalable ‘SEP sesam’ V.3.4 is now available for download and will be showcased at Novell Brainshare in Salt Lake City, Utah March 16-21, 2008.
“This is a perfect fit for medium to large enterprise customers and fulfills all of their data backup, data restore and disaster recovery requirements,” explained Tim Wagner, President of SEP Software LLC. “Recent certifications from SAP AG validate ‘SEP sesam’ as a highly functional and desirable solution for the SAP user community.”
New Features and Expanded Technical Functionality Details:
- Encryption
o With data security becoming an increasingly disconcerting issue, SEP offers Blow Fish algorithm SEP and an additional AES 256-bit encryption. All data is encrypted on the client side, while keys are stored on the backup server. Only encrypted data travels across the network.
- VMware and VMware VCB
o ‘SEP sesam’ now supports VMware virtual environments at the enterprise level with the ‘SEP sesam’ ESX-client as well as VMware Virtual Consolidated Backup (VCB) in all variations.
- Disaster Recovery down to Bare Metal
o Recent enhancements to the ‘SEP sesam’ product allow restoration of data to the lowest levels possible. Bare System Recovery (‘SEP sesam’ BSR) can be integrated into centralized management backup strategies using the ‘SEP sesam’ GUI. New tasks and activities for a fast and universal Disaster Recovery on systems running Windows – including recovery to new hardware – have been integrated into the new release.
- Groupware Modules Added
o Novell Groupwise along with the Netware Client is fully-supported. SEP offers a complete backup solution for Novell OES2 environments.
o The Zarafa Module allows easier and faster backup including single mail and single item restore with the new online backup module for Zarafa Groupware Server.
- Disk-to-Disk-Tape: Data Storage, Off-Site Storage
o System Administrators can store data from Disk-to-Disk-Tape. Using the ‘SEP sesam’ virtual tape library, the transfer of data to tape or removable media is fast and easy. Customized production planning and control is now easier than ever for the effective and secure back-up of important enterprise data.
- ‘SEP sesam’ Improved Availability for SAN and NAS environments
o ‘SEP sesam’ 3.4 offers enhanced data availability, making it ideal for SAN, NAS and all other common network storage devices. The new version ensures against ANY loss of critical company data. The selected back-up strategy can be automated and controlled from either a central or remote location.
o ‘SEP sesam’ methodology is readily available for all popular operating systems, including Microsoft Windows, Unix and Linux.
- Automated Data Restores
o ‘SEP sesam’ can be set up to perform automated restores to constantly monitor data and to ensure data backup integrity assurance. Enhancements to the restore function ease the workload on overtaxed IT management organizations. The powerful and easy-to-use ‘SEP sesam’ scheduler (SEPuler) with calendaring allows for management of even the most complex data management tasks. Data files can be restored using the included Restore Wizard (for example a complete generation restore). The restore can be accomplished simply and easily.
- Secure Data Transfer
o Secure network data transfer will be accomplished through ‘SEP sesam’ SSL and SSH handling. The Firewall Port-Control protects against intrusion from outside the network and prevents unauthorized access from within. Using our Java-based GUI ‘SEP sesam’ allows remote administration from every operating system.
- ‘SEP sesam’ Online Modules
o ‘SEP sesam’ Online Modules maximize the security of all database and groupware solutions. Online Modules allow the efficient scheduling and planning of backups to take place during the production day. All ‘SEP sesam’ Online Modules are certified for Groupware (Novell Groupwise, OpenXchange, Scalix, etc.) and ERP Applications (SAP, ABAS). The ‘SEP sesam’ Online Database modules and SEP Live Recovery (Manageable Database Shadowing) allow automated data mirroring to keep the data available at all times.
Additionally, SEP has a corresponding certified release running SAP on Oracle 10g, which is now product-ready and tested for medium to very large enterprise customers. The certification includes Linux 64-Bit, Microsoft Windows 64-bit and Unix 64-bit operating systems.
Pricing and availability
‘SEP sesam’ is available for immediate download at www.sepsoftware.com. 'SEP sesam' prices start at $325 for an OES2 or Linux server and $215 for any client.
About SEP Software
SEP Software is the technology leader in cross-platform data backup, data restore and disaster recovery. Flagship product ‘‘SEP sesam’’ delivers storage management and network-wide data security software solutions for worldwide Linux, Unix and Windows systems. Based in Boulder, Colorado, SEP Software LLC is a wholly owned subsidiary of SEP AG, whose labs and headquarters are based in Weyarn, Germany. For more information, please go to www.sepsoftware.com. For inquiries please call (303) 417-6316 or mail to sales@sepsoftware.com
Excellent Write-Up
For someone who is experienced in Linux, but has never done anything serious with it, I feel that this article is most beneficial and offers a lot of promise for the average newbie. It gives the user some sense of accomplishment when done without being complex and devastatingly hard.
I plan to do this for myself, and possibly a family member's small office when I have the chance and have proven the plausibility of it.
Great, but what about backups?
I've been running a 1TB usable RAID 5 box using Linux and 6 200GB drives. Having the storage space is great, and it was very cheap to build, but here's the rub: How do you inexpensively build a backup solution for 1TB of changing data? I download, and I cull, and large amounts of data flow across that array, so how do I back up 200GB/month easily and cheaply? DVD-R is not an option, and building another array is not an option IMO, since I want the option of keeping old backups. Blu-ray or tape is what I'm inclined to look at, but both add $500-600 to the cost off the top, plus media.
jungledisk
If you are running Windows/Linux/Mac OS, then use Jungledisk with Amazon S3 for the storage. $0.15/GB plus $5/month...inexpensive/simple and "powerful enough"
ssh problem for a not-so-skilled Linux user
Hi All:
I found the article simple and easy to follow, but I had problems implementing it on my system. I am not a skilled Linux user, so I asked for help and found the solution.
Problem:
In my machine and network environment, SSH authentication as posted did not work.
Solution:
Check that you are using SSH 2. If your machine uses SSH 1 by default, as mine does, you should replace the line:
rsync -az /home -e ssh bob@bar:/data1/foo
with
rsync -az /home -e "ssh -2" bob@bar:/data1/foo
Of course, still follow all the steps for generating the key pair and so on.
Hope it helps.
I don't know how to do...
Hello,
I would like to adapt your tutorial to my needs.
My situation:
I have X servers running Linux to back up. The backups go to a Windows server.
How can I configure smb.conf? What is the right way to write the line "mount -t smbfs ....."? Do I have to add an entry to fstab? If so, what do I have to write?
Thank you very much for your replies.
paco
PS: Excuse me if my English is bad.
Wow
The idea about this and realization of it is very nice indeed!
Report Script, change?
The line at the end of the report script assumes you have an MTA running on the local machine, I think. I don't, and I guess that is why I get "./backup_report.sh: line 16: mail: command not found".
Is there a variation of this line that would send the generated report to my local Mozilla Thunderbird?
problems with Generating the Key Pair
I tried the Generating the Key Pair section on my Fedora Core 4 box connecting to a Slackware 10.2 box. I had no luck getting the key pair to let me connect without a password. After some research I found this how-to, which worked better for me.
http://www.kernel-panic.org/wiki/SshKeygen
Ridgid
i had trouble with this..
i had some trouble with this..
i think the problem is:
BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/$BACKUPDIR/
should be:
$BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/$BACKUPDIR/
and
rsync $OPTS $BDIR BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/current >>
should be:
rsync $OPTS $BDIR $BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/current >>
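The difference is easy to see by echoing both forms. In the sketch below, the variable names come from the backup script discussed in this comment; the values assigned to them are placeholders chosen for the example.

```shell
#!/bin/sh
# Placeholder values; only the variable names come from the backup script.
BACKUP_LOGIN=bob
BSERVER=bar
BACKUP_HOME=/data1/foo

# Without the leading $, the shell treats BACKUP_LOGIN as literal text,
# so rsync would try to log in as a user called "BACKUP_LOGIN".
wrong="BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/current"

# With the $, the variable is expanded to the intended login name.
right="$BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/current"

echo "wrong: $wrong"
echo "right: $right"
```

Echoing the assembled destination before handing it to rsync is a quick check for this kind of typo.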
nice, how do you do restore though?
how do you do restore though?
Another backuptool: http://www.bacula.org/
Bacula has a Windows client that allows backing up data from Windows clients without the need to set up a share (if I understand the docs correctly).
Hope to find some time to test it out ;)
Try KeyChain instead of passphraseless SSH keys
Hi,
There is an alternative for passphraseless SSH keys that works quite well if you keep your linux-based backup client on for long times at a stretch: Keychain is a small program where you enter your SSH passphrase just once per power cycle. More info at http://www.gentoo.org/proj/en/keychain/index.xml
Power Requirements
I was just looking at starting the same project, but a modern (power-saving) server will cost about $80-$100 a year for power (assuming $0.07-$0.10 per kWh). I think the Linksys NSLU2 (which already runs Linux) is a much better low-power option (1.5W instead of 60W). The only caveat: you need external USB drives to hook up to it.
Great article nonetheless, I hope that more people begin to build these so that the market place will have more (low power) options available.
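For anyone who wants to plug in their own numbers, here is a small sketch of the arithmetic: watts times hours per year, divided by 1000 to get kWh, times the price per kWh. The function name annual_cost is hypothetical, and the calculation assumes a constant draw around the clock, which a real server will only approximate; measuring the actual draw at the wall gives better input.

```shell
#!/bin/sh
# annual_cost WATTS PRICE_PER_KWH
# Yearly electricity cost of a device that runs around the clock.
# Hypothetical helper; assumes a constant draw and 24 * 365 hours per year.
annual_cost() {
    awk -v w="$1" -v p="$2" 'BEGIN { printf "%.2f\n", w * 24 * 365 / 1000 * p }'
}

annual_cost 60  0.10    # a 60 W server at $0.10 per kWh
annual_cost 1.5 0.10    # a 1.5 W NSLU2 at $0.10 per kWh
```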
Tsync is also worth mentioning
Basically it is a modern rsync with many improvements (redundancy, peer to peer, ...). It's still in beta, but probably already more stable than rsync and other sync technologies (unison, ...).
Diverse Disks
A comment on the purchase of HDD that I didn't see in the article.
For those of you considering building a home backup server, do not go to your nearest computer store and pick up 4 of the same type of drive.
As was pointed out in a previous comment, the failure rate of IDE drives increases after ~12 months.
If you purchase a number of HDD from the same manufacturer at the same time from the same location, you run the risk of getting a similar HDD failure on all your HDD at the same time due to possible manufacturing defects.
It is recommended either to purchase similarly spec'd drives from different manufacturers or, if you really want the same brand of drive, to ensure that the drives you purchase are from separate manufacturing runs.
The diversity of drives will significantly reduce your chances of data loss in the event that a particular manufacturer has a manufacturing defect that causes the drive to fail.
Do some Google lookups for IBM Deskstar or Fujitsu HDD failures in the past few years for examples.
Anyway, happy 'backuping' ;)
Why Use Linux At All?
Why on Earth would I want to go to all this trouble when I can slap Server Elements NASLite (www.serverelements.com) onto a CD and 10 minutes later dispense with an operating system altogether?
NASLite is Linux Powered
NASLite is Linux based - 2.4.26 to be exact! I'll be impressed if someone can do the impossible and cram a Win32 app of that capability into a 4M ramdisk, let alone boot and run it from a floppy disk.
Re: Why Use Linux At All?
It's simple, here's the first clue; from the NASLite-in-a-Nutshell.pdf:
"At the DOS prompt, check to make sure that you can view the C drive."
This is a Linux group. We are Linux users. As a think-tank, we are creating our own solutions, and do not depend on commercial groups like the Microsoft Corp. to do it for us.
We share our sources and discoveries, and are not paid for it. However, our companies make more money for the effort, and as a result, we get raises.
The old Microsoft model worked like this: The programmer put a function into an exe file, and then sold the file.
The new Linux model works like this: The programmer shares his code and research; and as a result, his company profits, as do all concerned.
Can you produce the source code for this machine? I ask because there are many of us who also assemble our own hardware, and if we had the sources, then we would not have to purchase the machine at all.
The purpose of this article was to be a howto on assembling our own gear, not on purchasing Windows-based commercial stuff. I really don't understand the purpose of your question, unless it is an attempt to sell Server Elements' (Windows-based) equipment.
I think it's pointless here.
Michael Hearne
You didn't read very close
The NASLite in a Nutshell is an example of buying a set-top box on eBay for $32 which has a built-in flash disk. The DOS-like commands you listed were being used to load a Linux boot image into that flash disk. So $32 for the machine, $25 for the software, and whatever for the USB2 hard drive and you have a NAS. A dumb one but it will work.
It is not Windows based. The source code for the FOSS portion is available.
It isn't a solution I would use, as I would prefer to have the capabilities of having it be a member of a Domain instead of just a LAN community disk server, but I see no need to get so heated up about it.
You could emulate it easily using a live CD distro, and configure it for all that function, and have it able to be a member of a Domain. But that's work, and some people prefer a $25 solution that "just works."
Jim
Exactly!
NASLite takes a minute to set up and just works. Best of all, it takes only 5 minutes to explain to a secretary how to administer it. Prior to using NASLite, RH was my choice, but when something goes wrong, I have to make a trip to correct the problem. With NASLite, a phone call usually resolves the issue.
Customer frustration is considerably lower this way.
NASLite just works.
Cheap Hardware and Low Time Investment
One can make a lot of arguments for and against Server Elements NASLite, but if you need to get a high capacity NAS server, built on low end hardware, there really are no alternatives. The damn thing runs in a mere 4M ramdisk. That exports your shares via CIFS, NFS, FTP and HTTP nicely and coherently.
I’ve been using it for 8 months on a 120MHz/64M Gateway box with 4x250G Maxtors. I had it set up and running in no time. Formatting the drives was the most time consuming task. Considering what I charge per hour for my consulting services, the $25 investment in the NASLite software was a no brainer. I’ve purchased multiple copies and installed it in many of my customer’s locations. That way I get to do the job inexpensively for them and profitably for me.
Not a bad choice if you don’t need a NAS fortress but a simple storage bin for a small office. I’d highly recommend it if you consider your time and customer’s dollars at all valuable.
No affiliation with Server Elements, just like the product enough to speak up…
Why on Earth use NASLite
Why would I pay for something like this when I can do it for free? $25 and I can only run copy? I don't think so!
Doesn't NASLite use Linux
Doesn't NASLite use Linux for OS portion?
RE: Why use Linux at all?
Perhaps because
"By design, NASLite v1.x is a community file server and does not support features such as user management , the ability to join domains or disk quotas."
rdiff-backup
I am using LVM2, Samba, rdiff-backup and pyBackpack to set up my home backup system. It works quite well. Here are the details.
When building a server with
When building a server with many disks, keep in mind that hard drives mostly draw from the 12V rail, but most PSUs have limited power on 12V. They give plenty of 5V or 3.3V, but not 12V. You should make certain that the combined seek power consumption of all HDDs is less than what the PSU can deliver at 12V. Do not be fooled by "480W"; it might not be enough for 8 HDDs!
I highly recommend NOT to make a backup server on nonredundant IDE disks, especially if you use raid0. If you lose one disk you lose all the data! IDE hdd reliability starts to decline after ~12 months.
Also, consider the power consumption. 100+W of constant usage adds up.
failure rates for drives
I've seen the "~12 months" show up twice now in this thread.
Most hard drives fail according to a "bathtub curve" pattern. Meaning, you will get a few that fail early in the lifecycle, then very few failures until the end of the lifecycle. The ones that failed early probably had manufacturing defects.
This is why you want to mix/match your drives in a RAID set from different manufacturing batches, so that a process glitch in the factory only affects one of the drives in the array. If all of your drives came from the same faulty factory line, they might all fail within the same timeframe. If this timeframe happens to be shorter than the recovery period for the RAID, then you will lose everything on the RAID.
Using hot-spare drives shortens the recovery period (the RAID array can immediately start the rebuild as soon as a drive fails). That gives you good odds of getting redundancy again before a 2nd drive fails.
IDE disk statistics?
Does IDE have spare sectors like SCSI? Does it allow reporting of disk problem statistics and remaining spare sectors?
Also, do IDE disks still not support simultaneous data requests or has a work-around been made? At the price point it seems IDE is the choice for data backups.
Thanks!
EVMS and BackupPC
BackupPC is a preferable choice to rsync if (a) you are backing up Windows boxes, or (b) you want non-expert users to be able to restore their own files. BackupPC's HTTP interface is very nice for non-experts.
EVMS is probably preferable to LVM + RAID. It gives the same capabilities and even more configuration flexibility. I say "probably" because I built my backup server before EVMS was available.
BackupPC not so great on Windows
Backup solutions like this work great on any other OS but Windows.
A list of limitations: http://backuppc.sourceforge.net/faq/limitations.html
These limitations are Windows limitations or similar and not really limitations of BackupPC.
I really recommend BackupPC; it is great. Just keep these things in mind when you consider using it with Windows.
no it isnt
backup pc will do linux too. not just windoze
why not LVM or even raid
With so many disks in the machine you should use LVM or at least raid0 to make them look like one big uniform drive. Newer versions of RHEL or Fedora give you the option to set these up if you manually partition with Disk Druid.
With LVM you can even add disks to the LVM after the system is operational and increase the file system size. If you are worried about failing disks destroying all your data set up raid with hot spares or use raid5. As the article said, disks are cheap.
Never use raid0. Ever.
Never use raid0. Ever.
RAID 0 is very useful
If you are doing something where you want the fastest read write performance, with data that you can recover from another source if need be.
Let's say you capture DV video from your video camera to your RAID 0 partition. This video is still available from its original source. You edit the video and then when you are done you compress the video and save it to a RAID 5 drive with a hot spare.
I am doing this with a set of five SCSI drives, all 18GB each. This gives me 90GB of the fastest available hard drive to play with. The read/write speed is no longer a bottleneck. :D
Not true.
RAID-0 is often used in combination with other RAID configurations (mainly when hardware RAID is involved) to gain high performance while minimizing risk. I've configured RAID "enclosures" with multiple disk "trays" that are configured as RAID-5 w/hot-spare, with multiple trays spanning multiple HBAs, which have in turn been striped RAID-0. In order for a failure with data loss you would have to have 3 failures in one tray or more than 2 failures in multiple trays. Please don't say never when clearly the risks can be mitigated. RAID-5 does not replace proper backups either.
Better-er is rsync's --link-dest option
You can dramatically improve the backup system using rsync's --link-dest option. Done right this will get you daily snapshots, like a full backup, but using only the disk space of an incremental backup.
(There are some issues with meta-information, file permissions and the like. mtree can be used to handle this.)
Rather than rolling your own, it could be better to use something like dirvish, which uses the --link-dest technique. http://www.dirvish.org/
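For readers who want to see the idea before reaching for dirvish, here is a minimal local sketch of the --link-dest technique. The snapshot function, the "latest" symlink and the directory layout are assumptions invented for this example, not something from the article or from dirvish; the demo uses throwaway directories and only runs if rsync is installed.

```shell
#!/bin/sh
# Sketch of daily snapshots with rsync --link-dest.
# Unchanged files in a new snapshot become hard links into the previous
# one, so each snapshot looks complete but only changed files use space.
# snapshot() and the "latest" symlink are assumptions for this example.
snapshot() {
    src=$1; dest_root=$2; name=$3
    if [ -e "$dest_root/latest" ]; then
        rsync -a --link-dest="$dest_root/latest" "$src/" "$dest_root/$name/"
    else
        rsync -a "$src/" "$dest_root/$name/"
    fi
    # Point "latest" at the snapshot just taken.
    rm -f "$dest_root/latest"
    ln -s "$name" "$dest_root/latest"
}

# Demo with throwaway directories.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/backups"
echo "unchanged" > "$work/src/a.txt"

if command -v rsync >/dev/null 2>&1; then
    snapshot "$work/src" "$work/backups" day1
    echo "new file" > "$work/src/b.txt"
    snapshot "$work/src" "$work/backups" day2
    # a.txt shows the same inode number in both snapshots (a hard link).
    ls -i "$work/backups/day1/a.txt" "$work/backups/day2/a.txt"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

Deleting an old snapshot directory frees only the files that no newer snapshot still links to, which is what makes keeping many days affordable.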
A nice backup software
A nice backup software backup to try is BackupPC. http://backuppc.sf.net/
Making this work with Win/Mac client installs
Thank you for this. Always knew those storage boxes were obscenely priced. A question: Have you tried this with Vembu StoreGrid (see www.vembu.com) - it seems to support all OSes and uses rsync/zlib etc. with a friendly UI. How would this work with this Linux TB box - if I'm lazy and don't wanna go through implementing rsync etc. manually? Especially since I'm planning a 'box' for backing up Win/Mac clients and another smaller FreeBSD server?
Go for it!
The setup presented is a completely generic one. I tried to show the most vanilla implementation of a terabyte-capacity storage system. The tools I presented are merely the most rudimentary utilities that come standard with virtually every Linux installation. If you have other tools that will run on Linux, then there is little to stop you implementing them as well. Have fun!
Go for it!
Thank you for the article! As a Linux newbie, I learned a lot from this article. Although simplistic, I now have a backup server using linux.
Thanks again!