# Build a Home Terabyte Backup System Using Linux

Build a low-cost, terabyte-sized backup server using Linux and back up your digital audio files, digital images and digital movie recordings.

You can run a script on foo to replicate foo on bar using bob's account on bar. You should read the documentation for rsync, which has numerous features (more than 70 command-line options). In particular, the --delete option can have disastrous consequences if misused, because it removes files on the destination that no longer exist on the source. Listing 1 shows a seven-day incremental backup. Files altered or deleted on each day of the week are deposited in directories named for the day (set by --backup-dir). The most recent backup is stored in the directory current.
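Listing 1 is not reproduced in this section, so the following is only a minimal sketch of the seven-day scheme described above, not the article's own listing. The variable names follow those quoted by readers further down (BDIR, BSERVER, BACKUP_LOGIN, BACKUP_HOME, BACKUPDIR, OPTS); the values and the exact option set are assumptions. The function `backup_command` is a made-up helper that prints the rsync command rather than running it, so the effect of --delete can be inspected first.

```shell
#!/bin/sh
# Sketch of a seven-day rotating rsync backup from foo to bar.
# All values below are assumptions based on the running example.
BDIR=/home                      # directory to back up on foo
BSERVER=bar                     # backup server
BACKUP_LOGIN=bob                # account on the backup server
BACKUP_HOME=/data1/foo          # backup root on the server
BACKUPDIR=$(date +%A)           # weekday name, e.g. "Monday"

# --backup/--backup-dir move files that changed or were deleted since the
# last run into the weekday directory instead of discarding them.
OPTS="-az --delete --backup --backup-dir=$BACKUP_HOME/$BACKUPDIR -e ssh"

backup_command() {
    # $OPTS is left unquoted on purpose so it splits into separate options.
    echo rsync $OPTS "$BDIR" "$BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/current"
}

# Print the command; remove the "echo" above to actually run it.
backup_command
```

Note that this sketch does not clear last week's directory of the same name before reusing it; a complete script would need to handle that.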

If you prefer a compressed archive format, you still can run tar for a full backup over the network:

tar cvfz - /home | ssh bob@bar dd of=/data1/foo/current.tar.gz


and use the --newer option for an incremental tar backup.
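As a rough sketch of the incremental variant, assuming GNU tar: the cutoff can be taken from a timestamp file that is touched after each successful backup. The names `STAMP`, `incremental_tar` and `day.tar.gz` are hypothetical and not part of the original article.

```shell
#!/bin/sh
# Sketch: incremental tar backup driven by a timestamp file (GNU tar assumed).
# STAMP is a hypothetical marker file; keep the leading "./" or "/" so that
# tar treats the value as a file name rather than as a date string.
STAMP=${STAMP:-./last-backup.stamp}

incremental_tar() {
    # $1 = directory to archive; the gzip-compressed archive goes to stdout.
    # tar uses the modification time of $STAMP as the cutoff and skips
    # regular files that have not changed since then.
    tar --create --gzip --file=- --newer="$STAMP" "$1"
}

# Usage, following the full-backup command above:
#   incremental_tar /home | ssh bob@bar dd of=/data1/foo/day.tar.gz && touch "$STAMP"
```

Directories are still recorded in the archive even when none of their files are new, so an incremental archive is never completely empty.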

rsync is more efficient than the tar command, because rsync copies only the differences between the current and previous copy of the data.

You can get by with rsync and SSH on most platforms (including MS Windows), but in reality, a fileserver setup is preferable, especially if you are running MS Windows clients. For MS Windows machines, a Windows backup application is preferable. The easiest way to do this is to run the backup to write to a share on the Samba server.

### Software Configuration: Samba

If your Linux installation supports SMB file sharing, Samba is probably installed. If not, binaries are included with virtually all distributions. If this isn't the case with your distribution, or if you prefer to use the very latest Samba version, download the source code and compile and install. Official Samba distributions are available from the Samba home page (see Resources). Refer to the documentation there for installing and initially configuring Samba.

Once your backup server has Samba server installed, all Samba configurations are made by editing the smb.conf file, which is usually in /etc/samba/smb.conf or /usr/local/samba/lib/smb.conf. Graphical configuration utilities like SWAT usually are included with Samba. See your documentation for information about starting or stopping Samba. You should configure your server to ensure that Samba starts when the server initially boots up.

Following our backup example above, on server bar, set up a simple smb.conf file or try appending the section below to the existing smb.conf file to define a share called bob:

[bob]
comment = foo backup account
path = /data1/foo
valid users = bob
public = no
writable = yes


Next, add bob with any secure password as a Samba user (bob must have a Linux account as well as permission to read/write the /data1/foo directory):

smbpasswd -a bob


For MS Windows clients, map the share \\bar\bob as a network drive in MS Windows using the user name bob and the SMB password for the bob Samba account. You then should be able to run backups to the mapped network drive. I typically use the free ntbackup software and set it up to write .bkf files to network storage. ntbackup comes free with Windows 2000 and XP and can run automated, regularly scheduled backups from the Windows client. Windows client-based backups have the advantage of backing up the entire state of the system (including the Windows registry).

You also can use Samba to serve files to most UNIX or Mac OS X clients. The smb client is installed by default in Mac OS X. In Linux distributions, make sure that the smb client package is installed. Mount the bob share defined above onto the /backup mountpoint of machine foo (on newer kernels, where the smbfs filesystem type has been replaced by cifs, use -t cifs instead):

mount -t smbfs -o username=bob,password=somepassword //bar/bob /backup


To have the backup drive mount when the system boots, place a line such as the following in /etc/fstab:

//bar/bob                  /backup                smbfs  rw,username=bob,password=somepassword  0 0
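Because /etc/fstab is normally world-readable, you may prefer not to store the password in it. One possible approach, sketched here under the assumption that your mount helper supports the `credentials=` option, is to keep the user name and password in a file that only root can read. The helper name `write_credentials` and the path `/etc/samba/backup.cred` are made up for illustration.

```shell
#!/bin/sh
# Sketch: keep the SMB password out of /etc/fstab by using a credentials file.
write_credentials() {
    # $1 = file to create, $2 = SMB user, $3 = SMB password
    # The subshell keeps the restrictive umask from leaking into the caller.
    ( umask 077; printf 'username=%s\npassword=%s\n' "$2" "$3" > "$1" )
    chmod 600 "$1"
}

# Example (run as root; the path is an assumption):
#   write_credentials /etc/samba/backup.cred bob somepassword
# The fstab entry for the bob share can then refer to the file:
#   //bar/bob  /backup  smbfs  rw,credentials=/etc/samba/backup.cred  0 0
```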


### Backup Project Versus Backup Solution

The core concepts espoused here are dead on; the trouble tends to come not when you're trying to back up one or two Linux servers but when you want a backup solution for an enterprise (even a small or medium-sized one) and you want to be able to put it out there and then forget about it. That's where the software that operates on top of Linux and the underlying hardware platform gets critical.

I think these days with higher density drives I'd tend to use RAID-6 to avoid MTTDL issues potentially occurring during RAID rebuilds.

Take rsync for example - it's great. And if you have a pristine WAN out there that stays pristine in terms of quality, then you can set and forget it. But when you're using it for disaster recovery and take into account lower quality WAN then you're either going to make an investment in increasing its fault resiliency over lower quality lines or you're going to spend a lot of time handling failure.

### Simple live Linux backup using SSH

I've constructed a nice way to back up the stuff from my server directly into an ssh connection to my home machine. It works well, costs nothing and does not clutter the disk space on my server. Check it out: Backing up Linux web server live via SSH.

### Hey

For my Linux machine I am using a free backup program called Dmailer: http://www.dmailer.com/dmailer-backup.html. It is a lot easier for me to save all my backups online on their servers than to keep them on another HDD or some other way, and it is also more secure.

### Awesome Solution

We are actually starting to use this kind of solution for some of our offsite backups and it is working great so far. I highly recommend using rsync in an offsite backup solution.

### Backup Server

Very helpful for me.

There is free software to help you build a home terabyte backup system. Although it is free, it runs very fast and is reliable.

### Excellent Write-Up

For someone who is experienced in Linux, but has never done anything serious with it, I feel that this article is most beneficial and offers a lot of promise for the average newbie. It gives the user some sense of accomplishment when done without being complex and devastatingly hard.

I plan to do this for myself, and possibly a family member's small office when I have the chance and have proven the plausibility of it.

I've been running a 1TB usable RAID 5 box using linux and 6 200GB drives. Having the storage space is great, and it was very cheap to build, but here's the rub: How do you inexpensively build a backup solution for 1TB of changing data? I download, and I cull, and large amounts of data flows across that array, so how do I back up 200GB/month easily and cheaply? DVD-R is not an option, building another array is not an option IMO, since I want the option of keeping old backups. Blue-Ray or tape is what I'm inclined to look at, but both add $500-600 to the cost off the top, plus media.

### jungledisk

if you are running windows/linux/MacOS, then use Jungledisk with Amazon S3 for the storage. $0.15/Gb plus $5/month...inexpensive/simple and "powerful enough"

### help you solve your problems which you faced

If you want to know how to back up 200GB/month easily and cheaply? you can go here , www.partitionwizard.com

### ssh problem for not skilfull linux user

Hi All:

I found the article simple and easy to follow. But I had problems to implement it on my system. First of all I am not a skilfull linux user so I asked for help and found the solution.

Problem: In my machine and network environment, ssh authentication "as is" posted was not working.

Solution: Check that you are using SSH 2. If your machine uses SSH 1 as default, as mine does, you should rewrite the line:

rsync -az /home -e ssh bob@bar:/data1/foo

with

rsync -az /home -e "ssh -2" bob@bar:/data1/foo

of course follow all the steps of key pair generation and so on.

hope it helps

### I don't know how to do...

Hello,

I will want to adapt your tutorial to my needs. My situation: I have X server on linux system to backup. The backup is on a server windows solution. How can i configure smb.conf? & What is the right way to write the line "mount -t smbfs ....."? & I am obliged to write in fstab? if yes, what do I have to write?

Thank you very much for your replies.

paco

ps:excuse me if my english is bad.
### Wow

The idea about this and realization of it is very nice indeed!

### Report Script, change?

The line at the end of the report script assumes you have an MTA running on the local machine, I think. I don't and I guess that is why I get

"./backup_report.sh: line 16: mail: command not found "

Is there a variation of this line that would send the generated report to my local mozilla thunderbird?

### problems with Generating the Key Pair

I tried the Generating the Key Pair section on my Fedora core 4 box connecting to a slackware 10.2 box. I had no luck getting the key pair to let me connect without a password. After some research I found this how-to on it that worked better for me. http://www.kernel-panic.org/wiki/SshKeygen

Ridgid

### i had trouble with this..

i had some trouble with this.. i think the problem is:

BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/$BACKUPDIR/

should be:

$BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/$BACKUPDIR/

and

rsync $OPTS $BDIR BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/current >>

should be:

rsync $OPTS $BDIR $BACKUP_LOGIN@$BSERVER:$BACKUP_HOME/current >>

### nice, how do you do restore though?

how do you do restore though?

### Another backuptool

http://www.bacula.org/

Bacula has a windows client which allows to backup data from windows clients without the need to set up a share (if i understand the docs correct). Hope to find some time to test it out ;)

### Try KeyChain instead of passphraseless SSH keys

Hi,

There is an alternative for passphraseless SSH keys that works quite well if you keep your linux-based backup client on for long times at a stretch: Keychain is a small program where you enter your SSH passphrase just once per power cycle. More info at http://www.gentoo.org/proj/en/keychain/index.xml

### Power Requirements

I was just looking at starting the same project, but a modern (power saving) server will cost about $80 - $100 a year for power (considering .07 - .10 a kWh). I think the linksys NSLU2 (which is already running linux) is a much better option available for low power (1.5W instead of 60W). The only caveat; you need external USB drives to hook up to it.

Great article nonetheless, I hope that more people begin to build these so that the market place will have more (low power) options available.

### Tsync is also worth mentioning

Basically it is a modern rsync with many improvements (redundancy, peer to peer, ...). It's still in beta, but probably already more stable than rsync and other sync technologies (unison, ...).

### Diverse Disks

A comment on the purchase of HDD that I didn't see in the article.
For those of you considering building a home backup server, do not go to your nearest computer store and pick up 4 of the same type of drive.
As was pointed out in a previous comment, the failure rate of IDE drives increases after ~12 months.
If you purchase a number of HDD from the same manufacturer at the same time from the same location, you run the risk of getting a similar HDD failure on all your HDD at the same time due to possible manufacturing defects.
It is recommended either to purchase similarly spec'd drives from different manufacturers or, if you really want the same brand of drive, to ensure that the drives you purchase are from separate manufacturing runs.
The diversity of drives will significantly reduce your chances of data loss in the event that a particular manufacturer has a manufacturing defect that causes the drive to fail.
Do some Google lookups for IBM Deskstar or Fujitsu HDD failures in the past few years for examples.

Anyway, happy 'backuping' ;)

### Why Use Linux At All?

Why on Earth would I want to go to all this trouble when I can slap Server Elements NASLite (www.serverelements.com) onto a CD and 10 minutes later dispense with an operating system altogether?

### NASLite is Linux Powered

NASLite is Linux based - 2.4.26 to be exact! I'll be impressed if someone can do the impossible and cram a Win32 app of that capability into a 4M ramdisk, let alone boot and run it from a floppy disk.

### Re: Why Use Linux At All?

It's simple, here's the first clue; from the NASLite-in-a-Nutshell.pdf:

"At the DOS prompt, check to make sure that you can view the C drive."

This is a Linux group. We are Linux users. As a think-tank, we are creating our own solutions, and do not depend on commercial groups like the Microsoft Corp. to do it for us.

We share our sources and discoveries, and are not paid for it. However, our companies make more money for the effort, and as a result, we get raises.

The old Microsoft model worked like this: The programmer put a function into an exe file, and then sold the file.

The new Linux model works like this: The programmer shares his code and research; and as a result, his company profits, as do all concerned.

Can you produce the source code for this machine? I ask because there are many of us who also assemble our own hardware, and if we had the sources, then we would not have to purchase the machine at all.

The purpose of this article was to be a howto on assembling our own gear, and not to purchase Windows-based commercial stuff. I really don't understand the purpose of your question, unless it is an attempt to sell serverelement's (windows-based) equipment.

I think it's pointless here.

Michael Hearne

### You didn't read very close

The NASLite in a Nutshell is an example of buying a settop box on e-bay for $32 which has a built in flash disk. The DOSlike commands you listed were being used to load a Linux boot image into that flash disk. So $32 for the machine, $25 for the software, and whatever for the USB2 hard drive and you have an NAS. A dumb one but it will work. It is not Windows based. The source code for the FOSS portion is available. It isn't a solution I would use, as I would prefer to have the capabilities of having it be a member of a Domain instead of just a LAN community disk server, but I see no need to get so heated up about it. You could emulate it easily using a live CD distro, and configure it for all that function, and have it able to be a member of a Domain. But that's work, and some people prefer a $25 solution that "just works."

Jim

### Exactly!

NASLite takes a minute to set up and just works. Best of all, it takes only 5 minutes to explain to a secretary how to administer it. Prior to using NASLite, RH was my choice, but when something goes wrong, I have to make a trip to correct the problem. With NASLite, a phone call usually resolves the issue.

Customer frustration is considerably lower this way.

NASLite just works.

### Cheap Hardware and Low Time Investment

One can make a lot of arguments for and against Server Elements NASLite, but if you need to get a high capacity NAS server, built on low end hardware, there really are no alternatives. The damn thing runs in a mere 4M ramdisk. That exports your shares via CIFS, NFS, FTP and HTTP nicely and coherently.

### Doesn't NASLite use Linux

Doesn't NASLite use Linux for OS portion?

### RE: Why use Linux at all?

Perhaps because

"By design, NASLite v1.x is a community file server and does not support features such as user management , the ability to join domains or disk quotas."

### rdiff-backup

I am using LVM2, SAMBA, rdiff-backup and pyBackpack to set up my home backup system. It works quite well. Here are the details.

### When building a server with

When building a server with many disks, keep in mind that hard drives draw mostly from the 12V rail, but most PSUs have limited power on 12V. They give plenty of 5V or 3.3V, but not 12V. You should make certain that the combined seek power consumption of all HDDs is less than what the PSU can deliver at 12V. Do not be fooled by "480W"; it might not be enough for 8 HDDs!

I highly recommend NOT to make a backup server on nonredundant IDE disks, especially if you use raid0. If you lose one disk you lose all the data! IDE hdd reliability starts to decline after ~12 months.

Also, consider the power consumption. 100+W of constant usage adds up.
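As a rough illustration of how that adds up, the sketch below estimates a yearly electricity cost from an average draw in watts. The function name `annual_cost` is made up, and the $0.07 to $0.10 per kWh rates are simply the figures quoted elsewhere in this thread, not measured values.

```shell
#!/bin/sh
# Sketch: yearly cost of a machine that runs around the clock.
annual_cost() {
    # $1 = average draw in watts, $2 = price per kWh in dollars
    # watts * 24 h * 365 days / 1000 gives kWh per year.
    awk -v w="$1" -v p="$2" 'BEGIN { printf "%.2f\n", w * 24 * 365 / 1000 * p }'
}

annual_cost 100 0.07   # 100 W at $0.07/kWh: prints 61.32
annual_cost 100 0.10   # 100 W at $0.10/kWh: prints 87.60
```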

### failure rates for drives

I've seen the "~12 months" show up twice now in this thread.

Most hard drives fail according to a "bathtub curve" pattern. Meaning, you will get a few that fail early in the lifecycle, then very few failures until the end of the lifecycle. The ones that failed early probably had manufacturing defects.

This is why you want to mix/match your drives in a RAID set from different manufacturing batches, so that a process glitch in the factory only affects one of the drives in the array. If all of your drives came from the same faulty factory line, they might all fail within the same timeframe. If this timeframe happens to be shorter than the recovery period for the RAID, then you will lose everything on the RAID.

Using hot-spare drives shortens the recovery period (the RAID array can immediately start the rebuild as soon as a drive fails). That gives you good odds of getting redundancy again before a 2nd drive fails.

### IDE disk statistics?

Does IDE have spare sectors like SCSI? Does it allow reporting of disk problem statistics and remaining spare sectors?

Also, do IDE disks still not support simultaneous data requests or has a work-around been made? At the price point it seems IDE is the choice for data backups.

Thanks!

### EVMS and BackupPC

BackupPC is a preferable choice to rsync if (a) you are backing up Windows boxes, or (b) you want non-expert users to be able to restore their own files. BackupPC's HTTP interface is very nice for non-experts.

EVMS is probably preferable to LVM + RAID. It gives the same capabilities and even more configuration flexibility. I say "probably" because I built my backup server before EVMS was available.

### BackupPC not so great on Windows

Backup solutions like this work great on any other OS but Windows.

A list of limitations: http://backuppc.sourceforge.net/faq/limitations.html

These limitations are Windows limitations or similar and not really limitations of BackupPC.

I really recommend BackupPC; it is great. Just keep these things in mind when you consider using it with Windows.

### no it isnt

backup pc will do linux too. not just windoze

### why not LVM or even raid

With so many disks in the machine you should use LVM or at least raid0 to make them look like one big uniform drive. Newer versions of RHEL or Fedora give you the option to set these up if you manually partition with Disk Druid.

With LVM you can even add disks to the LVM after the system is operational and increase the file system size. If you are worried about failing disks destroying all your data set up raid with hot spares or use raid5. As the article said, disks are cheap.

### Never use raid0. Ever.

Never use raid0. Ever.

### RAID 0 is very useful

If you are doing something where you want the fastest read write performance, with data that you can recover from another source if need be.

Let's say you capture DV video from your video camera to your RAID 0 partition. This video is still available from its original source. You edit the video and then when you are done you compress the video and save it to a RAID 5 drive with a hot spare.

I am doing this with a set of five SCSI drives, all 18GB each. This gives me 90GB of the fastest available hard drive to play with. The read write speed is no longer a bottleneck. :D

### Not true.

RAID-0 is often used in combination with other RAID configurations (mainly when hardware RAID is involved) to gain high performance while minimizing risk. I've configured RAID "enclosures" with multiple disk "trays" that are configured as RAID-5 w/hot-spare, with multiple trays spanning multiple HBAs, which have in turn been striped RAID-0. In order for a failure with data loss you would have to have 3 failures in one tray or more than 2 failures in multiple trays. Please don't say never when clearly the risks can be mitigated. RAID-5 does not replace proper backups either.

### Better-er is rsync's --link-dest option

You can dramatically improve the backup system using rsync's --link-dest option. Done right this will get you daily snapshots, like a full backup, but using only the disk space of an incremental backup.

(There are some issues with meta-information, file permissions and the like. mtree can be used to handle this.)

Rather than rolling your own, it could be better to use something like dirvish, which uses the --link-dest technique. http://www.dirvish.org/
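For readers who do want to see the idea in isolation, here is a minimal local sketch of the --link-dest technique. It is not how dirvish is implemented; the function name `snapshot`, the `latest` symlink and the directory layout are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: dated snapshots with rsync --link-dest (layout is an assumption).
snapshot() {
    # $1 = source directory, $2 = snapshot root, $3 = snapshot name (e.g. a date)
    src=$1
    name=$3
    mkdir -p "$2" || return 1
    # Use an absolute path: a relative --link-dest is resolved against
    # the destination directory, which is easy to get wrong.
    root=$(cd "$2" && pwd) || return 1
    if [ -e "$root/latest" ]; then
        # Unchanged files become hard links into the previous snapshot,
        # so they take no extra disk space.
        rsync -a --link-dest="$root/latest" "$src/" "$root/$name/" || return 1
    else
        rsync -a "$src/" "$root/$name/" || return 1
    fi
    # Repoint "latest" at the snapshot just taken.
    rm -f "$root/latest"
    ln -s "$name" "$root/latest"
}

# Example: snapshot /home /data1/foo/snapshots "$(date +%F)"
```

Each snapshot directory looks like a full copy, and deleting an old snapshot only frees the space of files that no other snapshot still links to.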

### A nice backup software

A nice backup tool to try is BackupPC. http://backuppc.sf.net/

### Making this work with Win/Mac client installs

Thank you for this. Always knew those storage boxes were obscenely priced. A question: Have you tried this with Vembu StoreGrid (see www.vembu.com)? It seems to support all OSes and uses rsync/zlib etc. with a friendly UI. How would this work with this Linux TB box if I'm lazy and don't wanna go through implementing rsync etc. manually? Especially since I'm planning a 'box' for backing up Win/Mac clients and another smaller FreeBSD server?

### Go for it!

The setup presented is a completely generic one. I tried to show the most vanilla implementation of a terabyte-capacity storage system. The tools I presented are merely the most rudimentary utilities that come standard with virtually every Linux installation. If you have other tools that will run on Linux, then there is little to stop you implementing them as well. Have fun!

### Go for it!

Thank you for the article! As a Linux newbie, I learned a lot from this article. Although simplistic, I now have a backup server using linux.

Thanks again!

