LVM and Removable IDE Drives Backup System

Backing up data is a problem we all face, and as the size and complexity of the systems for which we are responsible grow, so do our backup problems.
______________________

Comments

Step up quality of the EMP discussion

Anonymous's picture

The reckless way that folks in this forum seem to disregard the seriousness of the EMP threat is akin to the reckless way that people for decades regarded the hurricane threat to New Orleans. May I remind everyone that our country published the Report of the Commission to Assess the Threat to the United States from EMP Attack, and that this report is about 200 pages? This matter is not for ostriches; it is for those who are courageous enough to face reality.

The fact of the matter is that unless responsible data center managers take prudent steps, your technology is toast. It would only take a single nuke launched from the Gulf of Mexico or the Atlantic, exploded somewhere above the heartland. Short of a nuke, a Scud-type missile launched from a commercial freighter could cause major destruction along the Eastern seaboard. Read the report from the Sage Policy Group, which describes this threat. Short of a malicious EMP attack, solar storms could destroy a major part of the grid, including the large-scale transformers. Our country no longer manufactures large-scale transformers.

Let's please stop the jokes, or move this discussion to a more intelligent forum.

Larger LVM Backup

Anonymous's picture

Great Article.

I have not yet used LVM, although I intend to soon, and have given backups a lot of thought. My LVM setup will be fairly large, consisting of several physical drives in the machine - let's say 5 - and several sets of removable drives for backup, each of which would be 5 similar drives. I would back up from a snapshot to one of my backup drive sets. Is it necessary to mount all 5 backup drives at the same time to do the backup? What I would like is to have one removable drive bay, and to be able to plug in each of the 5 backup drives, one after the other, from one backup set and copy the data there. Even better would be if the data on the backup drives was already usable, so that it could just be updated.

If I had a non-LVM system, I could have 5 separate partitions and do the backup very easily in this way with rsync, copying only the changed data - doing an incremental backup while keeping a full backup of each of the 5 partitions available. Obviously this doesn't have any of the advantages of LVM, like combining the drives into one logical volume or being able to add more drives easily.

If I've made the decision to use LVM, does this automatically lead to me having 5 mounted drives and 5 removable drive bays - 10 drives in all - or is there another way?

MarkW.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

Nice, but nothing beats a proper backup software solution such as Amanda or Arkeia, with backup to disk instead of tape. It's the media that is costly, not the software.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

As far as Arkeia goes, it is the software that is costly, not the hardware. Try getting pricing for corporate use - single machine licenses cost ~$600 USD. That can buy a second PC!

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

Good article to get some people thinking about backups. Just like most everything in the open-source (and real) world we put ourselves in, there is more than one "right" way to do any of the things we do. Some have mentioned logical ways to enhance what you have done (e.g., rsync to increase efficiency). I would put looking into an NBD (network block device) high on your research list. Redundant mirrored pairs are also a good idea. Others have mentioned packages that others have spent much of their time creating for our collective good. I would like to mention rdiff-backup as one of those great options. It's available at http://rdiff-backup.stanford.edu/.

I do have to agree greatly with one of your statements: ..."I do things that doubtless make more experienced people shudder in horror."... Indeed, indeed (lol). First and foremost, you have zero error checking or return-code status checks in any part of the scripts (let alone the critical sections). Also, generally you don't write directly to your own mail file (a log file is understandable)...that's kinda weird.

Really, in the end, it seems to work for you and that makes it great. Again, good article.

How to advertise hard disk size.

Anonymous's picture

113 MiB ==> 120 MB

First start with a 113MiB hard drive. That will hold:
113 * 1024 * 1024 = 118,489,088 bytes.
118,489,088 / 1,000,000 = ~118 decimal MB (maybe close enough for advertising purposes to 120MB).

Starting with an advertised size of 120MB:
(120,000,000 / 1024) / 1024 = 114.44 MiB of data.

So maybe try 114MB in the future.
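
For anyone who wants to check the arithmetic, the same conversion can be done with POSIX shell arithmetic (which is integer-only, so the fractional result goes through awk):

```shell
#!/bin/sh
# Binary megabytes (MiB) vs the decimal megabytes used in advertising.
bytes=$((113 * 1024 * 1024))                # 113 MiB in bytes
echo "$bytes bytes"                         # 118489088 bytes
echo "$((bytes / 1000000)) advertised MB"   # ~118 decimal MB
# And the other direction: an advertised 120MB back to binary MiB.
awk 'BEGIN { printf "%.2f MiB\n", 120000000 / 1024 / 1024 }'
```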

Re: How to advertise hard disk size.

Anonymous's picture

Wow, 120MB HDDs in 2004?

112 GiB * 1024 * 1024 * 1024 = 120,259,084,288 bytes

which is typically advertised as 120GB.

The difference between an OK admin and a great one

Anonymous's picture

One word for you, my friend: "snapshot". If you do not know what it is, look it up.

Re: The difference between an OK admin and a great one

Anonymous's picture

Wheee! Linux Server Hacks (O'Reilly) tells you how to create a snapshot.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

Once upon a time, Sun had a file system known as TFS, the "Transparent File System". TFS was designed to "overlay" a real file system and store only changes made to the original files. So, say you have read-only media. You could mount an empty hard disk over the read-only media using TFS. The files would allow write access(!), but saves would be written to the TFS overlay (since the underlying media was read-only anyway).

In the classic context, the read-only system was NFS. In a more modern context, the read-only system would be DVD or CD.

I sought far and wide for "TFS for Linux", to no avail, for exactly this purpose: archival of low-usage files to optical media while still enabling changes. Any info on such a technique would be appreciated.

Too late, but anyway: plasticFS by Peter Miller

Anonymous's picture

Do the googling yourself. Oh wait, here it is already.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

I sought far and wide for "TFS for Linux" to no avail

Obviously not far and wide enough

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

For what period of time are the backups maintained?
What is the total number of backup IDE drives required?
Do you have an estimate of how many GB the 200GB would bzip2 into - are they ASCII/binary?
How long a time does a backup run require?

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

Using LVM to get around the nastiness of tapes is a nice idea; I have not seen that before.

I have been using 2 approaches over the last 2 years at a small business; fortunately, the data sizes are not so big - we only need to keep about 10GB under control.

The first one is a thing I call TIABS (Transparent Incremental Archiver Backup System, or TIABS Is Another Backup Scheme), which is an unholy collection of shell scripts.

The 10GB workspace is a Samba share on the same box as TIABS. This space has about 150MB of churn a day. The hardware is an old Soyo / Intel BX / Celeron box running Mandrake 8.1; this gives us zero problems (but we have a hardware twin just in case).

TIABS works using incrementals against a master image of the share, which I recreate manually every few months. Essentially, overnight TIABS does a cp -au type pass to identify changes, populates a blank folder with them, then goes back and makes _many_ links back to the previous day's image for the unchanged files.

That means the archive area has N folders, one per day, each apparently with a "complete image" of the Samba share workspace being backed up. All browsable (read-only) across the share; that's nice, as it allows (for example) our web designer to go look at her website from 6 months ago and run it live right out of the archive area. Everyone can see the archive using Explorer on their own PC; very little intervention from me (though I do check its pulse etc. regularly) and best of all - no evil tapes :))

The biggest problems came from dealing with MS filenames, which allow all sorts of embedded $, multiple spaces etc. I could not find *nix tools that would not trip up somewhere (cp, cpio, tar, rsync - my mind just glazes over thinking about the problems). All the incremental detection finally had to be done by script (ouch). It got ugly.

The second approach uses a rotation of big disks. The workspace plus every partition on each PC gets imaged across the local network twice monthly (Samba mounts + tar etc.); these are just compressed images of complete partitions - very handy if a complete rebuild (= reversion to a working PC) is needed.

TIABS itself uses ext3 and sits on its own disk permanently in the server; the disk is a 60GB unit split into 4 partitions. This holds about 10 months' worth of daily images.

I'll just mention that we are in deep countryside with pathetic overhead power lines - we get 2 or 3 outages a month. Thank goodness for ext3! Floods? Should I mention the river? And there's also.... :)

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

And your offsite backups??? You would be severely screwed w/out them if the office were to catch fire.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

Ahh, there is a fire safe in another building (200 yards away) where the big rotation disks are kept.

Another option: external USB2 disks

Anonymous's picture

Another similar backup option which I'm about to try is to use external USB2 (or FireWire) hard disks. They're hot pluggable, not too expensive, and you can get small portable ones if you want to take them off-site. Data throughput may not be quite as high as IDE (especially with a 2.4 kernel - USB2 support appears to be much faster in 2.6), but that may not matter if you're happy to leave backups chugging away overnight. Presumably they could be combined using LVM or RAID in just the same way as described in the article?

Re: Another option: external USB2 disks

Anonymous's picture

I use a USB-IDE enclosure for one of those cheap IDE bays. In Windows, I can simply "eject" the hard drive bay device, replace the hard drive, and then plug the drive back in without rebooting. My guess is that you could do the same thing in Linux by reloading the USB mass storage driver. It wouldn't cost very much to do, and it would save the trouble of rebooting.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

A nice article indeed. Rsync has already been mentioned, so I don't need to do that again.
There is one thing that scares me a little: what happens if you find out a file was corrupted two weeks ago? You will have no backups from before that, will you? Would a setup with CVS or a more modern version control system not be a better solution, then?

Another thing: RH 8 is quite old. In fact, it has already been EOL-ed by Red Hat. It uses LVM 1, which uses a dedicated kernel module. In 2.6 kernels, this has been replaced by LVM 2 and a smaller, more generic solution in the form of the device mapper module.

Re: LVM and Removable IDE Drives Backup System

dx100's picture

We have been using a similar approach for several years. But there are still a few remaining issues unresolved.

1. We give each user a removable IDE drive to back up their own data. But the removable IDE drive is not hot-swappable. The BIOS has to discover that a disk exists in the IDE channel and pass that information to the Linux boot procedure, which, in turn, enables the Linux kernel to access the disk. This means that EVERY TIME, a user has to reboot the workstation he/she is using. Over the years, we found this approach is not very user friendly. Sometimes there are other people using that workstation via ssh or another remote login process. Sometimes another user has left a heavy number-crunching process running on the workstation, and it is not ethical to reboot it. Furthermore, users sometimes accidentally switch off the removable disk (the removable caddy has a power switch) before a complete shutdown of the machine, resulting in data corruption (on disks using the ext2 filesystem). As a result, our users do not want to use the removable disk as a means of backup.
2. At the institution level, we provide a monthly full backup and daily incremental backups to ensure that a user can retrieve any version of his/her data over a period of a month. Four years ago we built a backup server using 8 80GB IDE disks (the largest at that time) in RAID 5 mode (a net capacity of about 540GB). An open source package (afbackup, http://sourceforge.net/projects/afbackup) is configured to implement our backup policy, backing up data from 2 NFS servers over the LAN. The backup process has been running trouble-free for 4 years (great Linux system

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

Hi.

1.5TB should be possible with a 12-port 3ware SATA controller (3ware Escalade 8506-12MI) and some large disks.

Reading the story, my first concern is the JBOD-like structuring of the drives: if 1 (one) drive fails, the backup is toast.

I have implemented something similar at a 250GB (net) scale for a customer. I used a hardware (P)ATA RAID controller, running the drives in RAID 5 (OK, this does cost approx. 50% more gross space), and a very fault-tolerant file system (ReiserFS), as ext2 in my opinion is too fragile for backup purposes.

I second the rsync comment above. I implemented it with hourly versioning, so that there is always data for each hour of the last day, then each day of the last week, each week of the last two months, and each month one year back.

This huge number of backups takes up approx. 130% of the data, but that is highly dependent on your data's update frequency.

After all for most companies their data is their business today, so the good old "better safe than sorry" applies more than ever.

The system has been running automated without human intervention for more than one year now and the client is happy.

Svenne (svenne(at)kracon.dk)

Why not have a look at Bacula?

Anonymous's picture

I recently implemented Bacula for our backup needs. We don't have such a high volume to back up (only 15GB), but I'm sure Bacula can manage volumes of 200GB.
The added advantage is that a catalog is maintained in an SQL database, so finding a version backed up on a particular date is very easy. It can back up to files or to tape. We chose to back up to removable USB 2.0 storage (to be carried off site) and to DVD+R (a smaller, compressed set of files for historical backup).
Bacula can back up files from Linux, BSD, Irix and Windows clients over the network.
In your case I would create a RAID 5 array with 7 or 8 200GB disks for the daily backups and for holding the other ones before writing them to external media. Daily backups don't need more maintenance than checking the logs to see whether everything went fine. On a weekly basis, the backup made on Friday can be copied to removable storage. (If you want, incrementals or differentials against these weekly backups can also be made in parallel to the daily full backups.)
That leaves the historical backup, for which you could use removable storage or fixed IDE disks, as you do now.
Bacula takes care of maintaining the database.
One thing that is not possible is to access your files directly without restoring them first. This means you can't accidentally modify them either, though, while you are busy trying to find the correct file.

Of course, if you are happy with your current solution, don't change a winning horse. I'm just trying to describe an alternative. One with which I'm very pleased and that I think the world should know about.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

Why are you using 'cp'? Why not 'rsync'? With rsync you would save time.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

A very useful tool is rdiff-backup, which will do much the same as rsync, only with increments (i.e., it will allow you to maintain a mirror of the last backup plus increments for the last 7 days or whatever).

A nice touch is to NFS-mount it read-only on workstations, so that users can restore their own backups when they accidentally delete a file etc.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

Great article. This concept could be combined with the rsync backup method discussed in the latest Linux Focus: http://www.linuxfocus.org/English/March2004/article326.shtml

Re: LVM and Removable IDE Drives Backup System

Stomil's picture

I have a similar solution here, but we took another approach: 5 120GB drives in a RAID 5 array (software Linux kernel driver) on a dedicated machine used as a 'data dump'. We also have a tape library (a big one), but due to the tapes' failure rate the thing is next to useless - it sometimes takes days of retries to read a tape written a year or two ago (we store meteorological simulation data there). And the tape library absolutely sucks at file management - files aren't erased when deleted from the library's virtual filesystem; the library admin needs to trigger a 'compaction' process to actually erase the data. So imagine what happens with any attempt to manage differential backups.
OTOH, the RAID array, cheap and not very fast, is a good medium for differential backups and for storing data that may become useless (and should be deleted) after a short period of time.

Re: LVM and Removable IDE Drives Backup System

Anonymous's picture

Hmmm...

Wouldn't it be easier to create a 2- or 3-submirror mirror set with software RAID, and just pull one submirror each night?

Drives are (relatively) cheap, and you wouldn't have to go through all the time waiting for the copy to take place.

Also, once a 'new' set of drives was inserted after the old ones were pulled, the system should 'sync them up' all by itself...

Not only is it much faster, but you also have multiple layers of protection against a disk failure while the system is up and in use.

EMP Attacks kill HDs/Tapes

Anonymous's picture

An EMP attack would ruin all that backed-up data on a hard drive or tape. That's why I'd buy DVDs for backing up data: they're optical and should be more resistant to EMP attacks, which I think will happen someday.

EMP Attack?

Anonymous's picture

I think if we ever experienced an EMP attack, it would most likely be followed by some other sort of attack - like by guys with guns and bombs and s*!t! I don't know about anyone else, but if that happens, the VERY LAST thing on my mind will be the safety of my backups.

Where do you live? Like Iran or something?

Re: EMP Attacks kill people too

Anonymous's picture

Have you an EMP-safe people-backup-and-restore system? Without it, your point is ... pointless: who would be there to:
a) still suffer from not being able to restore
b) try to do it anyhow, if the media are there

BTW: those "EMP attacks" are often mentioned together with nuclear war. Will your DVDs be heat-resistant? Or do you burn DVDs in heat-resistant glass? (Asbestos?)

Re: EMP Attacks kill people too

Anonymous's picture

EMP attacks do not kill people. EMP stands for electromagnetic pulse. One does not need to deploy nukes to create an EMP either.

bonkers

Anonymous's picture

EMP weapons have been tested since the '50s, and no unclassified weapon has come out of it. If you do not live in a political hot spot, your fear of an EMP attack should be about equal to your fear of an asteroid impact or other natural disasters: not very likely.

EMP weapons have been hyped by many on the net, but no failed or successful attack has ever been reported anywhere. Computers by design are quite good at withstanding an EMP attack: they are enclosed in a Faraday cage with all incoming wires fused at some point. Computers usually need to keep 1-3GHz frequencies inside, so they are pretty resistant. A successful EMP attack might fry keyboards and other peripherals, and it may even kill the mobo, but it will not erase the HD - that is a little Faraday cage (the HD casing) sitting in a bigger Faraday cage.

You need a device using high explosives or massive capacitor banks to do any damage to computers; the much-hyped magnetron dish on a truck will generate a narrow spectrum that will not penetrate computer cases at all.

A simple question

Anonymous's picture

Hello,

I'm attempting to set up a small network for my research lab. I'm interested in distributed simulation. Currently I have three computers, including a laptop. The operating systems are Win2000 and WinXP (laptop). I want to add a Linux server and want to run Win2000 on at least one of the computers. Is this possible? How can I get a quick start?

Thank you very much,

SC

Re: EMP Attacks kill HDs/Tapes

Anonymous's picture

So that's why I'd buy DVDs for backing up data.

Must have failed 4th grade math, eh?

200GB / 4.7GB = 42.55, rounded up to 43. So, they'd need 43 discs!!! I'm not even going to comment on how utterly stupid that idea is. Oh wait, I just did. Guess it's because it's such a stupid idea.
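
For what it's worth, the rounding-up itself is a one-liner in integer shell arithmetic (add divisor minus one before dividing); working in whole decimal megabytes keeps the numbers exact:

```shell
#!/bin/sh
# 200GB of data onto 4.7GB DVDs: 200000 / 4700 = 42.55..., i.e. 43 discs.
data_mb=200000 dvd_mb=4700
discs=$(( (data_mb + dvd_mb - 1) / dvd_mb ))   # ceiling division
echo "$discs discs"
```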

Re: EMP Attacks kill HDs/Tapes

Anonymous's picture

And is it necessary to make such comments without any courtesy? Just saying that it is not a good solution, because of the large number of disks required, would have been sufficient.