Open-Source Backups Using Amanda

by Phil Moses

Those of us who have received the call can feel the tension and nervous tone in the caller's voice when he or she asks, “How good are the backups?” A failed disk, files deleted by mistake, a disgruntled coworker or, worse, a security breach all can be times when you need to depend on backups.

Data probably is the most important element in computing, but in too many cases I see data backups overlooked or approached in such a carefree manner that I shiver. To this end, this article discusses the University of Maryland's Amanda (Advanced Maryland Automatic Network Disk Archiver) backup software, a relatively easy-to-use network backup system built upon the native dump and/or GNU tar tools. I often feel Amanda does not get the respect it deserves in a Linux/UNIX cross-platform environment. I confidently can say, however, that Amanda is a reliable platform for many Linux and UNIX users who are comfortable with a command-line interface.

I began using Amanda approximately three years ago as a backup platform in an educational environment. Working in an educational and research environment provides many challenges when it comes to backups. Many technical professionals and hobbyists face these same challenges related to their own backups. The challenges include the lack of a budget large enough to include backup software, a wide variety of OSes and distributions to back up and limited human resources to accomplish all the backups.

For myself and others, the ideal solution would be to have a central backup server that can accommodate multiple configurations as well as multiple backup tape devices, while requiring a minimum amount of time and resources. Amanda meets these requirements and many more.

Getting the Tape Rolling

Installing Amanda is a straightforward process. Whether you install from source or a binary package, you need to have a backup user and a backup group. You also need to grant disk access permission to the Amanda user and backup group. The default install from RPM takes care of this step for you, adding amanda as a user and making that user a member of the disk group.

With a default installation of Amanda, the majority of your server configuration files are placed in subdirectories of /etc/amanda/, for example /etc/amanda/DailySet1. The minimal files to place in /etc/amanda/<amanda_config> are the amanda.conf file and the disklist file. The amanda.conf file holds the configuration settings for that backup run, and the disklist file holds a list of disks to be backed up with that configuration. Additional files, which are placed in /var/lib/amanda by default, include index files, log files and current info files, as well as additional configuration files if the administrator prefers. The executable files normally are placed in /usr/sbin and include amdump, the actual Amanda dump program; amcheck, the Amanda pre-dump check utility; amverify, the Amanda utility to verify the integrity of the dumps; amrecover and amrestore, the recovery programs; and various other Amanda utilities.
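As a rough illustration of those two minimal files, here is a sketch of what /etc/amanda/DailySet1 might contain. The parameter names are standard amanda.conf settings, but every value, the host name foo.localdomain and the tapetype and dumptype names (DLT, comp-user-tar) are assumptions; they must match definitions in your own amanda.conf, such as those in the sample file shipped with Amanda.

```
# /etc/amanda/DailySet1/amanda.conf (fragment, illustrative values only)
org "DailySet1"              # name used in the mailed reports
mailto "amanda"              # who receives the reports
dumpuser "amanda"            # user that runs the dumps
dumpcycle 1 week             # time between full dumps
runspercycle 5               # amdump runs per dumpcycle
tapecycle 10 tapes           # number of tapes in rotation
tapedev "/dev/nst0"          # no-rewind tape device
tapetype DLT                 # must be defined elsewhere in amanda.conf
labelstr "^DailySet1[0-9][0-9]*$"
infofile "/var/lib/amanda/DailySet1/curinfo"
logdir "/var/lib/amanda/DailySet1"
indexdir "/var/lib/amanda/DailySet1/index"

# /etc/amanda/DailySet1/disklist
# hostname        diskname  dumptype
foo.localdomain   /home     comp-user-tar
foo.localdomain   /etc      comp-user-tar
```

Each disklist line names a client, a disk or directory on that client and the dumptype that controls how it is backed up.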

GUI? We Don't Need No Stinkin' GUI!

Once Amanda is installed, amcheck and amdump search the default config directory of /etc/amanda/<configuration> for the configuration files unless they are told to do otherwise. Within /etc/amanda, you need a directory for each configuration to be run. The amanda.conf file may appear somewhat complex, but it is, in fact, rather self-explanatory. It contains plenty of documentation, and once read it is easy to interpret and customize for your environment. On both the client and server end, inetd.conf or xinetd.conf must contain entries for Amanda. On the server, you also need to have entries for the index service, amandaidx, and the tape service, amidxtape (Listing 1). In addition to the xinetd settings, each host requires an .amandahosts file within the Amanda user's home directory. This file lists each remote host and the user from that host who is allowed to access the machine (Listing 2). Particular attention should be paid to the permissions on the Amanda user's home directory and the accompanying files, for obvious security reasons.

Listing 1. All three xinetd entries for Amanda. Only the amanda service is needed for clients.

service amanda
{
        socket_type = dgram
        protocol    = udp
        wait        = yes
        user        = amanda
        group       = disk
        server      = /usr/lib/amanda/amandad
        disable     = no
}

service amidxtape
{
        socket_type = stream
        protocol    = tcp
        wait        = no
        user        = amanda
        group       = disk
        server      = /usr/lib/amanda/amidxtaped
        disable     = yes
}

service amandaidx
{
        socket_type = stream
        protocol    = tcp
        wait        = no
        user        = amanda
        group       = disk
        server      = /usr/lib/amanda/amindexd
        disable     = no
}

Listing 2. A Brief Example of the .amandahosts File

localhost amanda
localhost.localdomain amanda
foo           amanda
foo.localdomain           amanda

If you run a firewall, initially it can be somewhat of a challenge to work out the rules that allow Amanda access. The Amanda server contacts the clients on UDP port 10080. At this point, the Amanda client forks the amandad process and seeks random UDP ports on the Amanda server. In turn, the server opens a couple of ports to the client for the data and messages. If indexes are enabled, they also need an additional port open on the server, TCP 10082.

The random ports can be addressed in multiple ways. Beginning with Amanda version 2.4.2, a compile option, --with-portrange=xxx,yyy, directs Amanda to use the given port range to connect clients and servers. The selected port range needs to be opened on both the client and server ends of the firewalls. If you plan on running amrecover on the client end, you should plan on opening TCP ports 10082 and 10083 on the server, in addition to UDP port 10080 on both the server and client machines. All of these are in addition to the ports defined with --with-portrange.
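The port list above can be turned into firewall rules along the following lines. This is only a sketch: it prints iptables commands rather than applying them, the addresses and the 50000-50100 range are placeholders, and it assumes the range given to --with-portrange is used for TCP connections. Adjust it to your own firewall tooling and to the range your Amanda binaries actually were built with.

```shell
#!/bin/sh
# Sketch only: print, rather than apply, iptables rules for Amanda.
# The addresses and the 50000-50100 range are placeholders; use the
# range your Amanda binaries were built with (--with-portrange).

# Rules for a client: accept traffic coming from the Amanda server.
amanda_client_rules() {
    server="$1"; low="$2"; high="$3"
    # Backup requests arrive on UDP port 10080.
    echo "iptables -A INPUT -p udp -s $server --dport 10080 -j ACCEPT"
    # Assumed: data and message connections use the compiled-in TCP range.
    echo "iptables -A INPUT -p tcp -s $server --dport $low:$high -j ACCEPT"
}

# Rules for the server: accept traffic coming from one client.
amanda_server_rules() {
    client="$1"; low="$2"; high="$3"
    echo "iptables -A INPUT -p udp -s $client --dport 10080 -j ACCEPT"
    echo "iptables -A INPUT -p tcp -s $client --dport $low:$high -j ACCEPT"
    # Index (10082) and tape (10083) services, used by amrecover.
    echo "iptables -A INPUT -p tcp -s $client --dport 10082:10083 -j ACCEPT"
}

amanda_client_rules 192.0.2.1 50000 50100
amanda_server_rules 192.0.2.20 50000 50100
```

Review the printed rules before running any of them as root; printing first makes it easy to compare the result against your existing rule set.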

Amanda has the ability to work with most tape devices and libraries. The tricks are defining the proper tape type in the amanda.conf file and selecting and defining the proper changer scripts (Listing 3). A single tape drive is by far the least complicated method; simply by defining the type of drive and tape type in the amanda.conf file, you are on your way. If a tape library or stacker is available, the configuration is a multiple-step process consisting of defining the tape changer program in amanda.conf (chg-multi, chg-scsi and chg-zd-mtx are three examples) and naming a changer configuration file in amanda.conf. With the Amanda server running on a Linux platform, you need the sg (SCSI generic) module compiled into the kernel. You also need the mtx program if you are running any of the mtx-based changer scripts. Amanda seeks a valid header on tapes, and without it amcheck and/or amdump fails. Use the amlabel tool to label the tapes so that each label matches the regular expression (labelstr) defined in the amanda.conf file, for example:

/usr/sbin/amlabel DailySet DailySet1
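When a changer holds several tapes, it can help to generate the labels and check each one against the labelstr pattern before labeling. The following is a minimal sketch under stated assumptions: the configuration name DailySet1, the pattern, the two-digit numbering and the three slots are all illustrative, and the amlabel command is printed rather than run.

```shell
#!/bin/sh
# Sketch: generate tape labels and check each one against the labelstr
# pattern before handing it to amlabel. CONFIG, LABELSTR and the slot
# numbering are assumptions for illustration.
CONFIG="DailySet1"
LABELSTR='^DailySet1[0-9][0-9]*$'

make_label() {
    # Build a zero-padded label from the config name and a slot number.
    printf '%s%02d\n' "$CONFIG" "$1"
}

label_ok() {
    # Succeeds only if the label matches the labelstr pattern.
    printf '%s\n' "$1" | grep -Eq "$LABELSTR"
}

for slot in 1 2 3; do
    label=$(make_label "$slot")
    if label_ok "$label"; then
        # Printed rather than executed; remove the echo to label for real.
        echo /usr/sbin/amlabel "$CONFIG" "$label" slot "$slot"
    else
        echo "label $label does not match $LABELSTR" >&2
    fi
done
```

The slot argument applies only when a tape changer is configured; with a single drive, load each tape by hand and omit it.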

Listing 3. The Defined Tape Changer in amanda.conf and an Example CHANGER.conf File

# At most one changerfile entry must be defined; select the most
# appropriate one for your configuration. If you select man-changer,
# keep the first one; if you decide not to use a tape changer, you may
# comment them all out.

changerdev "/dev/sg1"
runtapes 1              # number of tapes to be used in a single run of amdump
tpchanger "chg-zd-mtx"  # the tape-changer glue script
tapedev "/dev/nst1"     # the no-rewind tape device to be used
changerfile "/var/lib/amanda/CHANGER"

The CHANGER.conf file
changerdev=/dev/nst1
havereader=1
offline_before_unload=1
OFFLINE_BEFORE_UNLOAD=1
poll_drive_ready=10
Max_drive_wait=99
unloadpause=20
driveslot=0

When dealing with multiple computing groups or business departments, you may determine that a separate backup scheme for each group is required. This is accomplished easily using Amanda with separate backup configurations. An example of this would be two backup environments, one for each of two projects being funded independently and therefore requiring that you account for the time and resources spent on each. Environment one is labeled DailySet-BigFunds and environment two is labeled DailySet-LilFunds. For the DailySet-BigFunds project, you have a large tape library that houses ten tapes. DailySet-LilFunds provides you with a single DLT IV drive whose tapes must be changed daily. To keep these two projects separate but housed on the same backup server, you would set up the separate directories in /etc/amanda as DailySet-BigFunds[number of runs] and DailySet-LilFunds[number of runs]. Each of the mentioned directories would contain an amanda.conf file and a disklist of the included filesystems to be backed up. This pattern allows mutually exclusive backups to be performed on separate tape devices. If and when configuration changes must be made, you need to make them only to the relevant files. This is handy in an educational environment where you often have multiple groups with varied hardware and backup media.

Game On!

Here we are with the software installed, the tape and library devices working and the tapes all labeled, loaded and ready to go. So, who do we call to begin our virtually hands-off backups? None other than crontab, which helps make our backups as hands-off as possible (Listing 4). It is recommended that your crontab contain two entries for each Amanda run that will take place. The first entry should be an amcheck to check the tape drive, tape header and each Amanda client to ensure that the backup operation is carried out as planned. A good idea is to have this mailed back to a system account that is checked often enough so that any problems can be corrected prior to run time. The second entry per job in the crontab should be the amdump command itself, which allows Amanda to begin the backups without user intervention.
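A pair of crontab entries for the Amanda user along these lines would follow that pattern. Treat it as a sketch: the configuration name DailySet1 and the times are assumptions, and the paths depend on where your package installs the binaries. The -m option asks amcheck to mail its report to the address set in amanda.conf instead of printing it.

```
# min hour day-of-month month day-of-week command
# Weekday afternoon check, mailed so problems can be fixed before the run
0 16 * * 1-5 /usr/sbin/amcheck -m DailySet1
# The backup itself, shortly after midnight on the following mornings
45 0 * * 2-6 /usr/sbin/amdump DailySet1
```

With separate configurations, each one gets its own pair of entries, staggered so that two runs do not compete for the same tape device.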

Listing 4. Using amrecover

[root]# amrecover -C DailySet1
AMRECOVER Contacting server on localhost
220 jule AMANDA index server (2.4.2p2) ready.
200 Access OK
Setting restore date to today (2004-02-13)
200 Working date set to 2004-02-13.
200 Config set to DailySet1.
200 Dump host set to localhost.
Can't determine disk and mount point from $CWD
amrecover> sethost localhost
200 Dump host set to localhost.
amrecover> setdisk /home
200 Disk set to /home.
amrecover> ls
2004-01-29 condor/
amrecover> add condor
Added dir /condor at date 2004-01-29
amrecover> extract
Extracting files using drive /dev/nst0 on localhost.
The following tapes are needed: DailySet3
Restoring files into directory /restored
Continue? [Y/n]:

Each Amanda client stores helpful information from the crontab runs in /tmp/amanda. Here you can find logs from amandad, selfcheck (the check that runs prior to backup) and sendbackup, all with an accompanying date. Once the backups are completed, the tapes stored and changed and all backup procedures completed, it is not yet time to call it a wrap on the data. Amanda includes a couple of handy programs, amverify and amverifyrun, that verify the media as well as the data from any Amanda run that already has taken place. It is a good idea to run through amverify from time to time (as time allows, if not every time) to verify that your backups actually are going to be recoverable. Because you have dedicated the time and effort to configure and carry out a mostly automated and consistent backup scheme, it only makes sense to verify that all is well. And, you are keeping some of your sets of tapes off site, right?

I must quickly mention the sole visual utility available with Amanda, amplot. amplot has the ability to read an Amanda output file and create a graphical interpretation of your backups, allowing the administrator to determine the efficiency of the Amanda install and configuration and whether any changes need to be made.

Walking Barefoot through Broken Glass?

Walking barefoot through broken glass is no fun. That being said, I have been in situations where I would rather walk through that broken glass barefoot than have to deal with miserable data recovery. Data recovery with Amanda, however, is both simple and fun. Well, maybe it isn't exactly fun, but a certain amount of satisfaction comes from restoring data. Running amrecover provides you with an easy-to-use command-line interface for browsing files and directories. The interface allows you to browse, add and extract single files or directories. Doing so entails multiple steps: setting the date, setting the disk and setting the host. For example,

/usr/sbin/amrecover -C DailySet1 -t localhost -s localhost

instructs amrecover to use the DailySet1 configuration (-C), the tape server localhost (-t) and the index server localhost (-s); see Listing 4.

One fear when dealing with backups, especially indexed backups, is what happens when your master index drive fails. If you're recovering from indexes and your master Amanda drive fails, are you out of luck? Not at all. amrestore allows you to restore a full Amanda image without using indexes and without the need for any configurations. amrestore also can be used to do complete filesystem restores. In my experience, amrestore is most useful if you lose your master Amanda drive. You can start with a fresh install and run amrestore, recover the complete image and begin where you left off with all previous indexes and configurations. As a side note, I make a monthly dump of both the Amanda home directory and the Amanda configuration directories to an alternate machine. This allows me to reference the config files should a master drive die and unexpected problems arise.
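An index-free restore of this kind might look like the following sketch. It assumes the image was written with GNU tar (a dump-based image would be piped to restore instead), and the tape device, host and disk names are placeholders. By default the function only prints the commands; pass run as a fourth argument to execute them in the current directory.

```shell
#!/bin/sh
# Sketch: restore one disk image straight from tape with amrestore,
# without using the indexes. TAPE, HOST and DISK are placeholders.
# The commands are printed unless "run" is given as the fourth argument.
restore_from_tape() {
    tape="$1"; host="$2"; disk="$3"; mode="${4:-print}"
    # -p writes the matching image to stdout; GNU tar unpacks it here.
    cmd="amrestore -p $tape $host $disk | tar -xpGf -"
    if [ "$mode" = "run" ]; then
        mt -f "$tape" rewind && sh -c "$cmd"
    else
        echo "mt -f $tape rewind"
        echo "$cmd"
    fi
}

restore_from_tape /dev/nst0 foo /home
```

Running the restore from an empty scratch directory keeps the recovered files apart from live data until you have checked them.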

Conclusions, Pros and Cons

Amanda is an excellent open-source backup application that is highly configurable to accommodate cross-platform backups. As with all software, it may not be the right application for every environment, but it does fit many backup needs. The following are what I see to be the pros and cons of the Amanda backup software.

Once Amanda is configured and the crontab is set up accordingly, it is possible to have hands-off consistent backups. Amanda can run on almost every UNIX and Linux platform available. It also is able to back up Windows data through Samba shares. In addition, Amanda is highly compatible with most backup media and hardware.

As for the cons, Amanda does not have the ability to span tapes at this time. This means that once each Amanda run is completed, the tape is moved to the bottom of the heap and a new tape is requested. Some users might consider the command-line interface to be a disadvantage. Finally, although it currently can be done, backing up to media other than tape, such as disk arrays, is somewhat difficult with Amanda.

Phil Moses spends his days working as a Systems Manager for the Physical Oceanography Group at Scripps Institution of Oceanography in San Diego, California. He spends his nights and time off dreaming of tropical environments with clean warm water, uncrowded waves, boats and plenty of tuna. Comments are invited at [email protected].
