Tarsnap: On-line Backups for the Truly Paranoid

Now let's list the archives Tarsnap has stored:


# tarsnap --list-archives
docs.20120701
docs.20120702

To demonstrate Tarsnap's smart approach to storage further, I will delete the oldest backup:


# tarsnap df docs.20120701
                                 Total size  Compressed size
All archives                        2132325          1815898
  (unique data)                     2132325          1815898
This archive                        2132325          1815898
Deleted data                            445             1037

The "All archives" number has dropped because I now have only one archive, but the "unique data" figure has not changed much because Tarsnap still retains all the data needed to satisfy my "docs.20120702" archive. If I list it, I can see my data is still there:


# tarsnap tvf docs.20120702
drwxrwxr-x  0 andrew 0 Jun 14 20:52 docs/
-rw-------  0 andrew 48568 Jun 14 16:41 docs/andrew_passwords.psafe3
-rw-rw-r--  0 andrew 46014 Jun 14 16:35 docs/vacation_notes.doc
-rw-rw-r--  0 andrew 134959 Jun 14 16:44 docs/vacation_reservation.pdf
-rw-rw-r--  0 andrew 48128 Jun 14 16:41 docs/vacation_hotels.doc
-rw-------  0 tina   14271 Jun 14 16:42 docs/tina_passwords.psafe3
-rw-rw----  0 andrew 1833222 Jun 14 16:38 docs/2011 Tax Return.pdf
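
Because the archive names here end in a YYYYMMDD date string, deleting old archives can be scripted rather than done by hand. The following is only a sketch under that naming assumption: the older_than helper is a hypothetical name, not part of Tarsnap, and it merely filters the output of tarsnap --list-archives.

```shell
#!/bin/sh
# Read archive names on stdin; print those named PREFIX.YYYYMMDD dated before CUTOFF.
# older_than is a hypothetical helper for illustration, not a Tarsnap feature.
older_than() {
    awk -v p="$1." -v c="$2" '
        index($0, p) == 1 {
            d = substr($0, length(p) + 1)
            # Keep only well-formed 8-digit date suffixes older than the cutoff.
            if (length(d) == 8 && d ~ /^[0-9]+$/ && d + 0 < c + 0) print
        }'
}

# Possible use (the second form deletes archives, so preview with the first):
#   tarsnap --list-archives | older_than docs 20120702
#   tarsnap --list-archives | older_than docs 20120702 |
#       while IFS= read -r a; do tarsnap df "$a"; done
```

Filtering on the name keeps the destructive step separate from the selection step, so the list can be inspected before anything is removed.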

I use a date string for convenient versioning, but I could just as easily use any naming convention for the archive, such as "docs.1", "docs.2" and so on. For my personal backups, I have a cron job that invokes Tarsnap nightly with a date-string-named archive:


tarsnap cf docs.`date '+%Y%m%d'` /docs
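
One way to run this from cron is through a small wrapper script. That sidesteps a cron quirk: in a crontab entry an unescaped % is treated as a newline, so the date format would otherwise need backslashes. This is a minimal sketch; the script path, the 02:30 schedule, the "run" argument and the TARSNAP override variable are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/bin/docs-backup (path is an assumption).
# Example crontab entry (the 02:30 schedule is illustrative):
#   30 2 * * * /usr/local/bin/docs-backup run

# Build a date-stamped archive name such as docs.20120702.
archive_name() {
    printf '%s.%s\n' "$1" "$(date '+%Y%m%d')"
}

# Create today's archive of /docs. TARSNAP can be overridden for testing.
nightly_backup() {
    "${TARSNAP:-tarsnap}" cf "$(archive_name docs)" /docs
}

# Only back up when invoked with "run", so the file can also be sourced safely.
if [ "${1:-}" = "run" ]; then
    nightly_backup
fi
```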

If I have a local calamity and want to restore that data, it is just another simple Tarsnap command to get my files back. Note that like traditional tar, Tarsnap removes the leading slash so all files are restored relative to the current working directory:


# cd /
# rm -rf docs
# tarsnap xvf docs.20120702
x docs/
x docs/andrew_passwords.psafe3
x docs/vacation_notes.doc
x docs/vacation_reservation.pdf
x docs/vacation_hotels.doc
x docs/tina_passwords.psafe3
x docs/2011 Tax Return.pdf
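
Since files are restored relative to the current working directory, a more cautious restore can unpack into a scratch directory first and leave the live tree alone until the result has been inspected. A minimal sketch, assuming a hypothetical restore_into helper and an illustrative target path:

```shell
#!/bin/sh
# Extract ARCHIVE beneath TARGET instead of the current directory.
# restore_into and the TARSNAP override are illustrative, not part of Tarsnap.
restore_into() {
    archive="$1"
    target="$2"
    mkdir -p "$target" || return 1
    # Subshell so the caller's working directory is left unchanged.
    ( cd "$target" && "${TARSNAP:-tarsnap}" xvf "$archive" )
}

# Example: files from the archive land under /tmp/restore/docs/
#   restore_into docs.20120702 /tmp/restore
```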

Tips

If you want to run Tarsnap as a nonroot user, create a .tarsnaprc file in your home directory. The syntax is identical to the tarsnap.conf discussed above. For example:


$ cat ~/.tarsnaprc
cachedir /home/andrew/tarsnap-cache
keyfile /home/andrew/tarsnap.key
print-stats

If you have other services or users contending for your Internet connection, use --maxbw-rate to specify a maximum bytes per second that Tarsnap will be allowed to use.
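
Connection speeds are usually quoted in bits per second, while --maxbw-rate takes bytes per second, so a small conversion helps. This sketch assumes a hypothetical mbit_to_bytes helper; the 2 Mbit/s cap and the archive name are only examples.

```shell
#!/bin/sh
# Convert megabits per second to bytes per second (1 Mbit = 1,000,000 bits).
# mbit_to_bytes is a hypothetical helper for illustration.
mbit_to_bytes() {
    echo $(( $1 * 1000000 / 8 ))
}

# Example: cap Tarsnap at 2 Mbit/s, which is 250000 bytes per second.
#   tarsnap -c --maxbw-rate "$(mbit_to_bytes 2)" -f docs.20120703 /docs
```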

The print-stats directive gives you account status information when used interactively, but for batch operations (such as running Tarsnap out of cron), you can suppress the output by removing that directive from your tarsnap.conf or by invoking Tarsnap with --no-print-stats.

Finally, you can play with the --dry-run and -v flags to simulate Tarsnap backup operations without actually burning network and disk. Once you've got your command constructed exactly as you want it, remove --dry-run.
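
A rehearsal and the real run can be kept side by side so that the only difference is the --dry-run flag. The function names and the TARSNAP override below are assumptions for illustration; the flags themselves are the ones mentioned above.

```shell
#!/bin/sh
# Rehearse a backup: -v lists what would be archived, --dry-run stores nothing.
rehearse_backup() {
    "${TARSNAP:-tarsnap}" -c -v --dry-run -f "$1" "$2"
}

# Run the same backup for real once the rehearsal looks right.
real_backup() {
    "${TARSNAP:-tarsnap}" -c -f "$1" "$2"
}

# Example:
#   rehearse_backup docs.20120703 /docs
#   real_backup docs.20120703 /docs
```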

License

Tarsnap is not distributed under an open-source license, although all client source is provided (and compiled by the user during install). However, the company regularly contributes back to projects whose code it utilizes, such as libarchive. Tarsnap also has open-sourced some of its own projects, including the scrypt package, the spiped secure pipe dæmon and the kivaloo NoSQL data store.

Further Information

The Tarsnap home page (http://tarsnap.com) has a wealth of documentation and information, as well as links to the Tarsnap IRC channel, mailing list and FAQ. The "Technical Details" section is absorbing reading for those interested in the deep details of Tarsnap's cryptographic approach and history.

Tarsnap also pays significant cash bounties for bugs found in the product, ranging from a few dollars for small cosmetic bugs up to a couple thousand dollars if someone finds a serious security flaw. This transparent approach is further comfort for the truly paranoid.

Tarsnap's current version is 1.0.32, released on February 22, 2012, for Linux, BSD, OS X, Solaris, Minix and Cygwin.

Q&A with Dr Colin Percival

A: I've heard "startup company" defined as "time to double revenue or the number of users is measured in months", and I've heard "highly successful startup company" defined as "time to double revenue or the number of users is measured in weeks". By those definitions, Tarsnap is a startup company, but not a highly successful one.

And yes, that is dodging the question, but Tarsnap is my primary source of income, and I come from a culture that considers someone's income to be a private matter, so I don't want to publish precise numbers.

Q. Looking at your Bug Bounties page, you've paid out more than $2,000 to users who've submitted bug reports. Why a "bug bounty" system as opposed to the traditional bug reporting in open-source projects?

A: I gave a talk about this at BSDCan'12 and AusCERT'12 (in consecutive weeks, no less—a word of advice, don't ever try to attend back-to-back conferences 10,000 miles apart). In short, bug bounties help get more people looking at code, and help encourage people to report anything they see that seems wrong.

Probably a better question is why I offered bounties for all bugs rather than just for security bugs—aside from Knuth's famous prizes, I don't know of any other case where bounties extend beyond security vulnerabilities. The answer here is roughly the same though—there's a lot of people who won't look for security bugs because they don't have a security background, but offering cash for all bugs gets them interested...and my experience with FreeBSD is that many security vulnerabilities aren't found by security people, but rather by other developers just saying "something here looks weird".

Q. What do you think are the weakest points of the Tarsnap design, from a security perspective? Is there anything that should keep the truly paranoid up at night?

A: The weakest link in Tarsnap's security, without question, is me. I wrote the Tarsnap client code to encrypt and sign everything on the client side, but how do you know that it does what I claim?

If you're paranoid, you should look at the code yourself and make sure it does what I claim it does. And when I release a new version, you should look at that too—compare against the previous version and make sure that the changes are sensible.

If the CIA kidnaps me and threatens to torture me until I decrypt someone's data, I won't be able to do anything to help them. But if the CIA kidnaps me and threatens to torture me unless I insert a Trojan horse into the next version of Tarsnap...well, I'm not optimistic about my ability to withstand torture.

Q. Tarsnap's client is very UNIX-nerd-friendly, with its familiar tar syntax. Do you have any interest or plans for a graphical interface for less-sophisticated users?

A: This is absolutely something I plan on doing...in the future. I have very little experience with anything GUI, so I'll probably end up paying someone to produce a GUI—ideally an open-source GUI for tar, since Tarsnap uses almost exactly the same command-line interface, and I think a good GUI front end to tar would be useful for its own sake too.

Q. Why attodollars and not femtodollars or zeptodollars?

A: With attodollars, I can express everything as a pair of 64-bit integers.

Q. Do you have any new features in the works for upcoming Tarsnap releases?

A: I'm mostly working on performance improvements these days. There's a few frequently requested features that I might add, but in general, there are good reasons when features don't exist—either they're impossible (for example, server-side expiration of old archives—the server has no way to know which blocks should be freed when an old archive is deleted) or would require a substantial redesign (for example, renaming archives—the client-server protocol has write transactions and delete transactions, but renaming would need to atomically write some files and delete others).

______________________

Andrew Fabbro is a senior technologist living in the Portland, Oregon, area. He's used Linux since Slackware came on floppies and presently works for Con-way, a Fortune 500 transportation company.

Comments

great article

RSA Course Online's picture

I use duplicity too.

Tarsnap

RonTrex's picture

I'm curious how Tarsnap measures up against encryption brute force on GPU clusters made to crack passwords. It wasn't that long ago a machine was used to crack any password within 5 hours through a large cluster of GPUs. I'm still a bit skeptical of cloud backups, but perhaps I should be more afraid of using Gmail for that matter. At least Tarsnap doesn't trap my info for eternity with no knowledge of who has access to it. - Ron @ bpl

http? It seems like they do

yang's picture

http? It seems like they do not care about security anyway.

duplicity offers same features -- on own system

volker's picture

I have used duplicity (http://duplicity.nongnu.org/) for a year; it offers the same features apart from the commercial backup space.

Duplicity uses GnuPG for encryption (key or passphrase based).

From the page, duplicity supports "local file storage, scp/ssh, ftp, rsync, HSI, WebDAV, Tahoe-LAFS, and Amazon S3".

Yes, you need your own backup space. But there are reliable 100GB storage offers out there for $5 to $7/month (and a lot of other ones, some smaller but free...).

Thanks for the post. Nice

Anonymous's picture

Thanks for the post. Nice solution for an on-line backup, liked the price scheme too. And of course: that it's a tar based solution.
