Tarsnap: On-line Backups for the Truly Paranoid
Your Current Account Balance Is $4.992238237884881224
Tarsnap works on a prepaid utility-metered model. Subscribers deposit a minimum of $5.00 and are charged only for the storage and bandwidth they consume. Although the cost is higher than plain Amazon S3 service, it reflects both the cryptographic, compression and deduplication value-add of Tarsnap. At the time of this writing, Tarsnap costs 30 cents per gigabyte-month for storage and 30 cents per gigabyte transmitted.
This cost may make Tarsnap infeasible for large, whole-server terabyte-size backups. However, it is ideal for critical, sensitive files that must be durable, available and safe in the event an attacker succeeded in compromising them. With no minimum charge or monthly fees, Tarsnap is very economical for small data sets or for data that compresses well. Some examples:
-
Backing up 100MB of files with 10% daily change rate for a month would cost only 30 cents.
-
A gigabyte that is backed up weekly with a 20% change rate would cost $1.40 a month.
Tarsnap bills based on attodollars (quintillionths of a dollar) to avoid profiting through rounding. This means your account balance is tracked to 18 decimal places. This is not just "pay by the drink" cloud pricing—it's practically "pay by the atom". Some users find that a small deposit lasts them months or years.
Important Flexibility
One of Tarsnap's best features is how easy it is to script.
The ability to put a tarsnap cf command into a shell script makes
use in cron jobs very straightforward, which encourages unattended,
automated backups—the best kind.
Crucially, Tarsnap also supports a division of responsibilities. You can use the tarsnap-keymgmt tool to create keyfiles with limited authority. You may have one keyfile that lives on your server with permission to create archives, but not the authority to delete them. A master key with full privileges could be kept off-site, so that if attackers were to compromise your server, they would be unable to destroy your backups.
Using Tarsnap
To get started with Tarsnap, register at tarsnap.com, deposit some funds into your account, and download the client.
The client is available only as source, but the straightforward
./configure ; make install process is very easy. The client is
supported on all major Linux distributions (as well as BSD-based systems).
Take a quick peek at the download page to make sure you have the required
operating system packages, as some of the development packages are not
installed in typical Linux configurations.
If you are using a firewall, be aware that Tarsnap communicates via TCP on port 9279.
There are only two critical configuration items: the location of your keyfile and the location of your Tarsnap cache. Both are set in /usr/local/etc/tarsnap.conf. A tarsnap.conf.example is provided, and you probably can just copy the example as is. It defines your Tarsnap key as /root/tarsnap.key and your cache directory as /usr/local/tarsnap-cache, which will be created if it doesn't exist. The cachedir is a small state-tracking directory that lets Tarsnap keep track of backups.
Next, register your machine as follows. In this case, I'm setting up Tarsnap service for a machine called helicarrier. The e-mail address and password are the ones I used when I signed up for service with Tarsnap:
# tarsnap-keygen --keyfile /root/tarsnap.key
↪--user andrew@fabbro.org --machine helicarrier
Enter tarsnap account password:
#
I have a directory I'd like to back up with Tarsnap:
# ls -l /docs
total 2092
-rw-rw---- 1 andrew 1833222 Jun 14 16:38 2011 Tax Return.pdf
-rw------- 1 andrew 48568 Jun 14 16:41 andrew_passwords.psafe3
-rw------- 1 tina 14271 Jun 14 16:42 tina_passwords.psafe3
-rw-rw-r-- 1 andrew 48128 Jun 14 16:41 vacation_hotels.doc
-rw-rw-r-- 1 andrew 46014 Jun 14 16:35 vacation_notes.doc
-rw-rw-r-- 1 andrew 134959 Jun 14 16:44 vacation_reservation.pdf
To back up, I just tell Tarsnap what name I want to call my archive ("docs.20120701" in this case) and which directory to back up. There's no requirement to use a date string in the archive name, but it makes versioning straightforward, as you'll see:
# tarsnap cf docs.20120701 /docs
tarsnap: Removing leading '/' from member names
Total size Compressed size
All archives 2132325 1815898
(unique data) 2132325 1815898
This archive 2132325 1815898
New data 2132325 1815898
In my tarsnap.conf, I enabled the print-stats directive, which gives
the account report shown. Note the compression, which reduces storage
costs and improves cryptographic security. The "compressed
size" of the
"unique data" shows how much data is actually stored at
Tarsnap, and you
pay only for the compressed size.
The next day, I back up docs again to "docs.20120702". If I haven't made many changes, the backup will proceed very quickly and use little additional space:
# tarsnap cf docs.20120702 /docs
tarsnap: Removing leading '/' from member names
Total size Compressed size
All archives 4264650 3631796
(unique data) 2132770 1816935
This archive 2132325 1815898
New data 445 1037
As you can see, although the amount of data for "all archives" has grown, the actual amount of "unique data" has barely increased. Tarsnap is smart enough to avoid backing up data that has not changed.
Andrew Fabbro is a senior technologist living in the Portland, Oregon, area. He's used Linux since Slackware came on floppies and presently works for Con-way, a Fortune 500 transportation company.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
- RSS Feeds
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- Dynamic DNS—an Object Lesson in Problem Solving
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Download the Free Red Hat White Paper "Using an Open Source Framework to Catch the Bad Guy"
- Tech Tip: Really Simple HTTP Server with Python
- Please correct the URL for Salt Stack's web site
2 hours 40 min ago - Android is Linux -- why no better inter-operation
4 hours 56 min ago - Connecting Android device to desktop Linux via USB
5 hours 24 min ago - Find new cell phone and tablet pc
6 hours 22 min ago - Epistle
7 hours 51 min ago - Automatically updating Guest Additions
9 hours 10 sec ago - I like your topic on android
9 hours 46 min ago - This is the easiest tutorial
16 hours 22 min ago - Ahh, the Koolaid.
22 hours 49 sec ago - git-annex assistant
1 day 4 hours ago



Comments
great article
I use duplicity too.
Tarsnap
I'm curious how Tarsnap measures up against encryption brute force on GPU clusters made to crack passwords. It wasn't that long ago a machine was used to crack any password within 5 hours through a large cluster of GPUs. I'm still a bit skeptical of cloud backups, but perhaps I should be more afraid of using Gmail for that matter. At least Tarsnap doesn't trap my info for eternity with no knowledge of who has access to it. - Ron @ bpl
http? It seems like they do
http? It seems like they do not care about security anyway.
duplicy offers same features -- on own system
I use duplicity (http://duplicity.nongnu.org/) for a year which offers the same features beside the commercial backup space.
Duplicity use OpenGPG as encryption (key or passphrase based).
From the page, duplicity supports "local file storage, scp/ssh, ftp, rsync, HSI, WebDAV, Tahoe-LAFS, and Amazon S3".
Yes, you need your own backup space. But there are reliable 100GB storage for $5 to $7/month out (and a lot other ones, some smaller but free...).
Thanks for the post. Nice
Thanks for the post. Nice solution for an on-line backup, liked the price scheme too. And of source: that it's a tar based solution.