Tarsnap: On-line Backups for the Truly Paranoid
Storing backups in the cloud requires a level of trust that not everyone is willing to give. While the convenience and low cost of automated, off-site backups is very compelling, the reality of putting personal data in the hands of complete strangers will never sit quite right with some people.
Enter Tarsnap—"on-line backups for the truly paranoid". Tarsnap is the brainchild of Dr Colin Percival, a former FreeBSD Security Officer. In 2006, he began research and development on a new solution for "encrypted, snapshotted remote backups", culminating in the release of Tarsnap in 2008.
Unlike other on-line backup solutions, Tarsnap uses an open, documented cryptographic design that securely encrypts your files. Rather than trusting a vendor's cryptographic claims, you have full access to the source code, which uses open-source libraries and industry-vetted protocols, such as RSA, AES and SHA.
Tarsnap provides a command-line client that operates very much like
the traditional UNIX tar command. Familiar syntax, such as tarsnap
cf and tarsnap xvf works as users would expect, except that instead
of manipulating local tarballs, the client is working with cloud-based
archives. These archives are stored on Amazon S3 with EC2 servers to
handle client connections. To minimize bandwidth costs, Tarsnap uses
smart rsync-like block-oriented snapshot operations that upload
only data set changes and minimize transmission costs.
Although you could roll your own backup system by encrypting your data, running an rsync server off-site and so on, Tarsnap's scriptability, professional cryptographic design and utility pricing make it an attractive service for clueful users.
Open-Source Strong Crypto
Tarsnap compresses, encrypts and cryptographically signs every byte you send to it. No knowledge of cryptographic protocols is required, but if you are an interested enthusiast or feel more comfortable if you can look under the hood, you'll find the source code and full design thoroughly documented on http://tarsnap.com.
During installation, the user creates a keyfile, which optionally can be passphrase-protected. The file contains two 2048-bit RSA keys for signing and encryption. AES-256 session keys are created at runtime and encrypted with the RSA keys. The HMAC-SHA-256 hash function is used to verify data blocks and prevent tampering, and the same hash function is used elsewhere to verify communication between the client and server.
If those cryptographic terms make your eyes glaze over, the simple description is that Tarsnap encrypts and verifies your data in motion and at rest using industry-standard protocols. From a user perspective, the only thing that needs to be safeguarded is the keyfile. As you might expect, if it's lost, the keys to decrypt data stored with Tarsnap are unavailable and your data is no longer usable. After generation, this keyfile ideally should be stored securely off-site.
Efficient Design
Tarsnap backs up variable-length, deduplicated blocks of data. This means that if you change one file in a directory, only that file is backed up. Indeed, only the changes to that file are backed up if the file previously existed. If you move files around, Tarsnap is smart enough to recognize and skip previously backed-up blocks. Anytime you wish, you can take a backup of a directory, which creates a new archive. Behind the scenes, Tarsnap will retain whatever files are necessary to make that archive whole.
Suppose you have a directory with 100MB of files and a 10% daily change rate. If you create a Tarsnap archive every day of that directory, you will see the following consumption pattern:
-
Monday (initial): 100MB archive created (100MB uploaded, 100MB stored).
-
Tuesday: 10MB archive created (10MB new data uploaded, 110MB total data stored).
-
Wednesday: 10MB archive created (10MB new data uploaded, 120MB total data stored).
If on Thursday you delete the Monday archive, the Tuesday archive will be "made whole" by retaining whatever files from Monday are needed. This gives you tremendous flexibility to pick restore points. Of course, you can keep multiple archives with however many restore points you wish.
Andrew Fabbro is a senior technologist living in the Portland, Oregon, area. He's used Linux since Slackware came on floppies and presently works for Con-way, a Fortune 500 transportation company.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- I once had a better way I
3 hours 43 min ago - Not only you I too assumed
4 hours 54 sec ago - another very interesting
5 hours 53 min ago - Reply to comment | Linux Journal
7 hours 47 min ago - Reply to comment | Linux Journal
14 hours 41 min ago - Reply to comment | Linux Journal
14 hours 57 min ago - Favorite (and easily brute-forced) pw's
16 hours 48 min ago - Have you tried Boxen? It's a
22 hours 40 min ago - seo services in india
1 day 3 hours ago - For KDE install kio-mtp
1 day 3 hours ago



Comments
great article
I use duplicity too.
Tarsnap
I'm curious how Tarsnap measures up against encryption brute force on GPU clusters made to crack passwords. It wasn't that long ago a machine was used to crack any password within 5 hours through a large cluster of GPUs. I'm still a bit skeptical of cloud backups, but perhaps I should be more afraid of using Gmail for that matter. At least Tarsnap doesn't trap my info for eternity with no knowledge of who has access to it. - Ron @ bpl
http? It seems like they do
http? It seems like they do not care about security anyway.
duplicy offers same features -- on own system
I use duplicity (http://duplicity.nongnu.org/) for a year which offers the same features beside the commercial backup space.
Duplicity use OpenGPG as encryption (key or passphrase based).
From the page, duplicity supports "local file storage, scp/ssh, ftp, rsync, HSI, WebDAV, Tahoe-LAFS, and Amazon S3".
Yes, you need your own backup space. But there are reliable 100GB storage for $5 to $7/month out (and a lot other ones, some smaller but free...).
Thanks for the post. Nice
Thanks for the post. Nice solution for an on-line backup, liked the price scheme too. And of source: that it's a tar based solution.