Compression Tools Compared
Filters are tools that can be chained together at the command line so that the output of one is piped elegantly into the input of the next. A common example is:
$ ls | more
Filtering is crucial for speeding up network transfers. Without it, you have to wait for all the data to be compressed before transferring any of it, and you need to wait for the whole transfer to complete before starting to decompress. Filters speed up network transfers by allowing data to be simultaneously compressed, transferred and decompressed. This happens with negligible latency if you're sending enough data. Filters also eliminate the need for an intermediate archive of your files.
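As a rough sketch of the compress, transfer and decompress pipeline described above, the two helper functions below pack a directory into a compressed stream and unpack such a stream at the other end. Their names are hypothetical and not part of any tool. gzip stands in for the compressor only because it is installed almost everywhere, and the ssh line in the comments assumes a reachable host called remote.

```shell
# pack_stream DIR: write a gzip-compressed tar stream of DIR to standard output.
pack_stream() {
    tar cf - -C "$1" . | gzip -c
}

# unpack_stream DIR: read such a stream from standard input and extract it into DIR.
unpack_stream() {
    mkdir -p "$1" && gzip -dc | tar xf - -C "$1"
}

# Local round trip, with no intermediate archive on disk:
#   pack_stream a/dir | unpack_stream /tmp/restored
# Across a network (assumes ssh access to a host named "remote"):
#   pack_stream a/dir | ssh remote 'mkdir -p restored && gzip -dc | tar xf - -C restored'
```

Because every stage reads and writes a stream, swapping in another filter-capable compressor should only require changing the gzip calls on both ends.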
Check whether the data compression tool that you want is installed on both computers. If it's not, you can see where to get it in the on-line Resources for this article. Remember to replace a/dir in the following examples with the real path of the data to back up.
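One way to run that check is with the shell's own command -v, shown here as a minimal sketch. The helper name have_tool is made up for illustration, and the ssh line assumes you can log in to the second computer under the name remote.

```shell
# have_tool NAME: succeed if NAME can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the compressors discussed in this article.
for tool in lzop lzma rzip; do
    if have_tool "$tool"; then
        echo "$tool: installed"
    else
        echo "$tool: missing"
    fi
done

# On the other computer (assumes ssh access to a host named "remote"):
#   ssh remote 'command -v lzop'
```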
Unless your data already is in one big file, be smart and consolidate it with a tool such as tar. Aggregated data has more redundancy to winnow out, so it's ultimately more compressible.
But be aware that the redundancy that saps your performance also may make it easier to recover from corruption. If you're worried about corruption, you might consider testing for it with the cksum command or adding a limited amount of redundancy back into your compressed data with a tool such as parchive or ras.
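Testing for corruption with cksum might look like the following sketch. The function names and the .cksum suffix are illustrative choices, not a convention of any tool. Note that cksum only detects damage; repairing it is what tools such as parchive are for.

```shell
# record_sum FILE: store FILE's CRC and byte count next to it in FILE.cksum.
record_sum() {
    cksum < "$1" > "$1.cksum"
}

# verify_sum FILE: succeed only if FILE still matches its recorded checksum.
verify_sum() {
    [ "$(cksum < "$1")" = "$(cat "$1.cksum")" ]
}

# Typical use after a backup and again before a restore:
#   record_sum backup.tar.lzo
#   verify_sum backup.tar.lzo || echo "backup.tar.lzo is corrupt" >&2
```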
lzop often is the fastest tool. It finishes about three times faster than gzip but still compresses data almost as much. It finishes about a hundred times faster than lzma and 7za. Furthermore, lzop occasionally decompresses data even faster than simply copying it! Use lzop on the command line as a filter with the backup tool named tar:
$ tar c a/dir | lzop - > backup.tar.lzo
tar's c option tells it to create one big archive from the files in a/dir. The | is a shell operator that pipes tar's output into lzop's input. The - tells lzop to read from its standard input, and the > is a shell operator that redirects lzop's output to a file named backup.tar.lzo.
You can restore with:
$ lzop -dc backup.tar.lzo | tar x
The d and c options tell lzop to decompress and write to standard output, respectively. tar's x option tells it to extract the original files from the archive.
Although lzop is impressive, you can get even higher compression ratios, much higher! Combine a little-known data compression tool named lzma with tar to increase your effective storage space by 400%. Here's how you would use it to back up:
$ tar c a/dir | lzma -x -s26 > backup.tar.lzma
lzma's -x option tells it to compress more, and its -s option tells it how large a dictionary to use.
You can restore with:
$ cat backup.tar.lzma | lzma -d | tar x
The -d option tells lzma to decompress. You need patience to increase storage by 400%; lzma takes about 40 times as long as gzip. In other words, that one-hour gzip backup might take more than a day with lzma.
This version of lzma is the hardest compressor to find. Make sure you get the one that acts as a filter. See Resources for its two locations.
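The speed and ratio figures quoted in this article depend heavily on the data, so it can be worth measuring on a sample of your own files. The helper below is a hypothetical sketch that reports how many bytes any filter-style compressor produces for a given input; sample.tar in the comments is a placeholder for an archive you already have.

```shell
# filtered_size FILE CMD [ARGS...]: print how many bytes CMD writes to
# standard output when FILE is fed to its standard input.
filtered_size() {
    input=$1
    shift
    "$@" < "$input" | wc -c | tr -d ' '
}

# Example comparison on an existing archive (assumes the tools are installed):
#   filtered_size sample.tar cat        # uncompressed size
#   filtered_size sample.tar gzip -c
#   filtered_size sample.tar lzop -c
#   time gzip -c < sample.tar > /dev/null   # wall-clock cost of one tool
```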
The data compression tool with the best trade-off between speed and compression ratio is rzip. With compression level 0, rzip finishes about 400% faster than gzip and compacts data 70% more. rzip accomplishes this feat by using more working memory. Whereas gzip uses only 32 kilobytes of working memory during compression, rzip can use up to 900 megabytes, but that's okay because memory is getting cheaper and cheaper.
Here's the big but: rzip doesn't work as a filter—yet. Unless your data already is in one file, you temporarily need some extra disk space for a tar archive. If you want a good project to work on that would shake up the Linux world, enhance rzip to work as a filter. Until then, rzip is a particularly good option for squeezing a lot of data onto CDs or DVDs, because it performs well and you can use your hard drive for the temporary tar file.
Here's how to back up with rzip:
$ tar cf dir.tar a/dir
$ rzip -0 dir.tar
The -0 option says to use compression level 0. Unless you use rzip's -k option, it automatically deletes the input file, which in this case is the tar archive. Make sure you use -k if you want to keep the original file.
rzipped tar archives can be restored with:
$ rzip -d dir.tar.rz
$ tar xf dir.tar
rzip's default compression level is another top performer. It can increase your effective disk space by 375% in only about a fifth of the time lzma can take. Using it is almost exactly the same as in the example above; simply omit the -0 option.