Compression Tools Compared
Filters are tools that can be chained together at the command line so that the output of one is piped elegantly into the input of the next. A common example is:
$ ls | more
Filtering is crucial for speeding up network transfers. Without it, you have to wait for all the data to be compressed before transferring any of it, and you need to wait for the whole transfer to complete before starting to decompress. Filters speed up network transfers by allowing data to be simultaneously compressed, transferred and decompressed. This happens with negligible latency if you're sending enough data. Filters also eliminate the need for an intermediate archive of your files.
Check whether the data compression tool that you want is installed on both computers. If it's not, you can see where to get it in the on-line Resources for this article. Remember to replace a/dir in the following examples with the real path of the data to back up.
Unless your data already is in one big file, be smart and consolidate it with a tool such as tar. Aggregated data has more redundancy to winnow out, so it's ultimately more compressible.
But be aware that the redundancy that saps your performance also may make it easier to recover from corruption. If you're worried about corruption, you might consider testing for it with the cksum command or adding a limited amount of redundancy back into your compressed data with a tool such as parchive or ras.
lzop often is the fastest tool. It finishes about three times faster than gzip but still compresses data almost as much. It finishes about a hundred times faster than lzma and 7za. Furthermore, lzop occasionally decompresses data even faster than simply copying it! Use lzop on the command line as a filter with the backup tool named tar:
$ tar c a/dir | lzop - > backup.tar.lzo
tar's c option tells it to create one big archive from the files in a/dir. The | is a shell command that automatically pipes tar's output into lzop's input. The - tells lzop to read from its standard input, and the > is a shell command that redirects lzop's output to a file named backup.tar.lzo.
You can restore with:
$ lzop -dc backup.tar.lzo | tar x
The d and c options tell lzop to decompress and write to standard output, respectively. tar's x option tells it to extract the original files from the archive.
Although lzop is impressive, you can get even higher compression ratios—much higher! Here's how. Combine a little-known data compression tool named lzma with tar to increase storage space effectively by 400%. Here's how you would use it to back up:
$ tar c a/dir | lzma -x -s26 > backup.tar.lzma
lzma's -x option tells it to compress more, and its -s option tells it how big of a dictionary to use.
You can restore with:
$ cat backup.tar.lzma | lzma -d | tar x
The -d option tells lzma to decompress. You need patience to increase storage by 400%; lzma takes about 40 times as long as gzip. In other words, that one-hour gzip backup might take all day with lzma.
This version of lzma is the hardest compressor to find. Make sure you get the one that acts as a filter. See Resources for its two locations.
The data compression tool with the best trade-off between speed and compression ratio is rzip. With compression level 0, rzip finishes about 400% faster than gzip and compacts data 70% more. rzip accomplishes this feat by using more working memory. Whereas gzip uses only 32 kilobytes of working memory during compression, rzip can use up to 900 megabytes, but that's okay because memory is getting cheaper and cheaper.
Here's the big but: rzip doesn't work as a filter—yet. Unless your data already is in one file, you temporarily need some extra disk space for a tar archive. If you want a good project to work on that would shake up the Linux world, enhance rzip to work as a filter. Until then, rzip is a particularly good option for squeezing a lot of data onto CDs or DVDs, because it performs well and you can use your hard drive for the temporary tar file.
Here's how to back up with rzip:
$ tar cf dir.tar a/dir $ rzip -0 dir.tar
The -0 option says to use compression level 0. Unless you use rzip's -k option, it automatically deletes the input file, which in this case is the tar archive. Make sure you use -k if you want to keep the original file.
rzipped tar archives can be restored with:
$ rzip -d dir.tar.rz $ tar xf dir.tar
rzip's default compression level is another top performer. It can increase your effective disk space by 375% but in only about a fifth of the time lzma can take. Using it is almost exactly the same as the example above; simply omit compression level -0.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
|Fancy Tricks for Changing Numeric Base||May 29, 2016|
|Working with Command Arguments||May 28, 2016|
|Secure Desktops with Qubes: Installation||May 28, 2016|
|CentOS 6.8 Released||May 27, 2016|
|Secure Desktops with Qubes: Introduction||May 27, 2016|
|Chris Birchall's Re-Engineering Legacy Software (Manning Publications)||May 26, 2016|
- Tips for Optimizing Linux Memory Usage
- Secure Desktops with Qubes: Introduction
- Working with Command Arguments
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Secure Desktops with Qubes: Installation
- Fancy Tricks for Changing Numeric Base
- CentOS 6.8 Released
- Linux Mint 18
- The Italian Army Switches to LibreOffice
- Petros Koutoupis' RapidDisk
Until recently, IBM’s Power Platform was looked upon as being the system that hosted IBM’s flavor of UNIX and proprietary operating system called IBM i. These servers often are found in medium-size businesses running ERP, CRM and financials for on-premise customers. By enabling the Power platform to run the Linux OS, IBM now has positioned Power to be the platform of choice for those already running Linux that are facing scalability issues, especially customers looking at analytics, big data or cloud computing.
￼Running Linux on IBM’s Power hardware offers some obvious benefits, including improved processing speed and memory bandwidth, inherent security, and simpler deployment and management. But if you look beyond the impressive architecture, you’ll also find an open ecosystem that has given rise to a strong, innovative community, as well as an inventory of system and network management applications that really help leverage the benefits offered by running Linux on Power.Get the Guide