Compression Tools Compared
Data compression works so well that popular backup and networking tools have some built in. Linux offers more than a dozen compression tools to choose from, and most of them let you pick a compression level too. To find out which perform best, I benchmarked 87 combinations of tools and levels. Read this article to learn which compressor is a hundred times faster than the others and which ones compress the most.
The most popular data compression tool for Linux is gzip, which lets you choose a compression level from one to nine. One is fast, and nine compresses well. Choosing a good trade-off between speed and compression ratio becomes important when it takes hours to handle gigabytes of data. You can get a sense of what your choices are from the graph shown in Figure 1. The fastest choices are on the left, and the highest compressing ones are on the top. The best all-around performers are presented in the graph's upper left-hand corner.
But many other data compression tools are available to choose from in Linux. See the comprehensive compression and decompression benchmarks in Figures 2 and 3. As with gzip, the best performers are in the upper left-hand corner, but these charts' time axes are scaled logarithmically to accommodate huge differences in how fast they work.
How compactly data can be compressed depends on what type of data it is. Don't expect big performance increases from data that's already compressed, such as files in Ogg Vorbis, MP3 or JPEG format. On the other hand, I've seen data that allows performance increases of 1,000%!
All benchmarks in this article used the same 45MB of typical Linux data, containing:
24% ELF 32-bit LSB
15% ASCII C program
11% gzip compressed data
8% ASCII English text
7% binary package
2% current ar archive
2% Texinfo source text
2% PostScript document text
2% Bourne shell script
2% ASCII text
21% various other data types
This data set was chosen because it is more representative of the demands made on today's Linux systems than the data used in the traditional Canterbury and Calgary test data, because this data set is bigger and contains Linux binaries.
I used the same lightly loaded AMD Athlon XP 1700+ CPU with 1GB of RAM and version 2.4.27-1-k7 of the Linux kernel for all tests. Unpredictable disk drive delays were minimized by pre-loading data into RAM. Elapsed times were measured in thousandths of a second. I'm not affiliated with any of the tools, and I strove to be objective and accurate.
The tools that tend to compress more and faster are singled out in the graphs shown in Figures 4 and 5. Use these for backups to disk drives. Remember, their time axes are scaled logarithmically. The red lines show the top-performing ones, and the green lines show the top performers that also can act as filters.
Win an iPhone 6
Enter to Win
|Geek Hide-away in Guatemala - Stay for Free!||Nov 26, 2015|
|Microsoft and Linux: True Romance or Toxic Love?||Nov 25, 2015|
|Non-Linux FOSS: Install Windows? Yeah, Open Source Can Do That.||Nov 24, 2015|
|Cipher Security: How to harden TLS and SSH||Nov 23, 2015|
|Web Stores Held Hostage||Nov 19, 2015|
|diff -u: What's New in Kernel Development||Nov 17, 2015|
- Microsoft and Linux: True Romance or Toxic Love?
- Cipher Security: How to harden TLS and SSH
- Non-Linux FOSS: Install Windows? Yeah, Open Source Can Do That.
- Geek Hide-away in Guatemala - Stay for Free!
- Web Stores Held Hostage
- Firefox's New Feature for Tighter Security
- PuppetLabs Introduces Application Orchestration
- It's a Bird. It's Another Bird!
- diff -u: What's New in Kernel Development
- IBM LinuxONE Provides New Options for Linux Deployment