Compression Tools Compared
Data compression also can speed up network transfers. How much depends on how fast your CPU and network are. Slow networks with fast CPUs can be sped up the most by thoroughly compressing the data. Alternatively, slow CPUs with fast connections do best with no compression.
Find the best compressor and compression level for your hardware in the graph shown in Figure 6. This graph's CPU and network speed axes are scaled logarithmically too. Look where your CPU and network speeds intersect in the graph, and try the data compression tool and compression level at that point. It also should give you a sense of how much your bandwidth may increase.
Network Transfer Estimates
To find the best compressors for various CPU and network speeds, I considered how long it takes to compress data, send it and decompress it. I projected how long compression and decompression should take on computers of various speeds by simply scaling actual test results from my 1.7GHz CPU. For example, a 3.4GHz CPU should compress data about twice as fast. Likewise, I estimated transfer times by dividing the size of the compressed data by the network's real speed.
The overall transfer time for non-filtering data compression tools, such as rzip, simply should be about the sum of the estimated times to compress, send and decompress the data.
However, compressors that can act as filters, such as gzip, have an advantage. They simultaneously can compress, transfer and decompress. I assumed their overall transfer times are dominated by the slowest of the three steps. I verified some estimates by timing real transfers.
For example, if you have a 56Kbps dial-up modem and a 3GHz CPU, their speeds intersect in the light-yellow region labeled lzma 26 at the top of the graph. This corresponds to using lzma with a 226 size dictionary. The graph predicts a 430% increase in effective bandwidth.
On the other hand, if you have a 1GHz network, but only a 100MHz CPU, it should be faster simply to send the raw uncompressed data. This is depicted in the flat black region at the bottom of the graph.
Don't assume that you always should increase performance the most by using lzma, however. The best compression tool for data transfers depends on the ratio of your particular CPU's speed to your particular network's speed.
If the sending and receiving computers have different CPU speeds, try looking up the sending computer's speed in the graph. Compression can be much more CPU-intensive. Check whether the data compression tool and scp are installed on both computers. Remember to replace email@example.com and file with the real names.
For the fastest CPUs and/or slowest network connections that fall in the graph's light-yellow region, speed up your network transfers like this:
$ cat file \ | lzma -x -s26 \ | ssh firstname.lastname@example.org "lzma -d > file"
ssh stands for secure shell. It's a safe way to execute commands on remote computers. This may speed up your network transfer by more than 400%.
For fast CPUs and/or slow networks that fall into the graph's dark-yellow zone, use rzip with a compression level of one. Because rzip doesn't work as a filter, you need temporary space for the compressed file on the originating box:
$ rzip -1 -k file $ scp file.rz email@example.com: $ ssh firstname.lastname@example.org "rzip -d file.rz"
The -1 tells rzip to use compression level 1, and the -k tells it to keep its input file. Remember to use a : at the end of the scp command.
rzipped network transfers can be 375% faster. That one-hour transfer might finish in only 16 minutes!
For slightly slower CPUs and/or faster networks that fall in the graph's orange region, try using gzip with compression level 1. Here's how:
$ gzip -1c file | ssh email@example.com "gzip -d > file"
It might double your effective bandwidth. -1c tells gzip to use compression level 1 and write to standard output, and -d tells it to decompress.
For fast network connections and slow CPUs falling in the graph's blue region, quickly compress a little with lzop at compression level 1:
$ lzop -1c file | ssh firstname.lastname@example.org "lzop -d > file"
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- SUSE LLC's SUSE Manager
- My +1 Sword of Productivity
- Managing Linux Using Puppet
- Non-Linux FOSS: Caffeine!
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- SuperTuxKart 0.9.2 Released
- Google's SwiftShader Released
- Doing for User Space What We Did for Kernel Space
- Parsing an RSS News Feed with a Bash Script
- SourceClear Open
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide