Archiving and Compression
Before bunzipping a file (or files) with bunzip, you might want to verify that they're going to bunzip correctly without any file corruption. To do this, use the -t (or --test) option.
$ bunzip2 -t paradise_lost.txt.gz $
Just as with gunzip, if there's nothing wrong with the archive, bunzip2 doesn't report anything back to you. If there's a problem, you'll know, but if there's not a problem, bunzip2 is silent.
Remember, tar doesn't compress; it merely archives (the resulting archives are known as tarballs, by the way). Instead, tar uses other programs, such as gzip or bzip2, to compress the archives that tar creates. Even if you're not going to compress the tarball, you still create it the same way with the same basic options: -c (or --create), which tells tar that you're making a tarball, and -f (or --file), which is the specified filename for the tarball.
$ ls -l scott scott 102519 job.txt scott scott 1236574 moby-dick.txt scott scott 508925 paradise_lost.txt $ tar -cf moby.tar *.txt $ ls -l scott scott 102519 job.txt scott scott 1236574 moby-dick.txt scott scott 1853440 moby.tar scott scott 508925 paradise_lost.txt
Pay attention to two things here. First, add up the file sizes of job.txt, moby-dick.txt, and paradise_lost.txt, and you get 1848018 bytes. Compare that to the size of moby.tar, and you see that the tarball is only 5422 bytes bigger. Remember that tar is an archive tool, not a compression tool, so the result is at least the same size as the individual files put together, plus a little bit for overhead to keep track of what's in the tarball. Second, notice that tar, unlike gzip and bzip2, leaves the original files behind. This isn't a surprise, considering the tar command's background as a backup tool.
What's really cool about tar is that it's designed to compress entire directory structures, so you can archive a large number of files and subdirectories in one fell swoop.
$ ls -lF drwxr-xr-x scott scott 168 moby-dick/ $ ls -l moby-dick/* scott scott 102519 moby-dick/job.txt scott scott 1236574 moby-dick/moby-dick.txt scott scott 508925 moby-dick/paradise_lost.txt moby-dick/bible: scott scott 207254 genesis.txt scott scott 102519 job.txt $ tar -cf moby.tar moby-dick/ $ ls -lF scott scott 168 moby-dick/ scott scott 2170880 moby.tar
The tar command has been around forever, and it's obvious why: It's so darn useful! But it gets even more useful when you start factoring in compression tools, as you'll see in the next section.
If you look back at "Archive and Compress Files Using gzip" and "Archive and Compress Files Using bzip2" and think about what was discussed there, you'll probably start to figure out a problem. What if you want to compress a directory that contains 100 files, contained in various subdirectories? If you use gzip or bzip2 with the -r (for recursive) option, you'll end up with 100 individually compressed files, each stored neatly in its original subdirectory. This is undoubtedly not what you want. How would you like to attach 100 .gz or .bz2 files to an email? Yikes!
That's where tar comes in. First you'd use tar to archive the directory and its contents (those 100 files inside various subdirectories) and then you'd use gzip or bzip2 to compress the resulting tarball. Because gzip is the most common compression program used in concert with tar, we'll focus on that.
You could do it this way:
$ ls -l moby-dick/* scott scott 102519 moby-dick/job.txt scott scott 1236574 moby-dick/moby-dick.txt scott scott 508925 moby-dick/paradise_lost.txt moby-dick/bible: scott scott 207254 genesis.txt scott scott 102519 job.txt $ tar -cf moby.tar moby-dick/ | gzip -c > moby.tar.gz $ ls -l scott scott 168 moby-dick/ scott scott 20 moby.tar.gz
That method works, but it's just too much typing! There's a much easier way that should be your default. It involves two new options for tar: -z (or --gzip), which invokes gzip from within tar so you don't have to do so manually, and -v (or --verbose), which isn't required here but is always useful, as it keeps you notified as to what tar is doing as it runs.
$ ls -l moby-dick/* scott scott 102519 moby-dick/job.txt scott scott 1236574 moby-dick/moby-dick.txt scott scott 508925 moby-dick/paradise_lost.txt moby-dick/bible: scott scott 207254 genesis.txt scott scott 102519 job.txt $ tar -zcvf moby.tar.gz moby-dick/ moby-dick/ moby-dick/job.txt moby-dick/bible/ moby-dick/bible/genesis.txt moby-dick/bible/job.txt moby-dick/moby-dick.txt moby-dick/paradise_lost.txt $ ls -l scott scott 168 moby-dick scott scott 846049 moby.tar.gz
The usual extension for a file that has had the tar and then the gzip commands used on it is .tar.gz; however, you could use .tgz and .tar.gzip if you like.
Note - It's entirely possible to use bzip2 with tar instead of gzip. Your command would look like this (note the -j option, which is where bzip2 comes in):
$ tar -jcvf moby.tar.bz2 moby-dick/
In that case, the extension should be .tar.bz2, although you may also use .tar.bzip2, .tbz2, or .tbz. Yes, it's very confusing that using gzip or bzip2 might both result in a file ending with .tbz. This is a strong argument for using anything but that particular extension to keep confusion to a minimum.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- SUSE LLC's SUSE Manager
- My +1 Sword of Productivity
- Managing Linux Using Puppet
- Non-Linux FOSS: Caffeine!
- Tech Tip: Really Simple HTTP Server with Python
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- Rogue Wave Software's Zend Server
- Doing for User Space What We Did for Kernel Space
- SuperTuxKart 0.9.2 Released
- Parsing an RSS News Feed with a Bash Script
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide