bzip2 - ratfactor

Yay! A "simple" utility! This is more what I had in mind when I started this PkgBlog. :-)

Wikipedia’s Bzip2 page has some good general information about this utility. I also found some interesting discussion on lwn.net: bzip.org changes hands.

Let’s see what we have in the package: five executables, three of which are shell scripts, and all of which have man pages.

And - holy cow! - a comprehensive HTML manual (47 pages on my terminal in lynx) installed at /usr/doc/bzip2-1.0.6/manual.html. Nice!

Using it

After a quick glance at the man page, I determined that compressing, examining, and decompressing a text file would work like this:

$ du -b foo.txt
219106	foo.txt

$ bzip2 foo.txt
$ du -b foo.txt.bz2
60958	foo.txt.bz2

$ bzcat foo.txt.bz2 | grep cow
  was a large statue of something that looked like a cow under a wooden bridge
  it looked like the cow was frozen in place

$ bunzip2 foo.txt.bz2
$ head foo.txt
One morning, as Lucy was playing with a mousetrap and a pair of tin soldiers...

Here we can see that the original text file was about 219 KB and bzip2 compressed it down to about 61 KB, a bit over a quarter of its original size. I was able to decompress the file to STDOUT in order to search it with grep, and then decompressed the file back to its original name.

Note that both bunzip2 and bzcat are symlinks to bzip2; when called by one of these names, the executable decompresses in place (bunzip2) or decompresses to STDOUT (bzcat). (You can also just call bzip2 with the appropriate options: -d and -dc, respectively.)
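To make the name-versus-option equivalence concrete, here is a small sketch. The scratch directory and the file names via_name.txt and via_flags.txt are made up for illustration; -k, -d, and -c are options documented in the bzip2 man page.

```shell
# Work in a throwaway directory (an assumption for this sketch).
cd "$(mktemp -d)" || exit 1
printf 'it looked like the cow was frozen in place\n' > foo.txt

bzip2 -k foo.txt                        # -k keeps foo.txt next to foo.txt.bz2

bzcat foo.txt.bz2 > via_name.txt        # decompress to STDOUT via the bzcat name
bzip2 -dc foo.txt.bz2 > via_flags.txt   # same thing, spelled with options

# Both outputs should be byte-for-byte identical.
cmp via_name.txt via_flags.txt && echo "same output"

# bunzip2 foo.txt.bz2 is likewise equivalent to: bzip2 -d foo.txt.bz2
```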

I see also that I could have used the script bzgrep to search in the compressed file (it also has the intelligence to detect if the file is actually compressed, first).
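A quick sketch of what that might look like. The sample text and the plain.txt copy are invented for the example, and the second call assumes the bzgrep script shipped with bzip2 passes uncompressed input through as described above.

```shell
cd "$(mktemp -d)" || exit 1
printf 'a cow under a wooden bridge\nnothing to see here\n' > foo.txt
cp foo.txt plain.txt      # an uncompressed copy, for comparison
bzip2 foo.txt             # leaves only foo.txt.bz2

# Search the compressed file directly; same idea as: bzcat foo.txt.bz2 | grep cow
bzgrep cow foo.txt.bz2

# bzgrep is expected to cope with a file that is not compressed, too.
bzgrep cow plain.txt
```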

The bzmore and bzdiff scripts are the exact same concept - and I imagine they’d be handy to have if you found yourself dealing with bzip2-compressed files on a daily basis.
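For instance, comparing two compressed files with bzdiff could go something like this. The file names old.txt, new.txt, and changes.txt are hypothetical, and the sketch does not rely on bzdiff's exit status.

```shell
cd "$(mktemp -d)" || exit 1
printf 'one\ntwo\nthree\n' > old.txt
printf 'one\nTWO\nthree\n' > new.txt
bzip2 old.txt new.txt     # produces old.txt.bz2 and new.txt.bz2

# Compare the decompressed contents; the output is in ordinary diff format.
# (diff-style tools exit non-zero when files differ, hence the "|| true".)
bzdiff old.txt.bz2 new.txt.bz2 > changes.txt || true
cat changes.txt
```

bzmore is the same idea for paging: bzmore foo.txt.bz2 pages through the decompressed text.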

The last executable is interesting: bzip2recover.

bzip2recover

The man page for bzip2 explains that the compression is done in blocks and that file integrity is checked with CRCs.

If you have a file that is big enough to span more than one block (by default, a block holds up to 900 kB of uncompressed data), you can recover the good blocks from a partially corrupted bzip2-compressed file.
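Here is a hedged sketch of the damaged case. Everything about the setup is an assumption made for illustration: the generated numbers.txt, the damaged.bz2 copy, and the choice to zero out 16 bytes at offset 1000 with dd. Only bzip2 -t (test integrity) and bzip2recover itself come from the package.

```shell
cd "$(mktemp -d)" || exit 1

# About 1.3 MB of text: more than one default-size (900 kB) block.
seq 1 200000 > numbers.txt
bzip2 -k numbers.txt                    # keep the original for reference

# Simulate damage: zero out 16 bytes near the start of a copy.
cp numbers.txt.bz2 damaged.bz2
dd if=/dev/zero of=damaged.bz2 bs=1 seek=1000 count=16 conv=notrunc 2>/dev/null

# The whole-file integrity test should now fail...
if bzip2 -t damaged.bz2 2>/dev/null; then
    echo "damaged.bz2: passed (unexpected)"
else
    echo "damaged.bz2: failed the integrity test"
fi

# ...so split the file into one small .bz2 per block and test each piece.
bzip2recover damaged.bz2 2>/dev/null
for piece in rec*damaged.bz2; do
    if bzip2 -t "$piece" 2>/dev/null; then
        echo "$piece: ok"
    else
        echo "$piece: damaged"
    fi
done
```

With damage confined to the first block, the expectation is that the first piece fails and the second one still decompresses cleanly.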

Out of curiosity, I made my text file larger by concatenating several copies of it with cat. Then I compressed the result with bzip2 and tried the bzip2recover utility on it. (I didn’t bother trying to corrupt part of it first.)

$ cat foo.txt foo.txt foo.txt foo.txt foo.txt > foobig.txt
$ bzip2 foobig.txt

$ bzip2recover foobig.txt.bz2
bzip2recover 1.0.6: extracts blocks from damaged .bz2 files.
bzip2recover: searching for block boundaries ...
   block 1 runs from 80 to 790000
   block 2 runs from 790049 to 1228560
   block 3 runs from 1228609 to 1228648 (incomplete)
bzip2recover: splitting into blocks
   writing block 1 to `rec00001foobig.txt.bz2' ...
   writing block 2 to `rec00002foobig.txt.bz2' ...
bzip2recover: finished

$ ls -l
total 524
-rw-r--r-- 1 dave users 153581 Jun  8 15:00 foobig.txt.bz2
-rw-r--r-- 1 dave users  98761 Jun  8 15:01 rec00001foobig.txt.bz2
-rw-r--r-- 1 dave users  54834 Jun  8 15:01 rec00002foobig.txt.bz2

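Neat! The "incomplete" block 3 seems to have been a false alarm: it covers only the last 40 bits of the file (the positions are bit offsets, and 1228648 bits is exactly the 153581 bytes of foobig.txt.bz2), and the recovered second block contained the entire rest of the document.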
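One way to check that nothing went missing is to decompress the recovered pieces in order and compare the result with the original, much as the man page suggests with bzip2 -dc rec*file.bz2. In this sketch the foobig.txt contents are a generated stand-in rather than the real text, and -k is used so the original sticks around for the comparison.

```shell
cd "$(mktemp -d)" || exit 1
seq 1 200000 > foobig.txt               # stand-in for the real foobig.txt
bzip2 -k foobig.txt                     # keep the original to compare against
bzip2recover foobig.txt.bz2 2>/dev/null

# Test every recovered piece, then decompress them in order and compare.
bzip2 -t rec*foobig.txt.bz2 && echo "all pieces pass the integrity test"
bzip2 -dc rec*foobig.txt.bz2 | cmp - foobig.txt && echo "nothing missing"
```

The zero-padded numbers in the rec file names keep the shell glob in block order, which is what makes the concatenated output line up with the original.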

Conclusion

In true Unix fashion, this is a tool that does one thing and seems to do it well. I’ll let you decide its comparative merits vs. the other available compression tools.

At the very least, this project contains great documentation and seems to be quite complete.

Until next time, happy hacking, Slackers!