Amanda-Users

Re: DAT hardware or software compression

2002-11-22 15:10:22
Subject: Re: DAT hardware or software compression
From: Niall O Broin <niall AT makalumedia DOT com>
To: amanda-users AT amanda DOT org
Date: Fri, 22 Nov 2002 19:26:24 +0000
On Friday 22 November 2002 16:28, Gene Heskett wrote:

> Which should be reason enough to see if the use of bzip2 could be
> incorporated into amanda.  AIUI, bz2 can re-synch, losing only the
> actual file that the error affected.

Without even looking at the matter I can tell you that this is incorrect:
a bzipped tar file is simply a compressed stream of data, and bzip2 has no
understanding of the structure of a tar file. Even if it could re-synch, it
could not do so at the granularity of one file within the tar archive and
hence "lose only the actual file that the error affected".

But enough talk - how about some action? I took a bzipped tar file and made a
random single bit change in it. Such a corrupted bzipped tar behaves just
like a corrupted gzipped tar file: it ticks along fine until you hit the
error and then you get tar errors like:

tar: Skipping to next header
tar: Archive contains obsolescent base-64 headers

followed by bzip errors like:

bzip2: Data integrity error when decompressing.
        Input file = (stdin), output file = (stdout)

bzip2 tells you that:

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

so I did that - it produces a number of individual .bz2 files. Each of those 
can be bunzipped (except for the one which contained the error - you get the 
same error message, and the suggestion to use bzip2recover) and you can then 
use cat to glue them back together and throw them through tar like:

cat rec00001file.tar rec00002file.tar . . . .  recnnnnfile.tar | tar tf -

which has about the same effect as simply running tar tjf against the original 
corrupt file - tar is happy until it reaches the missing corrupt section but 
the gap is too much for it to bridge.
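
For anyone who wants to repeat the first half of that experiment without
sacrificing a real backup, here is a minimal Python sketch. It is only an
illustration, not what was run above: it uses Python's bz2 and tarfile
modules rather than the bzip2 and tar binaries, the archive is built in
memory from pseudo-random data, and the helper names (make_tar_bz2, flip_bit,
decompress_until_error, list_members) are made up for this example. It does
not attempt to imitate bzip2recover.

```python
import bz2
import io
import random
import tarfile


def make_tar_bz2(n_files=5, size=100_000, seed=0):
    """Build a bzipped tar in memory. Returns (compressed bytes, member names)."""
    rng = random.Random(seed)
    raw = io.BytesIO()
    names = []
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for i in range(n_files):
            # Pseudo-random payload: it does not compress, so the bzip2
            # stream ends up containing several blocks.
            data = rng.getrandbits(8 * size).to_bytes(size, "big")
            info = tarfile.TarInfo(name=f"file{i}.dat")
            info.size = size
            tar.addfile(info, io.BytesIO(data))
            names.append(info.name)
    # compresslevel=1 selects the smallest bzip2 block size (100k).
    return bz2.compress(raw.getvalue(), compresslevel=1), names


def flip_bit(blob, bit_index):
    """Return a copy of blob with exactly one bit inverted."""
    out = bytearray(blob)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def decompress_until_error(blob, chunk=4096):
    """Feed the stream in small pieces; keep what came out before the error."""
    dec = bz2.BZ2Decompressor()
    out = bytearray()
    error = None
    for pos in range(0, len(blob), chunk):
        try:
            out += dec.decompress(blob[pos:pos + chunk])
        except OSError as exc:  # raised by the bz2 module on corrupt data
            error = exc
            break
        if dec.eof:
            break
    return bytes(out), error


def list_members(tar_bytes):
    """List member names until tarfile gives up on the truncated data."""
    names = []
    try:
        with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
            for member in tar:
                names.append(member.name)
    except tarfile.TarError:
        pass
    return names


if __name__ == "__main__":
    blob, names = make_tar_bz2()
    damaged = flip_bit(blob, 8 * (len(blob) // 2))  # one bit, mid-stream
    recovered, error = decompress_until_error(damaged)
    print("bzip2 error:", error)
    print("bytes recovered before the error:", len(recovered))
    print("members still listed:", list_members(recovered), "of", names)
```

The data in front of the damaged spot comes back intact and everything from
there on is lost to a plain decompress. Note that a member can still be
listed when only its header survived, so being listed does not mean its
contents are complete.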

> Food for thought, since bz2 can recover a dropped bit or several...

But for our purposes, what it does is unfortunately about as much use as a 
rubber hacksaw blade. So if you're itching to spend some time hacking on 
amanda, adding bzip2 support is not going to help with bit errors on tapes.

But there again, this is why you use enough tapes to always have a couple of
level 0 to hand - to be sure, to be sure.

The only way to recover from single bit errors is to use an error-correcting
code (ECC), which is what decent tape drives do at a hardware level anyway.
bzip2 or a similar compression program could of course add an ECC, but this
would make the compressed file bigger (TANSTAAFL), and as the purpose of a
compression program is to make files smaller, their authors haven't felt the
need to add this facility.
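
To make the size trade-off concrete, here is a small Python sketch of a
textbook Hamming(7,4) code. This is an assumption chosen purely for
illustration: it is not what tape drives actually implement (they use much
stronger codes), and the function names are invented for this example. It
stores every 4 data bits as 7 bits, and in exchange it can repair any single
flipped bit within each 7-bit codeword.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (0..15) as a 7-bit codeword, returned as an int."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over codeword positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over codeword positions 2, 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]  # parity over codeword positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]  # positions 1..7
    return sum(bit << i for i, bit in enumerate(bits))


def hamming74_decode(word):
    """Correct up to one flipped bit in a codeword and return the 4 data bits."""
    bits = [(word >> i) & 1 for i in range(7)]  # bits[i] is position i + 1
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if none.
    syndrome = s1 | (s2 << 1) | (s3 << 2)
    if syndrome:
        bits[syndrome - 1] ^= 1
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)


if __name__ == "__main__":
    for nibble in range(16):
        word = hamming74_encode(nibble)
        for bit in range(7):
            # Every possible single-bit error is repaired.
            assert hamming74_decode(word ^ (1 << bit)) == nibble
    print("all single-bit errors corrected, at a cost of 7 bits per 4")
```

The protection costs 3 extra bits for every 4 bits of data, which is exactly
the kind of growth a compression program is trying to avoid.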

Sorry if I've burst anyone's bubble . . .



Kindest regards,


Niall  O Broin