Re: invalid compressed data--crc error and other corruption on disk files

2005-02-18 11:36:49
From: Eric Siegerman <erics AT telepres DOT com>
To: amanda mailing list <amanda-users AT amanda DOT org>
Date: Fri, 18 Feb 2005 11:30:04 -0500
On Fri, Feb 18, 2005 at 11:36:46AM +0000, Thomas Charles Robinson wrote:
> [an excellently clear, concise, and complete [1] problem report
> -- thank you! -- which included the following:]
>
> gzip: stdin: invalid compressed data--crc error

All of tar's varied complaints appear to stem from corrupt input,
which in turn is adequately explained by this message.

Thus, either gzip or hardware looks like the culprit.  RAM is a
good place to look, especially considering that the data being
backed up all resides on the Amanda server; you're giving that
box quite a workout.  The disk and its bus (SCSI, IDE, etc.) are
possibilities too, but less likely IMO -- I'd expect the kernel
to detect and report the I/O errors in that case.

Not to completely rule out problems with Amanda itself -- I've
learned never to rule *anything* out where computers are
concerned (or humans for that matter :-/) -- but it seems
unlikely.

As for gtar, 1.13.25 is well regarded on this list.  'Nuff said,
until its input is known to be good.  (After all, even if,
hypothetically, tar were producing complete junk, gzip should be
able to compress and decompress that junk without reporting CRC
errors :-)
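
To make that parenthetical concrete, here is a minimal sketch of a round-trip
test over junk data. It is only an illustration: it uses Python's built-in
gzip module (zlib underneath), not the gzip-1.3.3 binary from the report, and
the function names, round count, and buffer size are all made up for the
example. If a box fails this even occasionally, flaky hardware (RAM
especially) becomes a more plausible suspect than any particular gzip build.

```python
import gzip
import os
import zlib


def roundtrip_ok(payload: bytes) -> bool:
    """Compress and decompress payload in memory; True if it comes back intact."""
    try:
        restored = gzip.decompress(gzip.compress(payload))
    except (OSError, EOFError, zlib.error):
        # A CRC or format error on data we just compressed ourselves.
        return False
    return restored == payload


def stress(rounds: int = 20, size: int = 8 * 1024 * 1024) -> int:
    """Round-trip `rounds` buffers of random junk and count the failures."""
    failures = 0
    for _ in range(rounds):
        if not roundtrip_ok(os.urandom(size)):
            failures += 1
    return failures


if __name__ == "__main__":
    print("failed round trips:", stress())
```

A clean run proves little, since intermittent faults may need far more data
and time to show up; a dedicated memory tester is still the better tool.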

> gzip-1.3.3-9

... is a beta.  It might be worthwhile to try the latest released
version, 1.2.4.  From the web page, it looks as though that
version can't handle files over 2 GB, so you'll have to split up
any larger DLEs.  Or just disable them for the duration of the
test -- no loss; it's not as if you have usable backups of them
now :-(

Another useful test would be to temporarily disable software
compression completely.  That should fairly quickly tell you
whether the corruption is occurring during gzipping (whether gzip
itself or hardware is the ultimate source of the problem).

> Lastly, I am currently using an nfs share for the holding disk but this
> was NOT being used previously and I was still getting the corruption
> mentioned.

Hmm, did you ever run with local holding disk, while explicitly
testing holding-disk files as you're doing now?  I.e. was there
ever a point where neither NFS nor the tape drive was in the
loop?  I'm wondering about the possibility that two independent
sources of data corruption -- NFS and the tape subsystem -- might
be confounding your attempts to isolate "the" problem.
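
For testing holding-disk files directly, something along these lines might
help. It is a sketch under assumptions, not a description of Amanda's own
tooling: it assumes the file starts with a 32 KB Amanda header followed by a
single gzip stream, and that the dump was not split across several holding
chunks. The names gzip_stream_ok and HEADER_BYTES are invented for the
example. Running it against a copy on local disk, and again against the copy
on the NFS share, would help separate the two suspects.

```python
import zlib

HEADER_BYTES = 32 * 1024  # assumed size of the Amanda header; adjust if yours differs


def gzip_stream_ok(path: str, skip: int = HEADER_BYTES, chunk: int = 1 << 20) -> bool:
    """True if the gzip data after `skip` bytes inflates cleanly with a valid CRC."""
    inflater = zlib.decompressobj(31)  # wbits=31: expect gzip framing, verify its CRC
    try:
        with open(path, "rb") as fh:
            fh.seek(skip)
            while True:
                block = fh.read(chunk)
                if not block:
                    break
                inflater.decompress(block)  # output is discarded; we only want errors
                if inflater.eof:
                    break
    except zlib.error:
        # Raised for corrupt deflate data and for a CRC mismatch in the trailer.
        return False
    # eof stays False if the file ended before the gzip trailer was seen.
    return inflater.eof


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "OK" if gzip_stream_ok(name) else "CORRUPT OR TRUNCATED")
```

Usage would be to pass one or more holding-disk file paths on the command
line; a file reported as bad here never went near the tape drive, which takes
that subsystem out of the loop for that particular failure.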

--

|  | /\
|-_|/  >   Eric Siegerman, Toronto, Ont.        erics AT telepres DOT com
|  |  /
The animal that coils in a circle is the serpent; that's why so
many cults and myths of the serpent exist, because it's hard to
represent the return of the sun by the coiling of a hippopotamus.
        - Umberto Eco, "Foucault's Pendulum"