Amanda-Users

Re: DAT hardware or software compression

2002-11-22 06:53:43
Subject: Re: DAT hardware or software compression
From: Paul Bijnens <paul.bijnens AT lant DOT com>
To: amanda-users AT amanda DOT org
Date: Fri, 22 Nov 2002 12:01:21 +0100
Sven Kirmess wrote:

> If there is one flipped bit (read error), everything compressed with gzip
> after that position is lost. With HW compression you can continue restoring
> after that position plus some overhead.
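Here is a minimal Python sketch of the failure mode described in the quote. It assumes the whole backup is one gzip stream. The hypothetical helper `salvage` feeds that stream to `zlib.decompressobj` chunk by chunk and keeps whatever came out before the first decode error. It makes no attempt to resynchronise after damage, so everything past the first bad spot is given up.

```python
from __future__ import annotations

import zlib


def salvage(stream: bytes, chunk: int = 4096) -> tuple[bytes, bool]:
    """Decode one gzip stream; return (bytes produced, decoded cleanly?).

    On the first decode error (bad code, CRC mismatch, ...) we stop and keep
    what was produced so far; bytes decoded after the damage may already be
    garbage. No attempt is made to resynchronise later in the stream.
    """
    d = zlib.decompressobj(wbits=31)  # 16 + 15: expect gzip header/trailer
    out = bytearray()
    try:
        for i in range(0, len(stream), chunk):
            out += d.decompress(stream[i:i + chunk])
        out += d.flush()
    except zlib.error:
        return bytes(out), False
    # d.eof stays False if the stream simply ended early (premature EOF)
    return bytes(out), d.eof
```

A stream that is merely cut short still yields a correct prefix of the data. Damage in the middle means the rest is lost, even if the bytes on tape after that point are fine.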

This comes up now and then, but I have my doubts about it. It assumes
that the tape holds only your data bits and that there is no error
detection on the tape. I was once told that a DDS tape drive works as
follows: the bits "on the tape" carry an encoding with some error
correction (one-bit errors are corrected) and error detection (errors of
two or more bits are detected), so when reading any tape the hardware
can detect those errors.
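The post does not say which code DDS drives actually use, so the following is only a toy illustration of "correct one bit, detect two". It uses an extended Hamming (8,4) code, a standard single-error-correcting, double-error-detecting (SECDED) scheme, applied to 4-bit values. The function names are illustrative.

```python
from __future__ import annotations


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits as an 8-bit extended Hamming codeword.

    Index 0 is the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions, with parity bits at 1, 2 and 4.
    """
    b = [0] * 8
    b[3], b[5], b[6], b[7] = ((nibble >> i) & 1 for i in range(4))
    b[1] = b[3] ^ b[5] ^ b[7]  # covers positions with bit 0 set
    b[2] = b[3] ^ b[6] ^ b[7]  # covers positions with bit 1 set
    b[4] = b[5] ^ b[6] ^ b[7]  # covers positions with bit 2 set
    b[0] = sum(b[1:]) % 2      # makes the total number of 1s even
    return b


def decode(word: list[int]) -> tuple[int | None, str]:
    """Return (nibble, status); status is "ok", "corrected" or "uncorrectable"."""
    b = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if b[pos]:
            syndrome ^= pos           # XOR of the positions holding a 1
    odd = sum(b) % 2                  # 1 if an odd number of bits flipped
    if syndrome == 0 and odd == 0:
        status = "ok"
    elif odd:
        b[syndrome] ^= 1              # single error; syndrome 0 = parity bit
        status = "corrected"
    else:
        return None, "uncorrectable"  # even number of flips (e.g. two)
    return b[3] | b[5] << 1 | b[6] << 2 | b[7] << 3, status
```

Every single flipped bit is repaired, and every pair of flipped bits is reported rather than silently "fixed". Three or more flipped bits can be miscorrected by this toy code.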

When writing to tape, there is a read head just after the write head,
and the hardware detects any block with errors while writing. If the
read head detects an error, the block is written again, without stopping
the tape streaming, up to 15 times (I believe). If the number of rewrites
passes a certain threshold (e.g. 5 times or so), the "clean drive" LED
lights up on the panel. In the syslogs you can see this as "soft errors".
If the block cannot be written without errors within the 15 tries, it is
a hard error and the drive returns a "tape error" status.
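The write path described above can be sketched as a loop. This is a hypothetical model, not drive firmware. `write_and_verify` stands in for writing one block and checking it with the read head. The limits 15 and 5 are the post's own rough figures. The sketch assumes the "clean drive" threshold counts the rewrites of a single block; a real drive may count them differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

MAX_TRIES = 15       # "up to 15 times (I believe)": unconfirmed figure
CLEAN_THRESHOLD = 5  # "e.g. 5 times or so": illustrative figure


@dataclass
class WriteResult:
    ok: bool          # False: hard error, drive reports "tape error"
    rewrites: int     # extra copies written after the first attempt
    soft_error: bool  # recovered, but only after at least one rewrite
    clean_led: bool   # assumed per-block threshold for the "clean drive" LED


def write_block(block: bytes, write_and_verify: Callable[[bytes], bool]) -> WriteResult:
    """Hypothetical model of read-after-write verification for one block."""
    for attempt in range(1, MAX_TRIES + 1):
        # write_and_verify writes the block once (the tape keeps streaming)
        # and reports whether the read head saw it come back clean.
        if write_and_verify(block):
            rewrites = attempt - 1
            return WriteResult(True, rewrites, rewrites > 0,
                               rewrites >= CLEAN_THRESHOLD)
    rewrites = MAX_TRIES - 1
    return WriteResult(False, rewrites, False, rewrites >= CLEAN_THRESHOLD)
```

Because each rewrite simply lands on the next stretch of tape, the drive never has to stop while writing. Stopping only becomes necessary on the read side.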

When reading a tape, the drive detects errors using that error-detecting
encoding. When it detects an error it cannot correct (a one-bit error is
corrected automatically!) and the next block is not a copy of the same
block (rewritten during the write phase), the tape has to stop streaming,
reposition and try again, up to a few times. If that succeeds you see a
"soft error" in the syslogs; if the error persists, it is flagged as a
"hard error" and the drive returns a tape error.
I have certainly watched a DDS-2 drive try (and succeed!) to read such
marginally bad blocks: you can hear it stop/reposition/retry, and you
see the errors in the syslog.
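The read path can be modelled the same way. Again, this is a hypothetical sketch. `tape` lists which logical block each physical frame holds, so a block rewritten during writing shows up in consecutive frames. `read_frame(frame, attempt)` returns the data after ECC, or `None` for an error it could not correct. `READ_RETRIES = 3` is an assumed value for "a few times".

```python
from __future__ import annotations

from typing import Callable, Optional

READ_RETRIES = 3  # "up to a few times": hypothetical value


class HardError(Exception):
    """The drive gives up and returns a tape error."""


def read_block(tape: list[int], i: int,
               read_frame: Callable[[int, int], Optional[bytes]]) -> tuple[bytes, str, int]:
    """Return (data, status, frames consumed) for the logical block at frame i."""
    block_id = tape[i]
    copies = 1
    while i + copies < len(tape) and tape[i + copies] == block_id:
        copies += 1  # rewritten copies made during the write phase
    # 1. Streaming pass: if one copy is bad, just try the next copy.
    for k in range(i, i + copies):
        data = read_frame(k, 0)
        if data is not None:
            return data, "ok", copies
    # 2. Every copy is bad: stop, reposition and retry a few times.
    for attempt in range(1, READ_RETRIES + 1):
        for k in range(i, i + copies):
            data = read_frame(k, attempt)
            if data is not None:
                return data, "soft error", copies
    raise HardError(f"block {block_id} unreadable")
```

A bad frame that has a good rewritten copy right behind it costs nothing. Only a block with no readable copy makes the drive stop and reposition, which is the stop/reposition/retry noise described above.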

I searched the net for a decent technical paper on this (I once read one
somewhere), but I can't find it again.

I would be glad if some hardware specialist could confirm or deny this.

Now, assuming the above is correct, how can HW compression improve
this situation? I have been doing tape backups (and now and then restores
too) for almost 20 years. I've had my share of problems with restoring,
but I've never had a situation where I could read the bits on the tape
while the extraction program (restore, tar, cpio...) claimed that the
archive was corrupt due to some arbitrary bit-flip somewhere in the
middle of the stream. When I had an error, it was always at the end
(premature EOF etc.), never in the middle. Incompatible versions of
restore programs, buggy versions of restore or backup programs, yes, and
I've worked around all of those up to now, but I've never seen a
bit-flip -- or maybe I'm just lucky, or too stupid to notice one if it
passes :-).

I was also once told that a PC has about 3 single-bit RAM errors per
year. Because most PCs don't have error-detecting (ECC) memory, these go
unnoticed (crashes due to real software bugs are still far more common).
That said, I'm still amazed by the uptime of one of the Linux servers I
run: 869 days already (kernel 2.2.13 on an old cheap Compaq PC). I must
be lucky.


> I use HW compression as long as the backup utility cannot gzip every file
> independently or use an algorithm that can recover after a read error.
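For comparison, here is a sketch of per-file compression. It assumes each file is gzipped on its own and that the archive keeps the member boundaries. In this sketch the boundaries are simply a Python list; a real archive format would have to record or find them. `pack` and `unpack` are hypothetical names. A damaged member costs only the file it belongs to.

```python
from __future__ import annotations

import gzip
import zlib


def pack(files: dict[str, bytes]) -> list[tuple[str, bytes]]:
    """Compress every file independently: one gzip member per file."""
    return [(name, gzip.compress(data)) for name, data in files.items()]


def unpack(members: list[tuple[str, bytes]]) -> tuple[dict[str, bytes], list[str]]:
    """Restore every member that still decodes; report the ones that don't."""
    restored: dict[str, bytes] = {}
    lost: list[str] = []
    for name, blob in members:
        try:
            restored[name] = gzip.decompress(blob)  # CRC-32 is checked here
        except (OSError, EOFError, zlib.error):     # BadGzipFile is an OSError
            lost.append(name)
    return restored, lost
```

Corrupting the member for one file leaves every other file restorable. That is the property a single gzip stream lacks.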


--
Paul Bijnens, Xplanation                           Tel  +32 16 40.51.40
Interleuvenlaan 15 H, B-3001 Leuven, BELGIUM       Fax  +32 16 40.49.61
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************