Sven Kirmess wrote:
> If there is one flipped bit (read error), everything compressed with gzip
> after that position is lost. With HW compression you can continue restoring
> after that position plus some overhead.
This comes up now and then, but I have my doubts about it. It assumes
that the tape holds only your data bits, with no error detection of its
own. I was once told that a DDS-tape device works as follows:
The bits "on the tape" do have some error correction (one-bit
correction) and error-detection (two-bit or more) encoding.
So when reading any tape the hardware can detect those errors.
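
To make "correct one bit, detect two or more" concrete, here is a minimal
sketch using an extended Hamming(8,4) code. That code is an assumption
picked for brevity, not the actual DDS on-tape format, and the function
names encode/decode are made up for this illustration.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 = overall parity, 1..7 = Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) % 2             # overall parity makes the total even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: exactly one bit flipped, at index `syndrome`
        # (index 0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity but nonzero syndrome: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    one_flip = list(word)
    one_flip[5] ^= 1
    two_flips = list(one_flip)
    two_flips[2] ^= 1
    print(decode(word))        # ('ok', 11)
    print(decode(one_flip))    # ('corrected', 11)
    print(decode(two_flips))   # ('uncorrectable', None)
```

The point is only that the drive can tell "fixed it silently" apart from
"this block is bad", which is what the retry logic relies on.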
When writing to tape, there is a read-head just after the write-head,
and the hardware detects any block with errors while writing. If a
read head detects an error, the block is written again, without stopping
the tape streaming, up to 15 times (I believe).
If a certain threshold of rewritten blocks is passed (e.g. 5 times or
so), the "clean drive" LED is lit on the panel. In the syslogs you can
see this as "soft errors". If the block cannot be written without
errors in the 15 tries, it is a hard error and the drive returns a "tape
error" status.
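
The write path described above can be sketched roughly as follows. This is
a toy model, not drive firmware: the limits 15 and 5 are the ones I quoted
from memory above, the callback verify_ok stands in for the read head
behind the write head, and treating "rewrites >= 5" as the condition for
the LED is my reading of it. All names are hypothetical.

```python
MAX_TRIES = 15       # assumption from the text: "up to 15 times (I believe)"
SOFT_THRESHOLD = 5   # assumption from the text: "e.g. 5 times or so"


def write_block(verify_ok, max_tries=MAX_TRIES, soft_threshold=SOFT_THRESHOLD):
    """Write one block, rewriting it until the read-after-write check passes.

    verify_ok(attempt) simulates the read head: it returns True when the
    copy written on that attempt (1-based) reads back without errors.
    """
    for attempt in range(1, max_tries + 1):
        if verify_ok(attempt):
            rewrites = attempt - 1
            return {
                "rewrites": rewrites,
                "soft_error": rewrites > 0,              # would show up in syslog
                "clean_led": rewrites >= soft_threshold,  # "clean drive" LED
            }
    # No good copy within the allowed tries: hard error.
    raise OSError("tape error: block not written after %d tries" % max_tries)


if __name__ == "__main__":
    print(write_block(lambda attempt: True))          # clean write
    print(write_block(lambda attempt: attempt >= 3))  # two rewrites, soft error
    try:
        write_block(lambda attempt: False)            # never verifies
    except OSError as exc:
        print(exc)
```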
When reading a tape, it can detect errors with the bit-stuffing
algorithms. When it detects an error (a one-bit error is automatically
corrected!), and the next block is not a rewritten copy of the same block
(from the write phase), the tape has to stop streaming, reposition and try
again up to a few times. If it succeeds you can see a "soft error" in
the syslogs, or if the error persists, it is flagged as a "hard error"
and the drive returns a tape error.
I have surely watched a DDS-2 drive try (and succeed!) reading such
marginally bad blocks: you can hear it stop/reposition/retry, and you
see the errors in the syslog.
I searched the net for a decent technical paper (I once read it
somewhere), but I can't find it again.
I would be glad if some hardware specialist could confirm or deny this.
Now assuming the above is correct, how can HW compression ameliorate
this situation? I have experience doing tape backups (and now and then
do restores too) for almost 20 years. I had my load of problems with
restoring, but I've never had a situation where I could read the bits
on the tape, but the extraction program (restore, tar, cpio...) claimed
that the archive was corrupt due to some arbitrary bit-flip somewhere in
the middle of the stream. When I had an error, it was always at the end
(premature EOF etc.), never in the middle. Incompatible versions of
restore programs, buggy versions of restore or backup programs, yes, and
I've worked around all of those up to now, but I've never seen a
bit-flip -- or maybe I'm just lucky, or too stupid to notice one if it
passes :-).
I was also once told that PC hardware sees about 3 single-bit errors
per year in RAM. And because most PCs don't have
error-detecting memory they go unnoticed (crashes due to real software
bugs are still far more common). This said, I'm still amazed by the
oldest uptime of some Linux server I run: 869 days already (Kernel
2.2.13 on an old cheap Compaq PC). I must be lucky.
> I use HW compression as long as the backup utility cannot gzip every file
> independently or use an algorithm that can recover after a read error.
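
For anyone who wants to see the difference between the two cases, here is
a small sketch, assuming a flipped bit really does reach the decompressor.
It compresses five made-up files once as a single gzip stream and once file
by file, flips one bit in each variant, and checks what can still be
restored. The file names, contents and helper names are invented for the
example; only gzip.compress and gzip.decompress are real library calls.

```python
import gzip
import zlib


def flip_bit(buf, byte_index, bit=0):
    """Return a copy of buf with one bit inverted."""
    out = bytearray(buf)
    out[byte_index] ^= 1 << bit
    return bytes(out)


def try_decompress(blob):
    """Return the decompressed bytes, or None if the gzip data is damaged."""
    try:
        return gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        return None


# Five hypothetical files with mildly varied text content.
files = {
    "file%d.txt" % n: "\n".join(
        "line %d of file %d: %d" % (i, n, i * i * (n + 1)) for i in range(500)
    ).encode()
    for n in range(5)
}

# Case 1: everything in one gzip stream, one bit flipped in the middle.
stream = gzip.compress(b"".join(files.values()))
whole = try_decompress(flip_bit(stream, len(stream) // 2))

# Case 2: every file compressed independently, one bit flipped in file2.
members = {name: gzip.compress(data) for name, data in files.items()}
members["file2.txt"] = flip_bit(
    members["file2.txt"], len(members["file2.txt"]) // 2
)
recovered = {name: try_decompress(blob) for name, blob in members.items()}

if __name__ == "__main__":
    print("single stream restored:", whole is not None)
    for name, data in sorted(recovered.items()):
        print(name, "restored" if data is not None else "LOST")
```

With one stream the decompressor rejects the whole archive; with one gzip
member per file only the damaged file is lost.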
--
Paul Bijnens, Xplanation Tel +32 16 40.51.40
Interleuvenlaan 15 H, B-3001 Leuven, BELGIUM Fax +32 16 40.49.61
http://www.xplanation.com/ email: Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, *
* kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ... "Are you sure?" ... YES ... Phew ... I'm out *
***********************************************************************