Amanda-Users

Re: hardware vs software compression (was Re: amflush/amcheck not in sync?)

2003-04-24 04:50:22
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: Jeroen Heijungs <Jeroen.Heijungs AT Het-Muziektheater DOT nl>
Date: Thu, 24 Apr 2003 10:45:54 +0200
Jeroen Heijungs wrote:
> It was recommended to me NOT to use software compression, for
> the following reason:
>
> "If a software-compressed file is damaged, the complete file/tape is
> not readable anymore and therefore useless. If it is NOT compressed,
> the rest of the tape/file may be readable, and therefore probably
> restorable."
>
> I have never had the opportunity to test this. Does anyone have any
> thoughts and/or comments on this?

Now we are getting into the misty field of how bytes are put on a
tape. My knowledge of it is already dated (more than 10 years old!).
Yes, in theory it is indeed true that a single flipped bit makes
a gzipped file useless from that point on.
But when you use hardware compression, you are just offloading the
compression algorithm to the drive. The drive compresses the bytes block
by block, so when reading the tape, a single bit error turns the rest of
that block into garbage. Would gnutar or restore do something
reasonable with a block of garbage somewhere in the middle? I doubt it.
But yes, in theory you have a better chance of recovering something beyond
the error with uncompressed data (e.g. you could scan forward to the next
thing that looks like a file header in a gnutar archive; there are
programs you can download to help with this).
In practice, make sure you have more than one backup.
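
To see the two failure modes side by side, here is a small Python sketch (not part of Amanda; the helper names flip_bit, damaged_bytes_plain and recovered_prefix are made up for illustration). It inverts one bit, as a read error would, in an uncompressed buffer and in a gzip stream of the same data, then measures how much survives. It only models software compression; what a drive's own hardware decompressor does with a bad block is not simulated.

```python
import gzip
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a read error."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def damaged_bytes_plain(original: bytes, bit_index: int) -> int:
    """Uncompressed data: count how many bytes differ after a single bit flip."""
    corrupted = flip_bit(original, bit_index)
    return sum(a != b for a, b in zip(original, corrupted))


def recovered_prefix(original: bytes, blob: bytes) -> int:
    """Gunzip blob as far as zlib allows; return how many leading bytes still match."""
    inflater = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # 16+: expect gzip header/trailer
    out = bytearray()
    try:
        for start in range(0, len(blob), 4096):
            out += inflater.decompress(blob[start:start + 4096])
        out += inflater.flush()
    except zlib.error:
        pass  # stream rejected as invalid or failing its CRC: keep what came out before
    matched = 0
    for a, b in zip(original, out):
        if a != b:
            break
        matched += 1
    return matched


if __name__ == "__main__":
    original = b"".join(b"line %06d of some log file\n" % i for i in range(20000))
    compressed = gzip.compress(original)

    print("plain, bytes damaged:", damaged_bytes_plain(original, (len(original) // 2) * 8))
    print("gzip, intact stream recovers:", recovered_prefix(original, compressed))
    corrupt = flip_bit(compressed, (len(compressed) // 2) * 8)
    print("gzip, corrupted stream recovers:", recovered_prefix(original, corrupt),
          "of", len(original))
```

In the uncompressed case exactly one byte is wrong and everything after it is intact; in the gzip case the usable output typically stops somewhere around the flipped bit.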

Because single bit errors really are dangerous (would you trust a restore
of your bank account data if you knew that 1 in 10^9 bits in the file had
been flipped?), tape drives take countermeasures: they write extra bits to
detect, or even correct, errors in the bits on tape.
For example, a DDS-2 drive has an error rate of 1 in 10^17 bits.
Its full 4 Gbyte capacity is about 3.2 x 10^10 bits, so you would have to
write a full tape about 3,000,000 times to get, on average, a single error.
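
The arithmetic behind that figure, as a short Python sketch. It assumes decimal units (4 GB = 4 x 10^9 bytes) and treats the quoted error rate as a simple average; both are simplifying assumptions, not drive specifications.

```python
BITS_PER_BYTE = 8


def bits_on_tape(capacity_bytes: int) -> int:
    """Number of bits in one full-capacity pass."""
    return capacity_bytes * BITS_PER_BYTE


def full_tapes_per_error(one_error_in_bits: int, capacity_bytes: int) -> float:
    """Average number of full-capacity passes written before one bit error."""
    return one_error_in_bits / bits_on_tape(capacity_bytes)


def errors_per_full_tape(one_error_in_bits: int, capacity_bytes: int) -> float:
    """Expected number of bit errors in one full-capacity pass."""
    return bits_on_tape(capacity_bytes) / one_error_in_bits


if __name__ == "__main__":
    dds2_capacity = 4 * 10**9  # assumed: 4 GB native, decimal gigabytes
    print(full_tapes_per_error(10**17, dds2_capacity))  # 3125000.0
    print(errors_per_full_tape(10**9, dds2_capacity))   # 32.0
```

So "about 3,000,000 times" is 3.125 million full tapes, while the 1-in-10^9 rate from the bank-account example would mean roughly 32 flipped bits on every full tape.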

Now that we have put things in perspective: there are other reasons to
choose between hardware and software compression.

First, the built-in algorithm in most tape drives behaves badly when fed
incompressible data: it expands such data (by 10-30%), so you end up
with LESS tape capacity. The gzip library used for software compression
expands such data by less than 1%. Take this into account if your data
contain many GIFs, JPEGs, MPEGs, MP3s, or other already-compressed files.
When using software compression, you can even let Amanda control which
DLEs benefit from compression and which don't (saving CPU cycles).
(And never use hardware and software compression at the same time!)
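
A quick way to check the "less than 1%" figure is to gzip data that cannot be compressed. The sketch below uses Python's standard gzip module (the same deflate format as gzip(1), not Amanda itself); random bytes stand in for JPEGs, MP3s and other already-compressed files, and the helpers gzip_ratio and effective_capacity are names made up here. The 10-30% hardware expansion is taken from the text above and only applied as an assumed factor, not measured.

```python
import gzip
import os


def gzip_ratio(sample: bytes) -> float:
    """Compressed size divided by original size (above 1.0 means the data grew)."""
    return len(gzip.compress(sample)) / len(sample)


def effective_capacity(native_bytes: float, expansion: float) -> float:
    """Tape capacity left when the drive expands incompressible data by `expansion`."""
    return native_bytes / (1.0 + expansion)


if __name__ == "__main__":
    incompressible = os.urandom(1 << 20)  # stand-in for already-compressed files
    text = b"".join(b"user%05d logged in\n" % i for i in range(50000))
    print(f"random data: {gzip_ratio(incompressible):.4f}")  # slightly above 1.0
    print(f"log text:    {gzip_ratio(text):.4f}")            # far below 1.0
    for expansion in (0.10, 0.30):
        gb = effective_capacity(4e9, expansion) / 1e9
        print(f"drive expands by {expansion:.0%}: about {gb:.2f} GB fits on a 4 GB tape")
```

A sample that behaves like the random data is a good hint that the DLE it came from should be dumped without compression.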

When using hardware compression, Amanda does not have an accurate idea
of the exact tape capacity. You have to take a wild guess at how much
the overall data will compress. The scheduling algorithm Amanda uses
works much better when its estimates are more exact. For example, when
a single run uses multiple tapes, Amanda can fill each tape close to
100% (see the "taperalgo" directive in amanda-2.4.4).
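
To make the guessing problem concrete, here is a small Python sketch with made-up numbers. Roughly speaking, with hardware compression the tape length you configure has to build in an assumed overall compression ratio; if the data actually compresses less well, the last dumps planned for the tape will not fit. With software compression Amanda already knows each dump's compressed size, so no ratio guess is needed. The function name and all figures are hypothetical.

```python
def planned_overflow(native_bytes: float, assumed_ratio: float, actual_ratio: float) -> float:
    """Bytes of uncompressed data scheduled for the tape that will not fit.

    The planner is told the tape holds native_bytes * assumed_ratio; the drive can
    really take native_bytes * actual_ratio. A negative result means spare room.
    """
    planned = native_bytes * assumed_ratio
    fits = native_bytes * actual_ratio
    return planned - fits


if __name__ == "__main__":
    native = 4e9  # hypothetical 4 GB native tape
    for actual in (2.0, 1.5, 1.1):
        over = planned_overflow(native, assumed_ratio=2.0, actual_ratio=actual)
        print(f"guessed 2:1, data really compresses {actual}:1 -> overflow {over / 1e9:.1f} GB")
```

A guess that is too pessimistic errs the other way: the tape is never overrun, but part of it is wasted.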

Of course, hardware compression has one big advantage: speed. If you
have to fit lots of data into a small backup window at night, and your
CPUs are not fast enough or the compression would interfere with other
CPU-intensive nightly jobs, then hardware compression is the way to go.
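
Whether software compression fits the backup window is mostly a question of gzip throughput on the machines doing the compressing. Below is a hypothetical Python sketch of that check; the throughput figure is a placeholder you would measure yourself (for example by timing gzip on a sample of your data), and spreading the work evenly over several parallel clients is an idealisation.

```python
def compression_hours(data_bytes: float, gzip_bytes_per_s: float, clients: int = 1) -> float:
    """Hours of CPU-bound gzip time, assuming the data is split evenly across
    `clients` machines that compress in parallel (a simplification)."""
    return data_bytes / (gzip_bytes_per_s * clients) / 3600


def fits_window(data_bytes: float, gzip_bytes_per_s: float,
                window_hours: float, clients: int = 1) -> bool:
    """True if compressing the night's data takes no longer than the window."""
    return compression_hours(data_bytes, gzip_bytes_per_s, clients) <= window_hours


if __name__ == "__main__":
    data = 36e9        # hypothetical: 36 GB to dump per night
    gzip_rate = 2e6    # hypothetical: 2 MB/s of gzip throughput per client
    for clients in (1, 2):
        hours = compression_hours(data, gzip_rate, clients)
        print(f"{clients} client(s): {hours:.1f} h, fits a 4 h window: "
              f"{fits_window(data, gzip_rate, 4, clients)}")
```

If the check fails even with realistic parallelism, that is exactly the situation where handing compression to the drive pays off.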


> I now use hardware compression, not software compression; the
> tapes are big enough, so there is no real problem for the time being.

As long as you don't run near those borderline cases, it doesn't really
matter which one you choose.


--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************