Amanda-Users

Re: DAT hardware or software compression

2002-11-20 10:36:10
Subject: Re: DAT hardware or software compression
From: Gene Heskett <gene_heskett AT iolinc DOT net>
To: marc.bigler AT day DOT com, amanda-users AT amanda DOT org
Date: Wed, 20 Nov 2002 09:55:01 -0500
On Wednesday 20 November 2002 08:13, marc.bigler AT day DOT com wrote:
>Hello,
>
>I have got here a DDS-3 tape drive which has per default hardware
>compression enabled and was wondering what is the best deal with
> AMANDA. Would you guys suggest hardware compression or should I
> disable hardware compression and have software compression done
> for example with gzip ?
>
Generally speaking, Marc, it's a bad idea to use the drive's
compressor.

1st reason is that it hides the true capacity of the tape from
amanda, which counts bytes *sent* to the drive after any compression
amanda does.  Plain data can be compressed quite a bit, but
executables and archives such as tar.gz and rpm files are generally
already smunched, so the drive's compressor can't do much with them
and may in fact expand them somewhat.  If you compile and run the
tape-src/tapetype program, you'll see almost the advertised
capacity of the drive if compression is off.  With it on, the
data from /dev/urandom that tapetype uses as its data source for
this test isn't compressible and will probably expand, making
tapetype report a falsely low size.
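
The expansion effect is easy to reproduce in software.  The sketch
below is only an illustration: it uses Python's zlib (the deflate
algorithm behind gzip) as a stand-in for the drive's compressor, which
uses a different algorithm, and the repetitive sample text is made up.
It compares the compressed-to-original size ratio of random bytes with
that of compressible text.

```python
import os
import zlib


def compressed_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (1.0 means no gain)."""
    return len(zlib.compress(data, level)) / len(data)


# 1 MiB of random bytes, the same kind of data /dev/urandom supplies.
random_data = os.urandom(1 << 20)

# Roughly 1 MiB of highly repetitive text (made-up sample content).
text_data = b"amanda backs up the disklist entries to tape\n" * 24000

print(f"random: {compressed_ratio(random_data):.4f}")
print(f"text:   {compressed_ratio(text_data):.4f}")
```

The random data should come out at a ratio of about 1.0 or slightly
above, while the text shrinks to a small fraction of its size.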

2nd reason is that gzip can usually out-compress the drive's hardware
compressor (DCLZ on DDS drives), usually by an obviously detectable
amount, except in cases like the archive files that are already
smunched.

For example, if you do as I do, nearly all downloads go into one
directory, and this directory doesn't get compressed since it's a
waste of cpu cycles to do so.  I have several others in my disklist
that also skip the compression.  And I have in the past received mail
from amdump indicating it used 3.5 GB of a 4 GB tape and had
stored over 6.5 GB of source data on it.

Read the emails from amanda after each run.  Any entry in the
disklist that gets a level 0 and shows a compression ratio over
100% should have its dumptype changed to one without compression,
since it's not further compressible.  Level 1's and 2's that expand
to 320% or 640% are probably empty dirs, and can probably be deleted
from both the disk and the disklist.  Here they just waste 64 blocks
of tape.
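
The rule of thumb above can be written down as a small check.  This
is a minimal sketch, not a parser for amanda's report mail: the
DumpResult record, its field names, and the sample entries are all
hypothetical, and you would fill them in by hand from the report.

```python
from typing import List, NamedTuple


class DumpResult(NamedTuple):
    """One disklist entry from a run (hypothetical structure)."""
    name: str
    level: int
    orig_kb: int   # size before compression
    out_kb: int    # size after compression


def compression_percent(r: DumpResult) -> float:
    """Output size as a percentage of the original size."""
    return 100.0 * r.out_kb / r.orig_kb


def should_disable_compression(results: List[DumpResult]) -> List[str]:
    """Names of level 0 dumps that grew when compressed (ratio > 100%)."""
    return [r.name for r in results
            if r.level == 0 and compression_percent(r) > 100.0]


# Made-up sample values, for illustration only.
sample = [
    DumpResult("/home", 0, 1_000_000, 250_000),    # compresses well
    DumpResult("/usr/dl", 0, 500_000, 510_000),    # already-compressed archives
    DumpResult("/var/empty", 1, 10, 32),           # tiny incremental, 320%
]

print(should_disable_compression(sample))
```

Only level 0 entries are flagged; the tiny incremental that expands
to 320% is left for a manual look, as described above.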

Some entries in the disklist will squeeze down to <25% of their
original size, so it's an overall plus to use gzip IF you have the
cpu horsepower to do it in a reasonable time frame.  Here I find
with an Athlon clocked at 1400 MHz that, even though I'm using
server-best, finished output from gzip piles up in the holding disk
waiting to be written to tape, since the tape can only do about
375 KB a second.  Once started, the drive never stops till the run
is done.  Under those conditions, the cpu effort to do the
compression is free (except for its impact on seti@home). :-)
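
To put the tape speed in perspective, here is a rough back-of-envelope
calculation of how long the drive stays busy at that rate.  It assumes
binary units (1 KB = 1024 bytes, 1 GB = 1024^3 bytes) and a constant
375 KB a second with no stops; both are simplifying assumptions.

```python
KIB = 1024
GIB = 1024 ** 3


def tape_write_hours(data_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to stream data_bytes to tape at a constant rate."""
    return data_bytes / rate_bytes_per_s / 3600.0


# 3.5 GB on tape (the figure from the amdump mail) at 375 KB a second.
hours = tape_write_hours(3.5 * GIB, 375 * KIB)
print(f"{hours:.2f} hours")
```

As long as gzip delivers compressed output at least that fast on
average, the tape remains the bottleneck and the run takes no longer
than it would without software compression.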

Some DDS drives keep a hidden header (the MRS system) on the tape
that records the compressor's status.  A tape once written with the
compressor on will be compressed forever, regardless of the DIP
switch setting, unless you turn off the compression with mt or
similar and then forcibly write an amount of data large enough to
cause a buffer flush in the drive.  I've posted a short script to do
that several times here.

-- 
Cheers Marc, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz  512M
99.19% setiathome rank, not too shabby for a WV hillbilly