Amanda-Users

Re: Amanda Compression

2004-09-23 10:31:56
Subject: Re: Amanda Compression
From: Gene Heskett <gene.heskett AT verizon DOT net>
To: kshriyan AT redhat DOT com
Date: Thu, 23 Sep 2004 10:28:12 -0400
On Thursday 23 September 2004 04:25, Kaushal Shriyan wrote:
>Hi
>
>I have a 40GB/80GB HP VS80 DLT Tape Drive. Now amanda is only able
> to recognize 40 GB only how do i make it enable to utilize the 80GB
> space on the tape. Is there any way out ???
>
>Can any one help me please

That second figure is market-speak for how much it can hold if 2 
things are assumed:  1) the data is compressible and 2) the hardware 
compressor is turned on.  Then it can hold about 2x its native 
capacity if you are counting the bytes fed down the cable to the 
drive.
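
To put numbers on that, here is a minimal Python sketch of the 
arithmetic.  The function name and the sample ratios are made up for 
illustration, they are not anything amanda or the drive provides.

```python
def bytes_accepted_gb(native_gb, compressed_fraction):
    """GB the drive accepts over the cable before the tape is full.

    compressed_fraction is compressed size / original size as achieved
    by the drive's hardware compressor: 0.5 is the 2:1 the label
    assumes, 1.0 means the data did not shrink at all.
    """
    if compressed_fraction <= 0:
        raise ValueError("compressed_fraction must be positive")
    return native_gb / compressed_fraction


print(bytes_accepted_gb(40, 0.5))  # 80.0, the figure on the box
print(bytes_accepted_gb(40, 1.0))  # 40.0, already-compressed data
```

Anything that does not compress 2:1 lands somewhere between those 
two numbers.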

But there are several flies in that soup as far as amanda is 
concerned.

1.  Amanda can pipe the data through gzip selectively, meaning it 
won't waste any time & horsepower trying to compress data that won't 
compress, like most executables, or a stash of tar.gz and rpms.

2.  By using gzip, the size reduction for quite a bit of the average 
data can approach 90%, the end result being that judicious use of 
gzip can beat the hardware compressor's roughly 50% quite nicely.

3.  Amanda DOES count bytes fed down the cable, not the size of the 
data before gzip.  When the hardware compressor is off, then amanda 
knows exactly how much data she can pour down the cable before the 
EOT signal/error comes back.  This means that amanda can actually 
beat the hardware compressor's advertised capacity, often by quite 
usable amounts of data.  Generally speaking, the compressed size 
achieved here is in the mid 30's to the low 40's percent of the 
original, meaning that amanda could put close to 100GB of raw data on 
that 40GB tape.  My own setup here is now to disk rather than tape, 
and as a disk file can expand to the free space capacity of the drive 
without error, amanda must use the size set in the tapetype to adjust 
the schedule for balancing.

From a recent email report here:
Output Size (meg)        3797.8     3455.8      342.0
Original Size (meg)      7421.5     6641.9      779.6
Avg Compressed Size (%)    41.6       41.7       40.8   
(level:#disks ...)

But amanda is just getting started again here, so I expect things to 
settle down to a bit better compression ratio in due time.

4.  Using the hardware compressor causes the true capacity of the tape 
to be hidden from amanda since the bytes sent down the cable aren't 
1:1 with the bytes on the tape, so amanda doesn't really know for 
sure how big the tape is.  One has to apply fudge factors to the 
tape size set in the tapetype.
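
The points above can be tried out with a small Python sketch.  It 
does not call amanda at all: it gzips one synthetic text-like sample 
and one random sample (a stand-in for tar.gz or rpm payloads), then 
redoes the capacity arithmetic with the 41.6% figure from the report 
above.  The function names and sample data are assumptions made up 
for the illustration.

```python
import gzip
import os


def compressed_percent(data: bytes) -> float:
    """Compressed size as a percentage of the original size."""
    return 100.0 * len(gzip.compress(data)) / len(data)


def raw_capacity_gb(tape_gb: float, percent: float) -> float:
    """Raw (pre-gzip) data that fits on tape_gb of tape when the
    hardware compressor is off and gzip output is `percent` of input."""
    return tape_gb * 100.0 / percent


# Highly repetitive, log-like text compresses very well.
text_like = b"Sep 23 10:28:12 host amanda: dump of /home finished\n" * 20000
# Random bytes behave like data that is already compressed.
already_dense = os.urandom(1_000_000)

print(f"text-like sample: {compressed_percent(text_like):.1f}% of original")
print(f"dense sample:     {compressed_percent(already_dense):.1f}% of original")
print(f"40 GB tape at 41.6%: {raw_capacity_gb(40, 41.6):.1f} GB raw")
```

The dense sample comes out no smaller than it went in, which is why 
it pays to leave such data uncompressed.  The last line works out to 
roughly 96 GB, in line with the "close to 100GB" estimate.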

So, generally speaking, the old-timers using amanda rarely use the 
drive's hardware compressor, and recommend against using it if the 
machines have the cpu horsepower to run gzip.  Bear in mind also that 
the gzipping can be done on the clients so that several clients can 
all be running gzip at the same time, thereby offloading that job 
from the single server machine rather nicely.  Just be sure and give 
each physical disk its own 'spindle' number in the disklist and 
amanda can handle the rest of that if using a dumptype that includes 
the phrase 'compress client (fast or best)'.
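
As a rough sketch of what that looks like in the config files: the 
host names, disk names and the dumptype name below are hypothetical, 
and the layout assumes /home and /var share one physical disk while 
/data sits on a second one.  Check the option spellings against the 
amanda.conf and disklist documentation for your version.

```
# amanda.conf fragment: a dumptype that gzips on the client
define dumptype client-gzip-fast {
    comment "hypothetical dumptype, gzip runs on the client"
    program "GNUTAR"
    compress client fast
}

# disklist fragment: hostname  diskname  dumptype  spindle
client1.example.com  /home  client-gzip-fast  1
client1.example.com  /var   client-gzip-fast  1
client1.example.com  /data  client-gzip-fast  2
client2.example.com  /home  client-gzip-fast  1
```

Entries on the same host with the same spindle number are not dumped 
at the same time, so the two partitions on the first disk don't fight 
over the heads, while the second disk and the second client can run 
in parallel.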

Note too, that a tape once written in the hardware compressed mode 
sets flags on the tape in the leader packet the drive uses to 
recognize the tape when it's inserted into the drive.  It's difficult 
to turn off once on because the drive overrides your choice if it 
finds a tape that says it's compressed.  So one has to turn it off 
again using mt, and then write enough trash data to the tape to cause 
the drive to flush its buffers, at which point that flag will finally 
be re-written, thereby "uncompressing" that tape.  This workaround 
must be done to every tape that's ever been written to with the 
hardware compressor turned on.

Hopefully this explains why we don't recommend using it: amanda can 
usually beat it, at the cost of a few btu's from the cpu and a bit of 
time.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.26% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

