Amanda-Users

Re: enabling hardware compression ?

2004-09-03 21:45:00
Subject: Re: enabling hardware compression ?
From: Gene Heskett <gene.heskett AT verizon DOT net>
To: pll+amanda AT permabit DOT com
Date: Fri, 3 Sep 2004 21:41:00 -0400
On Friday 03 September 2004 11:06, pll+amanda AT permabit DOT com wrote:
>In a message dated: Mon, 23 Aug 2004 15:12:52 EDT
>
>Frederic Medery said:
>>Thanks,
>>in fact what I want is to speed up my backup so I moved my tape
>> changer to the biggest server, now I have a /amanda with 160 GB of
>> free space. I thought that HW compression would also speed up my
>> backup.
>
>I find that an illogical assumption.  If you have N clients all
>streaming data to a central host, which then has to stream that data
>to a tape drive, and the tape drive then has to compress that
>stream, then the compression becomes the bottleneck.  By using hw
>compression in that manner, you've inserted an additional hoop for
>the data to jump through before actually getting onto the tape.
>Therefore, by definition, the whole process will be slowed down
>some (even if only by a negligible amount).
>
That, generally speaking, is not true.  The compression that tape 
drives use is typically some variant of Lempel-Ziv (LZ) coding, and 
dedicated compression chips right in the drive can handle that without 
breaking a sweat even in weather like this.  They can easily keep up 
with the drive's ability to record or play the data, so the data rate 
seen at the drive's input can sometimes be more than double the rate 
going up the short cable to the record/play heads.  That's why the 
makers always brag about a 2x (or more) speedup when compression is 
turned on.
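To put rough numbers on that: if the on-board compressor keeps up with the heads, the host can feed the drive at about the native head rate times the compression ratio of the data, until the host interface itself becomes the limit. The sketch below is only an illustration under those assumptions; Python's zlib stands in for the drive's compressor, and the 3 MB/s and 20 MB/s figures are made up.

```python
import zlib

def host_side_rate(native_mb_s: float, sample: bytes, max_input_mb_s: float) -> float:
    """Estimate the rate (MB/s) at which a compressing drive can accept data.

    Assumes the on-board compressor keeps up with the heads, so the accepted
    rate is the native head rate times the data's compression ratio, capped
    by the host interface. zlib is only a stand-in for the drive's compressor.
    """
    ratio = len(sample) / len(zlib.compress(sample))
    return min(native_mb_s * ratio, max_input_mb_s)

# Hypothetical drive: 3 MB/s native head rate behind a 20 MB/s host interface.
log_like = b"".join(b"%06d backup of /home finished OK\n" % i for i in range(50_000))
print(f"{host_side_rate(3.0, log_like, 20.0):.1f} MB/s")
```

Text-like data such as logs compresses well, so the effective input rate lands above the native head rate; how far above depends entirely on the data.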

The downside is that this kind of compression isn't consistent: it 
can in fact expand the data, by 5 to 15% or so, if the input is truly 
random, which is generally the case when running amtapetype, so you 
get obviously low results, well below the sales dweebs' claims.
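A quick way to see the effect with a software compressor: random bytes come out slightly larger than they went in, while repetitive data shrinks a lot. This sketch uses Python's zlib as a stand-in for a drive's compressor (an assumption); zlib's overhead on random data is far smaller than the 5 to 15% quoted above for drives, so it only shows the direction of the effect, not its size.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Return input size / output size; a value below 1.0 means the data grew."""
    return len(data) / len(zlib.compress(data, 9))

random_data = os.urandom(1_000_000)         # incompressible, like a random test pattern
repetitive = b"0123456789abcdef" * 62_500   # 1,000,000 highly compressible bytes

print(f"random:     {compression_ratio(random_data):.4f}")
print(f"repetitive: {compression_ratio(repetitive):.1f}")
```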

In any event, we as a group have tended to frown on the use of the 
hardware compressor because it hides the true capacity of the tape 
from amanda, and amanda then has no idea when it will hit the EOT 
when writing the tape.  If amanda is using something like gzip to 
compress the data, then amanda knows to the byte how much data has 
been fed to the drive since amanda counts the bytes *after* any 
compression she has told the system to use, and can regularly fill a 
tape to 98% of its rated capacity, every night.  I've done it for 
months at a time when things were working right here.
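Why knowing the post-compression byte counts matters can be sketched with a simple packing step: when every dump's compressed size is known exactly, you can choose dumps that fill the tape almost completely. This is a hypothetical greedy illustration, not amanda's actual planner algorithm, and the dump names, sizes and 20,000 MB tape capacity are invented.

```python
def plan_tape(dump_sizes_mb: dict[str, int], tape_capacity_mb: int) -> tuple[list[str], int]:
    """Greedily pick dumps (largest first) whose compressed sizes fit on one tape.

    Because the sizes are counted *after* software compression, the number of
    bytes that will reach the tape is known exactly. With drive compression
    the on-tape size would only be known after the fact.
    """
    chosen: list[str] = []
    used = 0
    for name, size in sorted(dump_sizes_mb.items(), key=lambda kv: kv[1], reverse=True):
        if used + size <= tape_capacity_mb:
            chosen.append(name)
            used += size
    return chosen, used

# Hypothetical post-gzip dump sizes in MB.
dumps = {
    "host1:/home": 9_000,
    "host2:/var": 6_500,
    "host3:/usr": 4_100,
    "host2:/opt": 2_500,
    "host1:/etc": 50,
}
chosen, used = plan_tape(dumps, tape_capacity_mb=20_000)
print(chosen, f"{used} MB, {used / 20_000:.2%} of the tape")
```

Here host2:/opt is left for the next run, and the remaining dumps fill a bit over 98% of the tape.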

The sad but true fact of life is that most drives can be filled to, 
or read back from, the native capacity of their tapes in 2 to 3 hours. 
If it's taking 12 hours to do the backup, then the answer really is a 
bigger, faster drive, unless the system's config is well and truly 
broken.
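The arithmetic behind the 2 to 3 hour figure is just capacity divided by streaming rate. The drive numbers below (40 GB native, 5 MB/s native) are hypothetical, picked only to show the calculation; substitute your own drive's specs.

```python
def hours_to_fill(native_capacity_gb: float, native_rate_mb_s: float) -> float:
    """Hours needed to stream a tape's native capacity at the drive's native rate."""
    seconds = native_capacity_gb * 1000 / native_rate_mb_s  # decimal GB -> MB
    return seconds / 3600

def effective_rate_mb_s(data_gb: float, hours: float) -> float:
    """Average throughput actually achieved if data_gb took `hours` to back up."""
    return data_gb * 1000 / (hours * 3600)

# Hypothetical drive: 40 GB native capacity at 5 MB/s.
print(f"{hours_to_fill(40, 5):.1f} h")             # 40000 MB / 5 MB/s = 8000 s, about 2.2 h
print(f"{effective_rate_mb_s(40, 12):.2f} MB/s")   # 40000 MB / 43200 s, about 0.93 MB/s
```

An average under 1 MB/s against a 5 MB/s drive suggests the time is going somewhere other than the tape.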

>To actually speed up the backup process, it's probably better to
>perform client side compression, thereby distributing the process
>such that it is performed in parallel on the X clients you have.
>This way, the data, when it arrives at the server, is already
>in the form it's meant to be on tape; i.e. there's no longer a
>bottleneck at the tape drive.
>
>Just my $.00002 :)

Worth a bit more than that :)

This is also true: clients can, if there's a bunch of them, all be 
doing their compression in parallel.  This is usually controlled by 
the spindle numbers in your disklist: giving each individual hard 
drive on a client its own spindle number lets amanda dump those 
drives at the same time (subject to the maxdumps and inparallel 
settings).  So once they are done, and there's available bandwidth on 
the local network, they can pour their data into the holding disk at 
the effective bandwidth of the network.
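A rough model of that scheduling, assuming the usual disklist rule that entries on the same host with the same spindle number are dumped one at a time while a spindle of -1 means no restriction. This is a hypothetical sketch: the hosts, disks and data are made up, the threads merely stand in for separate client machines, and it ignores inparallel, maxdumps and the network entirely.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical disklist entry: (host, disk, spindle, data to back up).
Entry = tuple[str, str, int, bytes]

def spindle_groups(entries: list[Entry]) -> list[list[Entry]]:
    """Split entries into groups whose members must be dumped one after another."""
    groups: dict[tuple[str, int], list[Entry]] = defaultdict(list)
    independent: list[list[Entry]] = []
    for entry in entries:
        host, _disk, spindle, _data = entry
        if spindle < 0:
            independent.append([entry])      # -1: no spindle restriction
        else:
            groups[(host, spindle)].append(entry)
    return list(groups.values()) + independent

def run_group(group: list[Entry]) -> dict[str, int]:
    """Compress each entry of one group serially, as a single drive would."""
    return {f"{host}:{disk}": len(zlib.compress(data)) for host, disk, _sp, data in group}

def parallel_compress(entries: list[Entry], workers: int = 4) -> dict[str, int]:
    """Run the independent groups concurrently and collect compressed sizes."""
    sizes: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(run_group, spindle_groups(entries)):
            sizes.update(result)
    return sizes

entries: list[Entry] = [
    ("alpha", "/home", 0, b"home data\n" * 10_000),
    ("alpha", "/var", 0, b"var data\n" * 10_000),    # same drive as /home: runs after it
    ("alpha", "/usr", 1, b"usr data\n" * 10_000),    # second drive: may run concurrently
    ("beta", "/home", 0, b"beta home\n" * 10_000),   # different host, own spindle space
    ("beta", "/tmp", -1, b"scratch\n" * 10_000),     # unrestricted
]
print(parallel_compress(entries))
```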

The client of course needs the resources to do that compression, 
primarily CPU time, plus enough free memory for gzip to run alongside 
the dump itself.  Holding disks on the client will IIRC be used for 
buffering, so that's useful too.
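For a sense of how much memory compression actually needs: deflate, the algorithm behind gzip, works through its input with a fixed 32 KB window, so a streaming compressor's memory use stays bounded no matter how large the disklist entry is. The sketch below uses Python's zlib.compressobj to compress a stream chunk by chunk; the function name, the 64 KB chunk size and the sample payload are illustrative assumptions, not anything amanda itself uses.

```python
import io
import zlib
from typing import BinaryIO

def stream_compress(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Compress src into dst chunk by chunk and return the bytes written.

    Memory use is bounded by chunk_size plus zlib's fixed internal state,
    regardless of the total input size.
    """
    compressor = zlib.compressobj(6)
    written = 0
    while chunk := src.read(chunk_size):
        out = compressor.compress(chunk)
        dst.write(out)
        written += len(out)
    tail = compressor.flush()
    dst.write(tail)
    written += len(tail)
    return written

payload = b"disklist entry contents\n" * 200_000   # about 4.8 MB of sample data
src, dst = io.BytesIO(payload), io.BytesIO()
written = stream_compress(src, dst)
assert zlib.decompress(dst.getvalue()) == payload
print(f"{len(payload)} bytes in, {written} bytes out")
```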

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
