On Friday 03 September 2004 11:06, pll+amanda AT permabit DOT com wrote:
>In a message dated: Mon, 23 Aug 2004 15:12:52 EDT
>
>Frederic Medery said:
>>Thanks,
>>in fact what I want is to speed up my backup so i moved my tape
>> charger to the biggest server, now I have a /amanda with 160 GB of
>> free space. I thought that HW compression would also speed up my
>> backup.
>
>I find that an illogical assumption. If you have N clients all
>streaming data to a central host, which then has to stream that data
>to a tape drive, and the tape drive then has to compress that
> stream, then the compression becomes the bottleneck. By using hw
> compression in that manner, you've inserted an additional hoop for
> the data to jump through before actually getting onto the tape.
> Therefore, by definition, the whole process will be slowed down
> some (even if only by a negligible amount).
>
That, generally speaking, is not true. The compression that tape
drives use is typically some LZ variant (DCLZ, ALDC and the like;
RLL is the recording code, not the compressor), and the dedicated
encoder/decoder chips right in the drive can handle that without
breaking a sweat even in weather like this. They can easily keep up
with the drive's ability to record or play the data, so the data
rate seen at the input can be more than double the rate going up the
short cable to the record/play heads. That's why the makers always
brag about a 2x (or more) speedup when compression is turned on.
The downside is that this isn't a consistent compressor, and it will
in fact expand the data 5 to 15% if the input is truly random, which
is generally the case when running amtapetype, so you get obviously
low results, well below the sales dweebs' claims.
In any event, we as a group have tended to frown on the use of the
hardware compressor because it hides the true capacity of the tape
from amanda, and amanda then has no idea when it will hit EOT while
writing the tape. If amanda is using something like gzip to
compress the data, then amanda knows to the byte how much data has
been fed to the drive, since amanda counts the bytes *after* any
compression she has told the system to use, and can regularly fill a
tape to 98% of its rated capacity, every night. I've done it for
months at a time when things were working right here.
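For reference, that software-compression setup lives in a dumptype in
amanda.conf; something like the sketch below (the dumptype name here
is made up for illustration):

```
# amanda.conf -- a dumptype that gzips on the client, so amanda
# counts post-compression bytes and knows exactly how full the tape is
define dumptype comp-user-tar {
    comment "client-side gzip, fast setting"
    program "GNUTAR"
    compress client fast      # run gzip --fast on the client
    index yes
}
```

When doing this, also turn the drive's hardware compressor off (on
Linux with the mt-st package, something like `mt -f /dev/nst0
compression 0`), otherwise the already-gzipped stream gets fed through
the drive's compressor a second time and may even expand slightly.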
The sad but true fact of life is that most drives can be filled or
dumped at the native capacity of their tape in 2 to 3 hours. If it's
taking 12 hours to do the backup, then the answer really is a
bigger, faster drive, unless the system's config is well and truly
broken.
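That 2-to-3-hour figure is just native capacity divided by native
streaming rate. A quick back-of-the-envelope check (the capacity and
rate below are illustrative numbers, e.g. roughly a DDS-3 class drive,
not figures from this thread):

```python
# Rough time to fill a tape when the drive streams at its native rate.
# Capacity and rate here are illustrative, not from the post.
def hours_to_fill(capacity_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to stream capacity_gb of data at rate_mb_per_s."""
    return capacity_gb * 1024 / rate_mb_per_s / 3600

# A 12 GB tape at ~1.1 MB/s native streams full in about 3 hours.
print(round(hours_to_fill(12, 1.1), 1))
```

If the same drive takes 12 hours, it is spending most of its time
shoe-shining while it waits for data, which is the config problem, not
a drive problem.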
>To actually speed up the backup process, it's probably better to
>perform client side compression, thereby distributing the process
>such that it is performed in parallel on the X clients you have.
>This way, when the data, when it arrives at the server, is already
> in the form it's meant to be on tape; i.e. there's no longer a
> bottleneck at the tape drive.
>
>Just my $.00002 :)
Worth a bit more than that :)
This is also true: clients can, if there's a bunch of them, all be
doing their compression in parallel. On a single client this is
usually controlled by the spindle numbers in your disklist; giving
each individual hard drive a unique spindle number lets amanda dump
those disks at the same time. So once they are done, and there's
available bandwidth on the local network, they can pour their data
into the holding disk at the effective bandwidth of the network.
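A disklist arranged that way might look like the sketch below
(hostnames, paths, and the dumptype name are hypothetical). The
optional fourth field is the spindle number: entries sharing a spindle
on a host are dumped one at a time, while different spindles can be
dumped in parallel, subject to the dumptype's maxdumps setting:

```
# disklist -- the 4th field is the spindle number, one per
# physical drive, so dumps on separate drives can overlap
client1.example.com  /home  comp-user-tar  1
client1.example.com  /var   comp-user-tar  2   # second physical disk
client2.example.com  /home  comp-user-tar  1   # spindles are per-host
```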
The client of course needs the resources to do that compression,
primarily enough memory to afford gzip a comfortable piece of memory
to play in while it chews through the largest disklist entry on that
client. A holding disk on the client will, IIRC, be used for
buffering, so that's useful too.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.