Amanda-Users

Re: large dumps - 2.4.2

2007-03-22 14:07:16
Subject: Re: large dumps - 2.4.2
From: Gene Heskett <gene.heskett AT verizon DOT net>
To: amanda-users AT amanda DOT org
Date: Thu, 22 Mar 2007 10:52:16 -0400
On Thursday 22 March 2007, Jurgen Pletinckx wrote:
>Thanks for the comments, all. I will do a few more dumps
>to see what comes out, before committing to a split of
>these DLEs.
>
>some specific points:
><Gene Heskett>
>
>| I might also add that 2.4.2 is very dusty these days, and it might not
>| hurt to bring it up to one of the 2.5.x versions.  2.4.2 has had many
>| years for bit rot to set in now, and it might be doing something a wee
>| bit differently than the current versions are.
>
>Frankly, I'm afraid to do that. This is oldish hardware and OS, and
>I'd rather not break things by trying to fix them. Or something.
>(IRIX 6.5 on SGI Origin 200. Very easy on the eye.)
>
><Jon LaBadie>
>
>| Guessing here.  You are using DLT tape with a 35GB "native" capacity
>| and believe the marketing hype that they are "70GB tapes".
>|
>| Further guessing.  The previous amanda admin is using software
>| compression (gzip) rather than letting the hardware compress things
>| on the fly.  This is very typical and normal.  If so, amanda wants to
>| know the native capacity of the tape and that is what is specified in
>| the "tapetype" setting.  This is probably between 33 & 35GB, measured
>| with the amtapetype program.  If amanda has a history of these DLE it
>| knows their compressibility.  It may be more or less than the
>| frequently claimed 50%.
>|
>| OTOH, if hardware compression is being used, most amanda admins find
>| the 50% compression claim of the drive manufacturer to be optimistic.
>| Thus your admin may have listed the tapetype capacity of the drive
>| as something lower than 70GB.
>
>I'm entirely unaware of marketing hype. Or truth, for that matter.
>This is what I saw in amanda.conf:
>
>tapetype DLT-7000
>[snip]
># taken from http://www.cs.columbia.edu/~sdossick/amanda/
>define tapetype DLT-7000 {
>        comment "DLT-IV op DLT-7000 drive"
>        length 33000 mbytes
>        filemark 8 kbytes
>        speed 5 mbytes
>}
>
>Aaaaand you're right. Dunno how I came up with that 70G figure.
>Hrm. Combined with variable compression rates, that would account
>for the [dump larger than tape, but cannot incremental dump
>skip-incr disk] business.
>
>Where would I look for hardware vs software compression?

Look at the disklist entry to get the dumptype, then look the dumptype up 
in amanda.conf.  And be aware that a dumptype can include another 
previously defined dumptype by including its name.
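
The lookup can be sketched in a few lines of Python.  This is only a toy 
model, not an amanda.conf parser: the dumptype names are hypothetical, the 
dictionary layout is an assumption made for illustration, and the rule that 
a dumptype's own options override inherited ones is a simplification.  The 
"compress" values shown follow the usual amanda.conf wording, but check 
them against the documentation for your version.

```python
# Toy model of amanda.conf dumptypes.  Every name below is hypothetical.
# "include" lists parent dumptypes; "options" holds settings made directly.
DUMPTYPES = {
    "global": {"options": {"index": "yes"}},
    "comp-user": {"include": ["global"],
                  "options": {"compress": "client fast"}},
    "nocomp-user": {"include": ["comp-user"],
                    "options": {"compress": "none"}},
}


def resolve(name, dumptypes=DUMPTYPES):
    """Return the effective options of a dumptype, following its includes."""
    entry = dumptypes[name]
    effective = {}
    for parent in entry.get("include", []):
        effective.update(resolve(parent, dumptypes))
    # Simplification: options set in this dumptype override inherited ones.
    effective.update(entry.get("options", {}))
    return effective


def compression(name, dumptypes=DUMPTYPES):
    """Report the software compression setting, or 'not set' if absent."""
    return resolve(name, dumptypes).get("compress", "not set")


if __name__ == "__main__":
    for dumptype in DUMPTYPES:
        print(dumptype, "->", compression(dumptype))
```

A result of "not set" means only that no dumptype in the chain mentions 
compression, in which case whatever default your Amanda version applies 
is in effect.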

Generally, we are somewhat opposed to hardware compression done in the 
drive.  The reason is of course that amanda counts bytes going down the 
cable to the drive, after doing her own compression if told to do so.  If 
instead the drive is doing the compression, amanda has only a relatively 
poor idea of how much has been written to the tape.

However, doing compression on the machine, where the machine is its own 
server, is time consuming and cpu intensive.  It has much less effect on 
the overall backup times when there are several client machines involved, 
because the compression can be offloaded to the clients so that all of 
them can be working in parallel at this job.  Some directory trees can be 
compressed to 8-10% of their original size, and this of course reduces 
the bandwidth required to move the already compressed data over the 
network.  Other directories full of already compressed data, like 
a /usr/dlds-rpms full of rpms, can only be compressed a few percentage 
points if at all, so those dle's are best handled straight through, with 
no compression.  Ditto for a directory full of mp3 or ogg format music, 
and if, like me, you are keeping the files that make up a couple of 
wedding videos, there is little or nothing to be gained by compressing 
any of those.
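
The difference is easy to see with a small experiment.  The sketch below 
uses Python's zlib module, which implements the same deflate method gzip 
uses.  The sample data is made up: repeated log-like text stands in for a 
compressible tree, and random bytes stand in for rpm, mp3, ogg or video 
payloads.  Real directories will land somewhere between the two.

```python
import os
import zlib


def ratio_percent(data: bytes) -> float:
    """Compressed size as a percentage of the original size."""
    return 100.0 * len(zlib.compress(data, 6)) / len(data)


# Repetitive text stands in for a tree of logs or source files.
text_like = b"Mar 22 10:52:16 host amandad: dump of /home done\n" * 4000
# Random bytes stand in for data that is already compressed.
already_compressed = os.urandom(len(text_like))

if __name__ == "__main__":
    print(f"text-like data:     {ratio_percent(text_like):.1f}%")
    print(f"already compressed: {ratio_percent(already_compressed):.1f}%")
```

The random sample comes out at essentially its original size, which is 
the behaviour described above for already compressed files.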

Look at the email reports from amanda.  Generally speaking, any dle that 
gets an '80%' or more compression rating isn't gaining you much and could 
be done uncompressed, but those that say 50% or less are being compressed 
to good effect.  My average compression ratio is around 38% here, meaning 
that 100MB is 38MB once compressed.  Amanda tracks this bit of history on 
a per-dle basis, as it gives amanda a good idea of what a directory, 
which might total 1.7GB raw, will be once compressed, say 766MB, and she 
uses this data in the planner stage to fine-tune the schedule on a 
per-run basis.
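
The arithmetic behind those figures, as a minimal sketch.  The function 
names are hypothetical, the 80% and 50% cut-offs are the rules of thumb 
from this message rather than anything Amanda enforces, and 1.7GB is 
taken as 1700MB for simplicity.

```python
def compressed_size_mb(raw_mb: float, ratio_percent: float) -> float:
    """Estimated size after compression, given the ratio from the report."""
    return raw_mb * ratio_percent / 100.0


def ratio_percent(raw_mb: float, compressed_mb: float) -> float:
    """Compression ratio as the report shows it: compressed / raw, in %."""
    return 100.0 * compressed_mb / raw_mb


def verdict(ratio: float) -> str:
    """Rule of thumb from this message, not an Amanda setting."""
    if ratio >= 80:
        return "little gain, candidate for no compression"
    if ratio <= 50:
        return "compressing to good effect"
    return "judgment call"


if __name__ == "__main__":
    print(compressed_size_mb(100, 38))        # 100MB at a 38% ratio
    r = ratio_percent(1700, 766)              # 1.7GB raw, 766MB compressed
    print(f"{r:.1f}% -> {verdict(r)}")
```

With these inputs the 1.7GB directory works out to roughly a 45% ratio, 
which falls on the "good effect" side of the 50% line.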

I also, in my amanda.conf, tell amanda to put the biggest files on the 
front of the tape, so that the 64k level 5's, a lot of which will fit on 
a small amount of tape, are taped last.  If the tape is already half full 
when amanda tries to tape a file that is more than half a tape, that much 
larger file is left sitting in the holding disk area to be flushed, in 
the case of your 2.4.2 version, by hand.  Amanda got the ability to 
autoflush, on the next day's run, anything left over in the holding disk 
at about the 2.4.4 or 2.4.5 version, and we're now at 2.5.1p3 and 
counting.
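
Why the taping order matters can be shown with a toy simulation.  This is 
not Amanda's taper logic, and the message does not name the amanda.conf 
setting involved.  The model assumes that the first dump that does not 
fit ends the tape and that everything after it stays in the holding disk. 
The dump sizes are invented; the 33000MB tape length is the one from the 
tapetype quoted earlier in the thread.

```python
def tape_run(dump_sizes_mb, tape_mb, largest_first=True):
    """Write dumps in order until one does not fit.

    Returns (taped, left_in_holding).  Simplified model: the first dump
    that does not fit ends the tape, and it and all later dumps stay in
    the holding disk.
    """
    if largest_first:
        order = sorted(dump_sizes_mb, reverse=True)
    else:
        order = list(dump_sizes_mb)
    taped, used = [], 0
    for i, size in enumerate(order):
        if used + size > tape_mb:
            return taped, order[i:]
        taped.append(size)
        used += size
    return taped, []


if __name__ == "__main__":
    dumps = [64, 9000, 200, 12000, 20000]   # invented sizes, in MB
    for largest_first in (False, True):
        taped, left = tape_run(dumps, 33000, largest_first)
        print(f"largest_first={largest_first}: "
              f"taped {sum(taped)}MB, left in holding {sum(left)}MB")
```

In this example, taping in arrival order strands the 20000MB dump in the 
holding disk, while taping largest first leaves 9264MB of smaller dumps 
behind instead.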

I hope this is helpful in getting you 'up to speed' on fine tuning 
amanda.  She can be a complex lady at times.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
In practice, failures in system development, like unemployment in Russia,
happens a lot despite official propaganda to the contrary.
                -- Paul Licker
