Re: tuning the estimate phase?

On 2006-05-02 16:22, Paul Lussier wrote:

Hi all,

Is it possible to tune the estimate phase of a backup run?  We appear
to be getting NFS timeouts experienced by our NFS clients during the
estimate phase when the NFS server is getting backed up.

The going theory is that during the estimation phase, amanda is
doing a gtar|gzip -c >/dev/null.  And, as we all know, the bandwidth
of /dev/null is damn near impossible to beat :)

During the actual dumping of data, the gtar|gzip is getting sent back
across the wire, and therefore gtar gets constrained by the bandwidth
of the network, which even at GigE is significantly lower than that of
/dev/null.  As a result, during the estimation phase, amanda is taking
over the disk IO to the RAID array and the NFS daemons are competing
for r/w access.


So much for the theory :-). The reality is:

The client runs a "gtar --sparse --totals -f /dev/null --otheropts...".
No piping through gzip, no transfer over the network.
GNU tar itself has special code for handling output to /dev/null, and
doesn't even read the files in that case (unless the stat() indicates
a sparse file; how that case is handled depends on the version of
gtar, and some versions do read sparse files).
Still, doing a stat() on every file and directory of the filesystem
can indeed put stress on the server.
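
To get a feel for how much stat() traffic such a pass generates on a
given tree, here is a rough Python sketch. It is not what gtar does
internally; the function name and the counting are my own, purely for
illustration. It walks a directory, calls lstat() on every entry and
sums the sizes of regular files without ever opening one:

```python
import os
import stat


def stat_only_estimate(root):
    """Walk root, lstat() every entry, and sum regular-file sizes.

    File contents are never opened or read; only metadata is touched.
    Returns (total_bytes, explicit_lstat_calls).
    """
    total_bytes = 0
    stat_calls = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            stat_calls += 1  # counts only the lstat() calls made here
            if stat.S_ISREG(st.st_mode):
                total_bytes += st.st_size
    return total_bytes, stat_calls


if __name__ == "__main__":
    import sys

    size, calls = stat_only_estimate(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{size} bytes in regular files, {calls} lstat() calls")
```

Running it against one of the exported hierarchies while watching the
NFS clients should show whether metadata traffic alone is enough to
cause the timeouts.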

Side note:  because the output is not piped through gzip, Amanda can
only guess how well the data will compress.  It therefore builds up a
history of compression rates for each DLE.  The default assumed
compression rate for a new DLE (one without history) can be tuned with
the amanda.conf parameter "comprate".
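
The idea can be sketched in a few lines of Python. This is only an
illustration: the function name, the plain average over the history
and the 0.50 fallback are my assumptions, not Amanda's actual code.

```python
def estimate_compressed_size(raw_bytes, history, default_comprate=0.50):
    """Guess the compressed size of a dump from past compression ratios.

    history is a list of compressed/raw ratios from earlier runs of the
    same DLE. With no history, fall back to default_comprate, which
    plays the role of the "comprate" setting (value here is assumed).
    """
    if history:
        rate = sum(history) / len(history)  # simple average, an assumption
    else:
        rate = default_comprate
    return int(raw_bytes * rate)


if __name__ == "__main__":
    # New DLE without history: the configured default is used.
    print(estimate_compressed_size(1000, []))
    # DLE with two earlier runs: their average ratio is used.
    print(estimate_compressed_size(1000, [0.25, 0.75]))
```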


Since the entire array is a single file system, even the backup of
individual hierarchies seems to result in this blocking.

Does this sound like a reasonable theory? If so, is there a way I can
tune the estimation to be "nicer" ?


Avoid running multiple gtar processes at the same time on that array
by specifying the same "spindle" for those DLEs in the disklist.

Are you sure it happens during the estimate?
Another possibility is to fall back to a faster, less accurate estimate
strategy:  "calcsize" is faster (but if stat() is indeed the
problem, this will not help much).
There is also a purely statistics-based estimate, see:

http://wiki.zmanda.com/index.php/Amdump:_results_missing#Timeout_during_estimate.3F


--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************