Amanda-Users

Subject: Re: tuning the estimate phase?
From: "Paul Lussier" <pllsaph AT gmail DOT com>
To: "Paul Bijnens" <paul.bijnens AT xplanation DOT com>
Date: Fri, 5 May 2006 10:14:26 -0400
On 5/2/06, Paul Bijnens <paul.bijnens AT xplanation DOT com> wrote:
> The client runs a "gtar --sparse --totals -f /dev/null --otheropts...".
> No piping through gzip, no transfer over the network.
> Gnutar itself has special code for handling output to /dev/null, and
> doesn't even read the files in that case (unless stat() indicates a
> sparse file; how that is handled depends on the version of gtar --
> some versions do read sparse files).
> Doing a stat() for each file/directory of the filesystem can indeed
> stress the server, yes.

> Side note: because the output is not piped through gzip, Amanda can
> only guess how much it will compress.  Therefore it builds up a history
> of compression rates for each DLE.  The default assumed compression
> rate for a new DLE (without history) can be tuned by the amanda.conf
> parameter "comprate".

Since I'm using no-compress (I'm using hardware compression on the drive), does Amanda ignore even the default compression rate in this case?
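
(For reference, I assume the relevant bits of that dumptype look something like this -- a sketch, since the real user-high-tar definition isn't in this thread and the comprate values are only illustrative:)

   define dumptype user-high-tar {
      program "GNUTAR"
      compress none            # relying on hardware compression on the drive
      comprate 0.50, 0.50      # assumed full/incremental rates for a DLE with no history
   }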

> > Since the entire array is a single file system, even the backup of
> > individual hierarchies seems to result in this blocking.
> >
> > Does this sound like a reasonable theory?  If so, is there a way I can
> > tune the estimation to be "nicer"?

> Avoid running multiple gtar processes at the same time
> by specifying the "spindle" in the disklist.

I am specifying spindle in the disklist, and have my user "partition" specified thusly:

  space-monster /u1/user/ad       /u1/user          {
     user-high-tar
     include "./[a-d]*"
     include append "./[A-D]*"
  } 1

Other "partitions" follow for e-h, i-l, etc.  all with a spindle of 1.

> Are you sure it happens during estimate?

Yes.  We've been keeping logs of when our NFS clients experience the timeouts and correlated them with the times when the estimates are running on our NFS server.  All the NFS timeouts occur during the estimates, and as far as I know, we never see timeouts during the actual dump across the network back to the Amanda host.

> Another possibility is to revert to faster/less accurate estimate
> strategies:  "calcsize" is faster (but if stat() is indeed the
> problem, this will not help much).
> There is also an estimate based only on statistics; see:
>
> http://wiki.zmanda.com/index.php/Amdump:_results_missing#Timeout_during_estimate.3F

Thanks, I'll take a look.
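
(For reference, switching estimate strategies would be a dumptype setting, roughly like this -- a sketch, assuming an Amanda release that supports the "estimate" dumptype parameter; the derived dumptype name here is made up:)

   define dumptype user-high-tar-fast-est {
      user-high-tar
      estimate calcsize      # or "estimate server" for the purely statistical one
   }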
