Amanda-Users

Re: sendsize finishes, planner doesn't notice...

2007-10-04 09:21:31
Subject: Re: sendsize finishes, planner doesn't notice...
From: Jean-Louis Martineau <martineau AT zmanda DOT com>
To: Paul Lussier <pll+amanda AT permabit DOT com>
Date: Thu, 04 Oct 2007 09:20:41 -0400
Paul Lussier wrote:
Hi all,

I'm using amanda 2.5.1p1-2.1 from Debian/stable.

I have several file systems which take hours to estimate and dump.
My amanda.conf contains:

  etimeout  10800 # 3 hours
  dtimeout   7200 # 2 hours
  ctimeout     30

My sendsize log reports the following:

  $ egrep "estimate (time|size) for" sendsize.20071003113105.debug \
  |grep '/permabit/user'|sort
  ...
  sendsize[8132]: estimate size for /permabit/user/eh level 0: -1 KB
  sendsize[8132]: estimate time for /permabit/user/eh level 0: 18804.285
  sendsize[8136]: estimate size for /permabit/user/il level 0: 470515080 KB
  sendsize[8136]: estimate time for /permabit/user/il level 0: 33523.568
  sendsize[8137]: estimate size for /permabit/user/mp level 0: 388366900 KB
  sendsize[8137]: estimate time for /permabit/user/mp level 0: 31830.040
  sendsize[8144]: estimate size for /permabit/user/qt level 0: 438384190 KB
  sendsize[8144]: estimate time for /permabit/user/qt level 0: 33232.123
  sendsize[8151]: estimate size for /permabit/user/uz level 0: 502958220 KB
  sendsize[8151]: estimate time for /permabit/user/uz level 0: 33453.437
  sendsize[8301]: estimate size for /permabit/user/assar level 0: 169842670 KB
  sendsize[8301]: estimate time for /permabit/user/assar level 0: 15124.977

I'm assuming that the number which is not in KB is in seconds, which
means that even the fastest of these (assar, 15124.977 s) took over
4 hours to complete, and that I need to increase both etimeout and
dtimeout to more than 9.3 hours to accommodate the slowest (il,
33523.568 s).
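The seconds-to-hours arithmetic can be done directly on the debug file. This is a minimal sketch, assuming the "estimate time" lines have exactly the layout quoted above (DLE path in the fifth whitespace-separated field, seconds in the last field); `estimate_hours` is a hypothetical helper name, not part of Amanda.

```shell
# Summarize "estimate time" lines from a sendsize debug file read on stdin.
# Assumed line layout (as in the log excerpt above):
#   sendsize[8136]: estimate time for /permabit/user/il level 0: 33523.568
estimate_hours() {
    awk '
        BEGIN { max = -1 }
        /estimate time for/ {
            secs = $NF + 0                      # last field: elapsed seconds
            printf "%-28s %9.1f s  %5.2f h\n", $5, secs, secs / 3600
            if (secs > max) { max = secs; slowest = $5 }
        }
        END {
            # report the slowest DLE, i.e. the floor for a per-DLE timeout
            if (max >= 0) printf "slowest: %s (%.2f h)\n", slowest, max / 3600
        }
    '
}
```

Usage: `grep '/permabit/user' sendsize.20071003113105.debug | estimate_hours`. The "slowest" line is the minimum a per-DLE estimate timeout would have to exceed, before adding any safety margin.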

The strange thing is that all these estimates *did* complete from what
I can tell in the sendsize log.  Yet the planner doesn't seem to think
they have:

  $ amstatus offsite | grep getting
  amanda2:/permabit/release                  getting estimate
  amanda2:/permabit/user/eh                  getting estimate
  amanda2:/permabit/user/il                  getting estimate
  amanda2:/permabit/user/mp                  getting estimate
  amanda2:/permabit/user/qt                  getting estimate
  amanda2:/permabit/user/uz                  getting estimate
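To list exactly which DLEs sendsize reports as finished while amstatus still shows them as "getting estimate", the two outputs can be joined on the DLE path. A sketch under these assumptions: the amstatus output has been saved to a file first, the debug-file lines look like the ones quoted above, and all DLEs belong to a single client (amanda2 here), because the sendsize debug file names only the path. `finished_but_waiting` is a hypothetical helper name.

```shell
# $1: sendsize debug file; $2: saved output of "amstatus CONFIG"
# Prints "path seconds" for each DLE that has an estimate time in the
# debug file but is still listed as "getting estimate" by amstatus.
finished_but_waiting() {
    awk '
        FILENAME == ARGV[1] {
            # sendsize debug file: remember DLE path -> estimate time (s)
            if ($0 ~ /estimate time for/) done[$5] = $NF
            next
        }
        /getting estimate/ {
            # amstatus line such as "amanda2:/permabit/user/eh  getting estimate"
            path = substr($1, index($1, ":") + 1)
            if (path in done) print path, done[path]
        }
    ' "$1" "$2"
}
```

Usage: `amstatus offsite > amstatus.txt`, then `finished_but_waiting sendsize.20071003113105.debug amstatus.txt` (the file name amstatus.txt is only an example). Any DLE it prints is one whose estimate sendsize completed but the planner never recorded.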

I *assume* it's because of the timeout bug in amanda 2.5.1:

  $ amadmin offsite config | grep -i timeout
  ETIMEOUT              220000
  DTIMEOUT              210000
  CTIMEOUT              190030

Does that mean the planner is going to sit around for 61+ hours
waiting for estimates to show up?  What I'm not quite certain of,
though, is why the planner doesn't notice that these DLEs have
completed.  It noticed that all the other DLEs had completed their
estimate phase, so why not these?
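The "61+ hours" figure comes from dividing the reported ETIMEOUT by 3600. A small sketch that does this for every *TIMEOUT line, assuming the two-column `NAME value` layout shown in the amadmin output above; `timeouts_in_hours` is a hypothetical helper name.

```shell
# Convert "NAME  seconds" lines whose name ends in TIMEOUT into hours.
timeouts_in_hours() {
    awk 'toupper($1) ~ /TIMEOUT$/ {
        printf "%-10s %8d s  %6.1f h\n", $1, $2, $2 / 3600
    }'
}
```

Usage: `amadmin offsite config | timeouts_in_hours`. For the values reported above, ETIMEOUT 220000 works out to about 61.1 hours, against the 3 hours intended by `etimeout 10800`.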

Is there something in the logs I can look for to determine how the
planner notices that sendsize has completed for a given DLE?

Look at the amdump log file; it lists all estimates received from the clients.

Are you sure the sendsize debug file you look at is the correct one?
sendsize will continue even after an estimate timeout.
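Because sendsize keeps running after an estimate timeout, a debug file from a different run can easily be mistaken for the one belonging to the amdump being examined. A sketch that lists one day's sendsize debug files in time order, assuming the `sendsize.YYYYMMDDHHMMSS.debug` naming seen above (the timestamp is presumably when that sendsize started) and that the debug directory is passed in; `sendsize_debug_for_day` is a hypothetical helper name.

```shell
# List sendsize debug files from one day, oldest first.
# $1: day as YYYYMMDD (e.g. 20071003); $2: debug directory (default: .)
sendsize_debug_for_day() {
    day=$1
    dir=${2:-.}
    # glob expansion is already sorted, so the output is in time order
    for f in "$dir"/sendsize."$day"*.debug; do
        # an unmatched glob stays literal, so skip names that do not exist
        if [ -e "$f" ]; then
            echo "${f##*/}"
        fi
    done
}
```

The file whose timestamp matches the start of the amdump run is the one to compare against the estimates recorded in the amdump log; a plain grep for the DLE path in that log shows whether the planner ever received its estimate.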