Amanda-Users

Re: sendsize finishes, planner doesn't notice...

2007-10-11 16:27:47
Subject: Re: sendsize finishes, planner doesn't notice...
From: Jean-Louis Martineau <martineau AT zmanda DOT com>
To: Paul Lussier <pll+amanda AT permabit DOT com>
Date: Thu, 11 Oct 2007 16:16:08 -0400
Paul Lussier wrote:
Jean-Louis Martineau <martineau AT zmanda DOT com> writes:

It's weird.

Do you have an amdump log file or just amdump.1?
The only way to get this is if you killed amanda process on the
server, maybe a server crash.
Do you still have amanda process running on the server?

I do now. I started amanda off Tuesday night at "Tue Oct  9 22:48:34 2007".

According the /var/log/amanda/amandad/amandad.20071009224834.debug file:

  amandad: time 21604.147: pid 26218 finish time Wed Oct 10 04:48:39 2007

According to sendsize.20071009224835.debug:

amanda2:/var/log/amanda/client/offsite# tail sendsize.20071009224835.debug errmsg is /usr/local/libexec/runtar exited with status 1: see /var/log/amanda/client/offsite/sendsize.20071009224835.debug
sendsize[26687]: time 37138.237: done with amname /permabit/user/uz dirname 
/permabit/user spindle -1
sendsize[26379]: time 37823.330: Total bytes written: 541649408000 (505GiB, 
14MiB/s)
sendsize[26379]: time 37823.453: .....
sendsize[26379]: time 37823.453: estimate time for /permabit/user/eh level 0: 
37823.251
sendsize[26379]: time 37823.453: estimate size for /permabit/user/eh level 0: 
528954500 KB
sendsize[26379]: time 37823.453: waiting for runtar "/permabit/user/eh" child
sendsize[26379]: time 37823.453: after runtar /permabit/user/eh wait
errmsg is /usr/local/libexec/runtar exited with status 1: see 
/var/log/amanda/client/offsite/sendsize.20071009224835.debug
sendsize[26379]: time 37823.537: done with amname /permabit/user/eh dirname 
/permabit/user spindle -1

So, sendsize claims to be done, yet planner doesn't think so:
sendsize doesn't claims to be done, I don't see the "finish time' line at the of the log.
Is it still running?
  planner: time 16531.383: got partial result for host amanda2 disk \
     /permabit/user/uz: 0 -> -2K, -1 -> -2K, -1 -> -2K
  [...]
  planner: time 16531.384: got partial result for host amanda2 disk \
     /permabit/user/eh: 0 -> -2K, -1 -> -2K, -1 -> -2K

amdump is currently still running, amandad has finished, but we're
still waiting for estimates which will never arrive.

I also find it disturbing that the debug log I'm looking at,
sendsize.20071009224835.debug, tells me to look at the log I'm looking
at for further information:
errmsg is /usr/local/libexec/runtar exited with status 1: see \
    /var/log/amanda/client/offsite/sendsize.20071009224835.debug

Any idea why amandad is dying before sending the estimate data back to
the planner?  My etimeout is currently set to:
amandad didn't dye, its log show it finished correctly.
Am I missing something extremely obvious?  I've been using amanda for
over a decade, and I can't figure out why she's behaving like this.

If there's any more information you need in order to help me figure
this out, please let me know, the suspense here is killing me :)
If sendsize is not running, it's because it crashed.

I don't understand why amandad finish before sendsize, can you post complete amandad and sendsize debug files.

Jean-Louis