Amanda-Users

Re: sendsize finishes, planner doesn't notice...

2007-10-04 11:18:58
Subject: Re: sendsize finishes, planner doesn't notice...
From: Jean-Louis Martineau <martineau AT zmanda DOT com>
To: Paul Lussier <pll AT permabit DOT com>
Date: Thu, 04 Oct 2007 11:18:17 -0400
Paul Lussier wrote:
Jean-Louis Martineau <martineau AT zmanda DOT com> writes:

Look at the amdump log file, it list all estimate received from the clients.

Are you sure the sendsize debug file you look at is the correct one?
sendsize will continue even after an estimate timeout.

It was the only sendsize log at the time, and it hadn't been updated
since 20:00ish last night.  Additionally, all the gnutar processes on
the client had exited, along with the amandad controlling process.
In the sendsize log, I see lots of things like:

sendsize[8151]: time 0.814: calculating for amname /permabit/user/uz, dirname 
/permabit/user, spindle -1
sendsize[8151]: time 0.814: getting size via gnutar for /permabit/user/uz level 0
sendsize[8151]: time 0.816: spawning /usr/lib/amanda/runtar in pipeline
sendsize[8151]: argument list: runtar offsite /bin/tar --create --file 
/dev/null --directory /permabit/user --one-file-system --listed-incremental 
/var/lib/amanda/gnutar-lists/amanda2_permabit_user_uz_0.new --sparse 
--ignore-failed-read --totals --exclude-from 
/tmp/amanda/sendsize._permabit_user_uz.20071003113106.exclude .
sendsize[8151]: time 17808.869: /bin/tar: 
./mfortson/dev/mfortson-prodtest/main/src/java/server: file changed as we read 
it
sendsize[8151]: time 33454.109: Total bytes written: 515029217280 (480GiB, 
15MiB/s)
sendsize[8151]: time 33454.253: .....
sendsize[8151]: estimate time for /permabit/user/uz level 0: 33453.437
sendsize[8151]: estimate size for /permabit/user/uz level 0: 502958220 KB
sendsize[8151]: time 33454.253: waiting for runtar "/permabit/user/uz" child
sendsize[8151]: time 33454.253: after runtar /permabit/user/uz wait
sendsize[8151]: time 33454.359: done with amname /permabit/user/uz dirname 
/permabit/user spindle -1

Which I interpreted, perhaps incorrectly, to mean that sendsize had
communicated this estimate back to the planner.
Not necessarily, check the amandad.*.debug files to know if the client sent the estimate.
  In the amdump log, I can see:

amanda2:/permabit/user/uz overdue 13790 days for level 0
setup_estimate: amanda2:/permabit/user/uz: command 0, options: none    
last_level -1 next_level0 -13790 level_days 0    getting estimates 0 (-2) -1 
(-2) -1 (-2)
...
planner: time 18804.756: got partial result for host amanda2 disk /permabit/user/uz: 0 
-> -2K, -1 -> -2K, -1 -> -2K
The planner never receive the estimate.
So, the last time planner heard from sendsize was at 18804.756, yet
sendsize actually finished at time 33454.359. This was around 20:49
last night.  When I checked it this morning at about 08:30, almost 12
hours later, planner was still waiting for sendsize.  Yet, on the
client, there were no gnutar processes left, nor was there a amandad
process.  Everything on the client was completely quiescent.

Am I missing something here?  Will the server contact the client again
later via amandad to gather any estimates which have been written to
the log since the last time?  I didn't think the server polled the
clients this way.  I always thought a session was established via
amandad and that sendsize or whatever fed data directly back to amdump
via that amandad process.
You are right.

Can you send me the complete amdump log file, planner.*.debug, amandad.*.debug and sendsize.*.debug?

Jean-Louis