Amanda-Users

RE: Problem backing up just a few machines

2004-05-27 09:15:06
Subject: RE: Problem backing up just a few machines
From: "Henson, George Mr JMLFDC" <George.Henson AT DET.AMEDD.ARMY DOT MIL>
To: "'amanda-users AT amanda DOT org'" <amanda-users AT amanda DOT org>
Date: Thu, 27 May 2004 09:12:09 -0400

> Henson, George Mr JMLFDC wrote:
>
> > logfun02   / lev 0 FAILED [Request to logfun02 timed out.]
> >
> > Then the next day I had two hosts with the same error. In
> > trying to fix the issue I found if I removed the curinfo directories
> > for these two hosts, the backups would run the next time amdump was
> > called. It is always the same two hosts which fail like this.
> >
> > After reviewing the log files, I see the client does not
> > report it received the sendsize service request.
>
> Increase etimeout.  (probably)

Currently etimeout is 300. Should this be increased?

> > Why does deleting the curinfo directories "correct" the problem?
>
> In that case amanda thinks that DLE is completely new, and schedules
> a level 0.  It does not bother to estimate how much data incremental
> level 1 would be.
> The next day(s) the planner needs estimates for the level 0, the last
> incremental level, and the lastincr+1. Doing 2 or 3 estimates for each
> DLE on that host takes longer than only level 0...

One part of the mystery solved.
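
To make the point about the number of estimates concrete, here is a rough Python sketch that counts how many levels the planner asks for, given a "getting estimates" line like the one in the amdump log further down. The function name is made up for illustration, and treating a level of -1 as an unused slot is an assumption inferred from the log excerpt, not something taken from the Amanda source.

```python
import re

# Matches "level (size)" pairs such as "0 (28420)" or "-1 (-1)".
PAIR_RE = re.compile(r"(-?\d+) \((-?\d+)\)")


def requested_levels(line):
    """Return the dump levels listed on a 'getting estimates' line.

    Assumption: a level of -1 marks a slot the planner is not using.
    """
    _, _, tail = line.partition("getting estimates")
    levels = []
    for level, _size in PAIR_RE.findall(tail):
        if int(level) >= 0:
            levels.append(int(level))
    return levels


line = "    getting estimates 0 (28420) 1 (583) -1 (-1)"
print(requested_levels(line))
```

For the logfun02 line this reports two levels (0 and 1), so the client would be asked for two estimates rather than the single level 0 estimate it gets right after curinfo is removed.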

> Just speculating of course.  Seeing logfiles (sendsize.XXX.debug on the
> client, and amdump.X on the server) and config files would help.

There is no sendsize.XXX.debug log. This is one of the things making me think the server
cannot or does not send a sendsize request. :(

amdump log:
<snip>

planner: time 0.414: setting up estimates for logfun02:/
logfun02:/ overdue 13 days for level 0
setup_estimate: logfun02:/: command 0, options:
    last_level 1 next_level0 -13 level_days 2
    getting estimates 0 (28420) 1 (583) -1 (-1)

<snip>

planner: time 30.790: error result for host logfun02 disk /: Request to logfun02 timed out.

<snip>

If I am reading the above messages correctly, the timeout is happening well within
the etimeout window: the error is logged at about 30 seconds, while etimeout is 300.
The 30-second time frame almost sounds like a network timeout threshold.
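
A rough way to check this reading of the log is the Python sketch below. It pulls the planner timestamps out of the two lines quoted above and compares the gap with etimeout. It assumes the "time" values are seconds since the planner started, and that the "setting up estimates" line is a fair stand-in for when the request went out. The function name is hypothetical.

```python
import re

# Planner lines look like: "planner: time 30.790: <message>"
PLANNER_TIME_RE = re.compile(r"^planner: time (\d+\.\d+): (.*)$")


def estimate_wait(log_lines, host):
    """Seconds between estimate setup for host and its first error result.

    Returns None if either line is missing from the log.
    """
    start = None
    for raw in log_lines:
        match = PLANNER_TIME_RE.match(raw.strip())
        if not match:
            continue
        stamp, message = float(match.group(1)), match.group(2)
        if start is None and message.startswith(f"setting up estimates for {host}:"):
            start = stamp
        elif start is not None and message.startswith(f"error result for host {host} "):
            return stamp - start
    return None


log = [
    "planner: time 0.414: setting up estimates for logfun02:/",
    "logfun02:/ overdue 13 days for level 0",
    "planner: time 30.790: error result for host logfun02 disk /: "
    "Request to logfun02 timed out.",
]

ETIMEOUT = 300  # value currently set in amanda.conf, per this thread
wait = estimate_wait(log, "logfun02")
print(f"waited {wait:.3f}s, etimeout is {ETIMEOUT}s")
```

With the excerpt above the gap is roughly 30.4 seconds, far short of 300, which fits the idea that something other than etimeout is cutting the request off.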
