I had a problem with one of my amanda servers not hearing back
from one of the clients during the estimate phase. It was my own
fault (for non-amanda reasons I'd started amdump during daylight
hours, when the machine was more heavily loaded), but
I was able to get a very good idea of how long it took the estimate
phase to complete by examining the /tmp/amanda files on the client.
I'm not recommending this as a practice, but as long as the info
is in the log files... The estimate phase completes when the last
child terminates - AFAIK, YMMV
[samar] /tmp/amanda 104> grep termin sendsize.20050516210006.debug
sendsize[5891]: time 6.609: child 6150 terminated normally
sendsize[6146]: time 7.461: asking killpgrp to terminate
sendsize[6146]: time 23.051: asking killpgrp to terminate
sendsize[5891]: time 24.063: child 6146 terminated normally
sendsize[6149]: time 30.378: asking killpgrp to terminate
sendsize[6149]: time 68.221: asking killpgrp to terminate
sendsize[5891]: time 69.232: child 6149 terminated normally
sendsize[5891]: time 143.366: child 6175 terminated normally
sendsize[5891]: time 300.214: child 6186 terminated normally
sendsize[5891]: time 356.362: child 6207 terminated normally
sendsize[5891]: time 1341.432: child 6160 terminated normally
sendsize[5891]: time 1796.592: child 6166 terminated normally
sendsize[5891]: time 1797.537: child 6364 terminated normally
sendsize[5891]: time 1820.820: child 6367 terminated normally
sendsize[5891]: time 1820.944: child 6376 terminated normally
sendsize[5891]: time 1924.346: child 6216 terminated normally
sendsize[5891]: time 2810.064: child 6379 terminated normally
sendsize[5891]: time 3041.239: child 6151 terminated normally
sendsize[5891]: time 3617.385: child 6320 terminated normally
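
If you'd rather not eyeball the grep output, here is a rough sketch
that pulls the latest "child ... terminated" entry out of a sendsize
debug file. The function name and the regular expression are my own,
and I'm assuming the "time" field is seconds elapsed since sendsize
started, which is what the output above suggests. Treat it as an
illustration, not as anything amanda ships.

```python
import re
import sys

# Matches lines such as:
#   sendsize[5891]: time 6.609: child 6150 terminated normally
# Assumption: the number after "time" is elapsed seconds.
TERMINATED = re.compile(
    r"sendsize\[\d+\]: time ([\d.]+): child (\d+) terminated"
)


def last_child_exit(lines):
    """Return (elapsed_seconds, child_pid) for the latest child
    termination found in lines, or None if there is none."""
    latest = None
    for line in lines:
        match = TERMINATED.search(line)
        if match is None:
            continue  # e.g. the "asking killpgrp to terminate" lines
        elapsed = float(match.group(1))
        child = int(match.group(2))
        if latest is None or elapsed > latest[0]:
            latest = (elapsed, child)
    return latest


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python last_exit.py sendsize.<timestamp>.debug
    with open(sys.argv[1]) as debug_file:
        print(last_child_exit(debug_file))
```

Run against the file above it should report the 3617.385 entry for
child 6320, i.e. the estimates on that client took about an hour.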
On Thu, May 19, 2005 at 11:12:09AM -0400, Jon LaBadie wrote:
> On Thu, May 19, 2005 at 10:47:26AM -0400, Guy Dallaire wrote:
> > Here is what I have in my amanda log this morning:
> >
> > FAILURE AND STRANGE DUMP SUMMARY:
> > planner: ERROR Estimate timeout from sol
> > sol /data2 lev 0 FAILED [disk /data2, all estimate timed out]
> > sol /data1 lev 0 FAILED [disk /data1, all estimate timed out]
> > sol /disk1 lev 0 FAILED [disk /disk1, all estimate timed out]
> > sol / lev 0 FAILED [disk /, all estimate timed out]
> >
> > What might be wrong here ? The first time I ran amanda, it backed up
> > this server without a problem. The timeout parameter is at the
> > standard 300 secs. I bumped it up to 600 seconds for the next run but
> > I'm worried.
> >
> > I did not change anything to the config, except maybe yesterday I did
> > an "amadmin force sol /" because I decided to start using gnu-tar
> > instead of ufsdump on all my root file systems.
>
> raise them way up, say 6000 sec, just to see if it simply is slow.
>
> BTW in general it is best to introduce a lot of things a few DLE at
> a time. This avoids the problem of massive level 0's all in one dump.
> Spread them out like amanda will eventually. Add a couple from sol,
> a couple from mercury, one or two from venus, ... Then tomorrow
> a few more.
>
> --
> Jon H. LaBadie jon AT jgcomp DOT com
> JG Computing
> 4455 Province Line Road (609) 252-0159
> Princeton, NJ 08540-4322 (609) 683-7220 (fax)
---
Brian R Cuttler brian.cuttler AT wadsworth DOT org
Computer Systems Support (v) 518 486-1697
Wadsworth Center (f) 518 473-6384
NYS Department of Health Help Desk 518 473-0773