Amanda-Users

Re: Estimate phase failed for all disks: server not backed up...

2005-05-19 11:51:08
Subject: Re: Estimate phase failed for all disks: server not backed up...
From: Guy Dallaire <clepeterd AT gmail DOT com>
To: amanda-users AT amanda DOT org
Date: Thu, 19 May 2005 11:37:40 -0400
2005/5/19, Jon LaBadie <jon AT jgcomp DOT com>:
> On Thu, May 19, 2005 at 10:47:26AM -0400, Guy Dallaire wrote:
> > Here is what I have in my amanda log this morning:
> >
> > FAILURE AND STRANGE DUMP SUMMARY:
> >   planner: ERROR Estimate timeout from sol
> >   sol        /data2 lev 0 FAILED [disk /data2, all estimate timed out]
> >   sol        /data1 lev 0 FAILED [disk /data1, all estimate timed out]
> >   sol        /disk1 lev 0 FAILED [disk /disk1, all estimate timed out]
> >   sol        / lev 0 FAILED [disk /, all estimate timed out]
> >
> > What might be wrong here ? The first time I ran amanda, it backed up
> > this server without a problem.  The timeout parameter is at the
> > standard 300 secs. I bumped it up to 600 seconds for the next run but
> > I'm worried.
> >
> > I did not change anything to the config, except maybe yesterday I did
> > an "amdamin force sol /" because I decided to start using gnu-tar
> > instead of ufsdump on all my root file systems.
> 
> raise them way up, say 6000 sec to just to see if it simply is slow.
> 
> BTW in general it is best to introduce a lot of things a few DLE at
> a time.  This avoids the problem of massive level 0's all in one dump.
> Spread them out like amanda will eventually.  Add a couple from sol,
> a couple from mercury, one or two from venus, ...  Then tomorrow
> a few more.
> 

I've looked at the .debug files on the client (sol) side, and it looks
like there was no problem doing the estimates for both level. I'm
puzzled. Of cours, I might be looking at the wrong .debug file, but
it's dated approx at the time the backup occured....

Here is the send_size log, lloks like there are NO errors there, am I
right ? So why did this time out ?

sendsize: debug 1 pid 28589 ruid 555 euid 555: start at Wed May 18 19:11:00 2005
sendsize: version 2.4.5
sendsize[28591]: time 0.014: calculating for amname '/', dirname '/', spindle -1
sendsize[28591]: time 0.014: getting size via gnutar for / level 0
sendsize[28591]: Can't open exclude file '//.amanda.excludes': No such
file or directory
sendsize[28591]: time 0.016: spawning /usr/local/libexec/runtar in pipeline
sendsize[28591]: argument list: /usr/local/bin/tar --create --file
/dev/null --directory / --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/sol__0.new --sparse
--ignore-failed-read --totals --exclude-from
/tmp/amanda/sendsize._.20050518191100.exclude .
sendsize[28589]: time 0.031: waiting for any estimate child: 1 running
sendsize[28591]: time 33.649: /usr/local/bin/tar: ./.netscape/key3.db:
Warning: Cannot seek to 0: Bad file number
sendsize[28591]: time 33.650: /usr/local/bin/tar:
./.netscape/secmodule.db: Warning: Cannot seek to 0: Bad file number
sendsize[28591]: time 61.584: /usr/local/bin/tar: ./var/adm/lastlog:
Warning: Cannot seek to 0: Bad file number
sendsize[28591]: time 68.703: /usr/local/bin/tar:
./var/tmp/.oracle/s#18161.1: socket ignored
sendsize[28591]: time 68.704: /usr/local/bin/tar:
./var/tmp/.oracle/s#20906.1: socket ignored
sendsize[28591]: time 68.705: /usr/local/bin/tar:
./var/tmp/.oracle/s#651.1: socket ignored
sendsize[28591]: time 68.705: /usr/local/bin/tar:
./var/tmp/.oracle/s#696.1: socket ignored
sendsize[28591]: time 68.706: /usr/local/bin/tar:
./var/tmp/.oracle/s#883.1: socket ignored
sendsize[28591]: time 68.707: /usr/local/bin/tar:
./var/tmp/.oracle/sEXTPROC0: socket ignored
sendsize[28591]: time 68.757: Total bytes written: 1888778240 (1.8GiB, 27MiB/s)
sendsize[28591]: time 68.758: .....
sendsize[28591]: estimate time for / level 0: 68.741
sendsize[28591]: estimate size for / level 0: 1844510 KB
sendsize[28591]: time 68.758: waiting for /usr/local/bin/tar "/" child
sendsize[28591]: time 68.761: after /usr/local/bin/tar "/" wait
sendsize[28591]: time 68.789: done with amname '/', dirname '/', spindle -1
sendsize[28589]: time 68.790: child 28591 terminated normally
sendsize[28593]: time 68.791: calculating for amname '/disk1', dirname
'/disk1', spindle -1
sendsize[28593]: time 68.791: getting size via gnutar for /disk1 level 0
sendsize[28589]: time 68.793: waiting for any estimate child: 1 running
sendsize[28593]: time 68.850: spawning /usr/local/libexec/runtar in pipeline
sendsize[28593]: argument list: /usr/local/bin/tar --create --file
/dev/null --directory /disk1 --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/sol_disk1_0.new --sparse
--ignore-failed-read --totals --exclude-from
/tmp/amanda/sendsize._disk1.20050518191209.exclude .
sendsize[28593]: time 75.554: Total bytes written: 3471001600 (3.3GiB, 498MiB/s)
sendsize[28593]: time 75.555: .....
sendsize[28593]: estimate time for /disk1 level 0: 6.705
sendsize[28593]: estimate size for /disk1 level 0: 3389650 KB
sendsize[28593]: time 75.555: waiting for /usr/local/bin/tar "/disk1" child
sendsize[28593]: time 75.557: after /usr/local/bin/tar "/disk1" wait
sendsize[28593]: time 75.557: getting size via gnutar for /disk1 level 1
sendsize[28593]: time 75.612: spawning /usr/local/libexec/runtar in pipeline
sendsize[28593]: argument list: /usr/local/bin/tar --create --file
/dev/null --directory /disk1 --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/sol_disk1_1.new --sparse
--ignore-failed-read --totals --exclude-from
/tmp/amanda/sendsize._disk1.20050518191216.exclude .
sendsize[28593]: time 76.538: Total bytes written: 1986457600 (1.9GiB, 2.2GiB/s)
sendsize[28593]: time 76.538: .....
sendsize[28593]: estimate time for /disk1 level 1: 0.926
sendsize[28593]: estimate size for /disk1 level 1: 1939900 KB
sendsize[28593]: time 76.539: waiting for /usr/local/bin/tar "/disk1" child
sendsize[28593]: time 76.540: after /usr/local/bin/tar "/disk1" wait
sendsize[28593]: time 76.540: getting size via gnutar for /disk1 level 2
sendsize[28593]: time 76.564: spawning /usr/local/libexec/runtar in pipeline
sendsize[28593]: argument list: /usr/local/bin/tar --create --file
/dev/null --directory /disk1 --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/sol_disk1_2.new --sparse
--ignore-failed-read --totals --exclude-from
/tmp/amanda/sendsize._disk1.20050518191217.exclude .
sendsize[28593]: time 77.628: Total bytes written: 1986457600 (1.9GiB, 1.8GiB/s)
sendsize[28593]: time 77.628: .....
sendsize[28593]: estimate time for /disk1 level 2: 1.064
sendsize[28593]: estimate size for /disk1 level 2: 1939900 KB
sendsize[28593]: time 77.628: waiting for /usr/local/bin/tar "/disk1" child
sendsize[28593]: time 77.630: after /usr/local/bin/tar "/disk1" wait
sendsize[28593]: time 77.630: done with amname '/disk1', dirname
'/disk1', spindle -1
sendsize[28589]: time 77.632: child 28593 terminated normally
sendsize[28608]: time 77.633: calculating for amname '/data1', dirname
'/data1', spindle -1
sendsize[28608]: time 77.633: getting size via gnutar for /data1 level 0
sendsize[28589]: time 77.635: waiting for any estimate child: 1 running
sendsize[28608]: time 77.671: spawning /usr/local/libexec/runtar in pipeline
sendsize[28608]: argument list: /usr/local/bin/tar --create --file
/dev/null --directory /data1 --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/sol_data1_0.new --sparse
--ignore-failed-read --totals --exclude-from
/tmp/amanda/sendsize._data1.20050518191218.exclude .
sendsize[28608]: time 77.688: Total bytes written: 10240 (10KiB, 6.1MiB/s)
sendsize[28608]: time 77.689: .....
sendsize[28608]: estimate time for /data1 level 0: 0.018
sendsize[28608]: estimate size for /data1 level 0: 10 KB
sendsize[28608]: time 77.689: waiting for /usr/local/bin/tar "/data1" child
sendsize[28608]: time 77.689: after /usr/local/bin/tar "/data1" wait
sendsize[28608]: time 77.689: getting size via gnutar for /data1 level 1
sendsize[28608]: time 77.699: spawning /usr/local/libexec/runtar in pipeline
sendsize[28608]: argument list: /usr/local/bin/tar --create --file
/dev/null --directory /data1 --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/sol_data1_1.new --sparse
--ignore-failed-read --totals --exclude-from
/tmp/amanda/sendsize._data1.20050518191218000.exclude .
sendsize[28608]: time 77.714: Total bytes written: 10240 (10KiB, 5.5MiB/s)
sendsize[28608]: time 77.715: .....
sendsize[28608]: estimate time for /data1 level 1: 0.016
sendsize[28608]: estimate size for /data1 level 1: 10 KB
sendsize[28608]: time 77.715: waiting for /usr/local/bin/tar "/data1" child
sendsize[28608]: time 77.715: after /usr/local/bin/tar "/data1" wait
sendsize[28608]: time 77.715: done with amname '/data1', dirname
'/data1', spindle -1
sendsize[28589]: time 77.717: child 28608 terminated normally
sendsize[28611]: time 77.717: calculating for amname '/data2', dirname
'/data2', spindle -1
sendsize[28611]: time 77.717: getting size via gnutar for /data2 level 0
sendsize[28589]: time 77.719: waiting for any estimate child: 1 running
sendsize[28611]: time 77.741: spawning /usr/local/libexec/runtar in pipeline
sendsize[28611]: argument list: /usr/local/bin/tar --create --file
/dev/null --directory /data2 --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/sol_data2_0.new --sparse
--ignore-failed-read --totals --exclude-from
/tmp/amanda/sendsize._data2.20050518191218.exclude .
sendsize[28611]: time 77.757: Total bytes written: 10240 (10KiB, 4.9MiB/s)
sendsize[28611]: time 77.758: .....
sendsize[28611]: estimate time for /data2 level 0: 0.017
sendsize[28611]: estimate size for /data2 level 0: 10 KB
sendsize[28611]: time 77.758: waiting for /usr/local/bin/tar "/data2" child
sendsize[28611]: time 77.758: after /usr/local/bin/tar "/data2" wait
sendsize[28611]: time 77.758: getting size via gnutar for /data2 level 1
sendsize[28611]: time 77.765: spawning /usr/local/libexec/runtar in pipeline
sendsize[28611]: argument list: /usr/local/bin/tar --create --file
/dev/null --directory /data2 --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/sol_data2_1.new --sparse
--ignore-failed-read --totals --exclude-from
/tmp/amanda/sendsize._data2.20050518191218000.exclude .
sendsize[28611]: time 77.781: Total bytes written: 10240 (10KiB, 6.3MiB/s)
sendsize[28611]: time 77.782: .....
sendsize[28611]: estimate time for /data2 level 1: 0.016
sendsize[28611]: estimate size for /data2 level 1: 10 KB
sendsize[28611]: time 77.782: waiting for /usr/local/bin/tar "/data2" child
sendsize[28611]: time 77.782: after /usr/local/bin/tar "/data2" wait
sendsize[28611]: time 77.782: done with amname '/data2', dirname
'/data2', spindle -1
sendsize[28589]: time 77.783: child 28611 terminated normally
sendsize: time 77.783: pid 28589 finish time Wed May 18 19:12:18 2005