This morning my summary included this:
glastcolor /db lev 0 FAILED [hmm, disk was stranded on waitq]
glastcolor /usr/local lev 0 FAILED [hmm, disk was stranded on waitq]
glastcolor /home lev 0 FAILED [hmm, disk was stranded on waitq]
glastcolor /boot lev 0 FAILED [hmm, disk was stranded on waitq]
glastcolor / lev 0 FAILED [hmm, disk was stranded on waitq]
razzle /glast/03 lev 0 FAILED [disk /glast/03, all estimate timed out]
razzle /glast/02 lev 0 FAILED [disk /glast/02, all estimate timed out]
razzle /glast/01 lev 0 FAILED [disk /glast/01, all estimate timed out]
razzle /glast/00 lev 0 FAILED [disk /glast/00, all estimate timed out]
razzle /disk6 lev 0 FAILED [disk /disk6, all estimate timed out]
razzle /disk5 lev 0 FAILED [disk /disk5, all estimate timed out]
I saw the same thing once last week. There was a successful dump between
the two failures.
The server is Linux with Amanda version 2.5.0. Glastcolor is Linux with
version 2.4.4. Razzle is ancient Solaris 7 with 2.4.5. Of course, I
haven't changed anything related to Amanda for several weeks.
I looked through sendsize.debug on razzle. All the filesystems which
succeeded sent their estimates before 900 seconds. Those that failed
came after 900 seconds. It looks like a simple problem with etimeout on
the server, doesn't it? Last week I saw that etimeout was set to 900.
That's supposed to be 900 seconds _per filesystem_, which should be
plenty. I thought there might be a misinterpretation or bug, so I
increased etimeout to 1200. It didn't help. The whole sendsize process
took only 1107 seconds. The server log shows "planner: time 26406.488:
getting estimates took 26405.915 secs", but the last sendsize estimate
came in at 349 seconds.
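For reference, the amanda.conf(5) man page describes etimeout as a per-disk allowance. The planner waits up to etimeout seconds multiplied by the number of disks on a client, and a negative value is treated as a total for the whole client. Below is a minimal sketch of that arithmetic, assuming those documented semantics. The function name and the disk count are hypothetical; razzle's real total depends on how many of its disks are in the disklist.

```python
def estimate_budget(etimeout: int, ndisks: int) -> int:
    """Seconds the planner should wait for one client's estimates.

    Assumes the amanda.conf(5) semantics: a positive etimeout is an
    allowance per disk, a negative one is an absolute total per client.
    """
    if etimeout < 0:
        return -etimeout
    return etimeout * ndisks


if __name__ == "__main__":
    # Hypothetical disk count; the actual size of razzle's disklist is not given.
    for etimeout in (900, 1200, -900):
        print(etimeout, estimate_budget(etimeout, ndisks=10))
```

Under that reading, etimeout 900 on a client with several disks allows far more than 900 seconds in total. A cutoff right at 900 seconds is therefore not what the setting alone would predict.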
I don't know what to make of glastcolor. Its sendsize.debug looks
entirely normal, finishing after 442 seconds. The last few lines look
like this:
sendsize[14265]: argument list: /bin/tar --create --file /dev/null --directory /db --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/glastcolor_db_1.new --sparse --ignore-failed-read --totals .
sendsize[14265]: time 442.278: Total bytes written: 41461760 (40MiB, 737KiB/s)
sendsize[14265]: time 442.279: .....
sendsize[14265]: estimate time for /db level 1: 55.211
sendsize[14265]: estimate size for /db level 1: 40490 KB
sendsize[14265]: time 442.279: waiting for /bin/tar "/db" child
sendsize[14265]: time 442.282: after /bin/tar "/db" wait
sendsize[14265]: time 442.286: done with amname '/db', dirname '/db', spindle -1
sendsize[14237]: time 442.286: child 14265 terminated normally
sendsize: time 442.287: pid 14237 finish time Sun Oct 14 23:37:30 2007
The server log doesn't show any partial results from glastcolor at all.
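One way to check which filesystems finished before a given cutoff is to extract the "done with amname" lines from sendsize.debug. The sketch below assumes each such line has the `time <seconds>: done with amname '<name>'` form shown above, with one entry per line. The format may differ between Amanda versions (razzle runs 2.4.5), so the regular expression is an assumption. The script arguments are placeholders.

```python
import re
import sys

# Matches lines such as:
# sendsize[14265]: time 442.286: done with amname '/db', dirname '/db', spindle -1
DONE_RE = re.compile(r"time (\d+(?:\.\d+)?): done with amname '([^']*)'")


def completion_times(lines):
    """Return (seconds, amname) pairs for each finished estimate, in file order."""
    results = []
    for line in lines:
        m = DONE_RE.search(line)
        if m:
            results.append((float(m.group(1)), m.group(2)))
    return results


def split_by_cutoff(times, cutoff):
    """Split amnames into those finished at or before cutoff and those after it."""
    early = [name for t, name in times if t <= cutoff]
    late = [name for t, name in times if t > cutoff]
    return early, late


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_sendsize.py /path/to/sendsize.debug 900
    path, cutoff = sys.argv[1], float(sys.argv[2])
    with open(path) as f:
        early, late = split_by_cutoff(completion_times(f), cutoff)
    print("finished by cutoff:", ", ".join(early) or "(none)")
    print("finished after cutoff:", ", ".join(late) or "(none)")
```

You can compare the resulting list of late filesystems against the FAILED lines in the amdump summary.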