Amanda-Users

Two kinds of timeouts

2007-11-02 19:50:17
Subject: Two kinds of timeouts
From: Patrick Nolan <Patrick.Nolan AT stanford DOT edu>
To: amanda-users AT amanda DOT org
Date: Fri, 02 Nov 2007 16:46:11 -0700
This morning my summary included this:

glastcolor /db lev 0 FAILED [hmm, disk was stranded on waitq]
glastcolor /usr/local lev 0 FAILED [hmm, disk was stranded on waitq]
glastcolor /home lev 0 FAILED [hmm, disk was stranded on waitq]
glastcolor /boot lev 0 FAILED [hmm, disk was stranded on waitq]
glastcolor / lev 0 FAILED [hmm, disk was stranded on waitq]
razzle /glast/03 lev 0 FAILED [disk /glast/03, all estimate timed out]
razzle /glast/02 lev 0 FAILED [disk /glast/02, all estimate timed out]
razzle /glast/01 lev 0 FAILED [disk /glast/01, all estimate timed out]
razzle /glast/00 lev 0 FAILED [disk /glast/00, all estimate timed out]
razzle /disk6 lev 0 FAILED [disk /disk6, all estimate timed out]
razzle /disk5 lev 0 FAILED [disk /disk5, all estimate timed out]

I saw the same thing once last week. There was a successful dump between the two failures.

The server is Linux with Amanda version 2.5.0. Glastcolor is Linux with version 2.4.4. Razzle is ancient Solaris 7 with 2.4.5. Of course, I haven't changed anything related to Amanda for several weeks.

I looked through sendsize.debug on razzle. All the filesystems which
succeeded sent their estimates before 900 seconds.  Those that failed
came after 900 seconds. It looks like a simple problem with etimeout on the server, doesn't it? Last week I saw that etimeout was set to 900.
That's supposed to be 900 seconds _per filesystem_, which should be
plenty. I thought there might be a misinterpretation or bug, so I
increased etimeout to 1200. It didn't help. The whole sendsize process
took only 1107 seconds. The server log shows "planner: time 26406.488:
getting estimates took 26405.915 secs", but the last sendsize estimate
came in at 349 seconds.

I don't know what to make of glastcolor. Its sendsize.debug looks
entirely normal, finishing after 442 seconds. The last few lines look
like this:
sendsize[14265]: argument list: /bin/tar --create --file /dev/null --directory /db --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/glastcolor_db_1.new --sparse --ignore-failed-read --totals .
sendsize[14265]: time 442.278: Total bytes written: 41461760 (40MiB, 737KiB/s)
sendsize[14265]: time 442.279: .....
sendsize[14265]: estimate time for /db level 1: 55.211
sendsize[14265]: estimate size for /db level 1: 40490 KB
sendsize[14265]: time 442.279: waiting for /bin/tar "/db" child
sendsize[14265]: time 442.282: after /bin/tar "/db" wait
sendsize[14265]: time 442.286: done with amname '/db', dirname '/db', spindle -1
sendsize[14237]: time 442.286: child 14265 terminated normally
sendsize: time 442.287: pid 14237 finish time Sun Oct 14 23:37:30 2007
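
To compare estimate completion times against etimeout without reading the debug file by eye, something like the following could be used. This is only a rough sketch: it assumes every filesystem produces a "done with amname" line in the format shown in the excerpt above, and the function names finish_times and late_estimates are made up for illustration; they are not part of Amanda.

```python
import re

# Matches lines such as:
#   sendsize[14265]: time 442.286: done with amname '/db', dirname '/db', spindle -1
# Assumption: the elapsed time is always printed as seconds with a decimal part.
DONE_RE = re.compile(r"time (\d+\.\d+): done with amname '([^']+)'")


def finish_times(lines):
    """Return {amname: elapsed_seconds} for each 'done with amname' line."""
    times = {}
    for line in lines:
        match = DONE_RE.search(line)
        if match:
            times[match.group(2)] = float(match.group(1))
    return times


def late_estimates(times, limit):
    """Return the sorted filesystem names that finished after `limit` seconds."""
    return sorted(name for name, seconds in times.items() if seconds > limit)


# Three lines copied from the glastcolor sendsize.debug excerpt.
sample = [
    'sendsize[14265]: time 442.282: after /bin/tar "/db" wait',
    "sendsize[14265]: time 442.286: done with amname '/db', dirname '/db', spindle -1",
    "sendsize[14237]: time 442.286: child 14265 terminated normally",
]

times = finish_times(sample)
print(times)
print(late_estimates(times, 900))
```

For a real run, the list passed to finish_times would be the lines of sendsize.debug on the client, and the limit would be whatever the server actually waits, which is exactly the part that seems to be in question here.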

The server log doesn't show any partial results from glastcolor at all.

