"disk was stranded on waitq"/sendsize timed out waiting for REP data
2007-03-01 05:43:36
We have just started to see a serious problem with our amdump runs
(Amanda 2.5.0p2). As usual, we don't think we have changed anything at
all since the last successful dump.
Symptoms:
1. "amstatus" says
fileserv:/scanner 0 planner: [hmm, disk was stranded on waitq]
2. "sendsize" on the host in question hangs, and I mean really hangs
- not even 'kill -9' will stop it.
3. The amandad.<id>.debug on this host ("fileserv") says:
amandad: time 14027.090: sending ACK pkt:
<<<<<
>>>>>
amandad: time 21600.297: /usr/freeware/libexec/sendsize timed out waiting for REP data
amandad: time 21600.309: sending NAK pkt:
<<<<<
ERROR timeout on reply pipe
>>>>>
amandad: time 35627.467: /usr/freeware/libexec/sendsize timed out waiting for REP data
amandad: time 35627.467: sending NAK pkt:
<<<<<
ERROR timeout on reply pipe
>>>>>
amandad: time 35650.476: pid 11670783 finish time Thu Mar 1 07:54:12 2007
This happens for all disks on one particular host. Other DLEs appear to
be OK, but nothing is actually dumped, since amdump seems to give up the
entire operation because of these problems (I think).
Also, we actually run amdump with two different configs (the usual tape
backup and an "incremental only" with output to hard disk) on the same
disks every night (but not simultaneously, of course), and we see this
behaviour for both.
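
For anyone comparing logs across hosts or configs, the timeout entries in an amandad debug file like the excerpt above can be pulled out with a few lines of Python. This is only a rough sketch: the function name `find_timeouts` is made up for illustration, and the pattern assumes the "amandad: time <seconds>: <program> timed out waiting for REP data" layout seen in this particular log, which may differ in other Amanda versions.

```python
import re

# Matches lines such as:
#   amandad: time 21600.297: /usr/freeware/libexec/sendsize timed out waiting for REP data
# \s+ between "timed out" and "waiting" also tolerates a line wrap at that point.
TIMEOUT_RE = re.compile(
    r"amandad: time (\d+(?:\.\d+)?): (\S+) timed out\s+waiting for REP data"
)


def find_timeouts(log_text):
    """Return a list of (seconds_since_start, program_path) for each timeout entry.

    find_timeouts is a hypothetical helper, not part of Amanda.
    """
    return [(float(t), prog) for t, prog in TIMEOUT_RE.findall(log_text)]


if __name__ == "__main__":
    import sys

    # Usage: python find_timeouts.py /path/to/amandad.<id>.debug
    with open(sys.argv[1], "r", errors="replace") as fh:
        for seconds, program in find_timeouts(fh.read()):
            print(f"{seconds:>12.3f}s  {program}")
```

Run against the debug file on "fileserv", this would list one line per timed-out sendsize request, which makes it easier to see how long amandad waited each time.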
HELP!
- Toralf