Amanda-Users

odd dump timeout symptoms

2005-08-17 10:20:44
Subject: odd dump timeout symptoms
From: Jamie Wilkinson <jaq AT spacepants DOT org>
To: amanda-users AT amanda DOT org
Date: Wed, 17 Aug 2005 12:35:46 +1000
I have a very large DLE, approaching 100GB, on my fileserver.  The backup
server is running 2.4.5, and the fileserver is running 2.4.5b1.

The dump on this DLE is returning the following error:

  bulkhead.b /data/home lev 0 FAILED [data read: Connection reset by peer]

in the summary. The corresponding sendbackup log on this client looks like
this:

sendbackup: time 10.050: spawning /usr/lib/amanda/runtar in pipeline
sendbackup: argument list: gtar --create --file - --directory /home --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/bulkhead.backup_data_home_0.new --sparse --ignore-failed-read --totals --exclude-from /var/log/amanda/sendbackup._data_home.20050817013729.exclude .
sendbackup-gnutar: time 10.051: /usr/lib/amanda/runtar: pid 6101
sendbackup: time 10.055: started index creator: "/bin/tar -tf - 2>/dev/null | sed -e 's/^\.//'"
sendbackup: time 22182.305: index tee cannot write [Broken pipe]
sendbackup: time 22182.305: 126: strange(?):
sendbackup: time 22182.326: pid 6099 finish time Wed Aug 17 07:47:01 2005
sendbackup: time 22182.327: 126: strange(?): gzip: stdout: Connection timed out
sendbackup: time 22182.328: 126: strange(?): sendbackup: index tee cannot write [Broken pipe]
sendbackup: time 22182.385: error [compress returned 1, /bin/tar got signal 13]
sendbackup: time 22182.385: pid 6096 finish time Wed Aug 17 07:47:01 2005
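
As a cross-check on the timestamps in that log, here is a minimal Python
sketch (plain arithmetic on the logged numbers, not Amanda output). It
assumes the "time N" field counts seconds since sendbackup started.

```python
from datetime import datetime, timedelta

# Values copied from the sendbackup log.
finish = datetime(2005, 8, 17, 7, 47, 1)   # "finish time Wed Aug 17 07:47:01 2005"
elapsed = timedelta(seconds=22182.385)     # "time 22182.385" at the final error

# Assumption: "time N" is seconds elapsed since sendbackup started.
start = finish - elapsed

print("elapsed:", elapsed)
print("approximate start:", start.strftime("%H:%M:%S"))
```

Under that assumption the run lasted a little over six hours before the
pipe broke, and the computed start falls about ten seconds before the
01:37:29 stamp in the exclude file name, which fits the "time 10.050" on
the first log line.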


In the server's amdump.1, the relevant lines are:

planner: time 0.082: setting up estimates for bulkhead.backup:/data/home
bulkhead.backup:/data/home overdue 21 days for level 0
setup_estimate: bulkhead.backup:/data/home: command 0, options: none    
last_level 1 next_level0 -21 level_days 1    getting estimates 0 (-2) 1 (-2) 2 
(-2)
planner time 2.638: got result for host bulkhead.backup disk /data/home: 0 -> 88513630K, 1 -> 2750443K, 2 -> 3325880K
  0: bulkhead.backup /data/home
pondering bulkhead.backup:/data/home... next_level0 -21 last_level 1 (due for 
level 0) (picking inclevel for degraded mode)   pick: size 2750443 level 1 days 
1 (thresh 20480K, 1 days)
  bulkhead.backup /data/home pri 23 lev 0 size 57655638
DUMP bulkhead.backup fffffeff9ffe0f /data/home 20050817 23 0 1970:1:1:0:0:0 57655638 32360 1 2005:7:19:15:50:32 850574 9129
driver: send-cmd time 2237.343 to dumper3: FILE-DUMP 03-00004 /data/amanda/anchor/20050817010002/bulkhead.backup._data_home.0 bulkhead.backup fffffeff9ffe0f /data/home NODEVICE 0 1970:1:1:0:0:0 1048576 GNUTAR 57657440 |;bsd-auth;compress-best;index;exclude-list=.amandaexclude;exclude-optional;
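
To put the planner's sizes in perspective, a small Python sketch follows.
The two numbers are copied from the planner lines; treating the scheduled
size as kilobytes, and reading the gap between the two as the expected
effect of compression, are my assumptions rather than anything the log
states.

```python
KIB_PER_GIB = 1024 * 1024

level0_estimate_kib = 88513630   # "0 -> 88513630K" from the planner result
scheduled_size_kib = 57655638    # "lev 0 size 57655638" (assumed to be K as well)

estimate_gib = level0_estimate_kib / KIB_PER_GIB
ratio = scheduled_size_kib / level0_estimate_kib

print(f"level 0 estimate: {estimate_gib:.1f} GiB")
print(f"scheduled / estimated: {ratio:.3f}")
```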


My dtimeout was set to 7200 when this started. Over the last few nights I
have tried 14400 and 21600, and have now set it back to 7200. None of these
changes had any effect on when the dump actually fails: the times reported
at the 'index tee' failure have varied between roughly 6000, 22000, and
37000 seconds, regardless of the dtimeout. The 7200 setting had been working
for this DLE until about a month ago (you can see it's been 21 days since
the DLE was last backed up correctly).
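
To lay those numbers side by side, here is a rough Python sketch. The
failure times are the approximate figures quoted above, and since I did not
record which dtimeout was active on which night, the two lists are not
paired up.

```python
# dtimeout values tried, in seconds (from amanda.conf on successive nights).
tried_dtimeouts = [7200, 14400, 21600]

# Approximate elapsed times at the 'index tee' failure, in seconds.
observed_failures = [6000, 22000, 37000]


def hours(seconds):
    """Convert seconds to hours for easier reading."""
    return seconds / 3600


for d in tried_dtimeouts:
    print(f"dtimeout {d:>6} s = {hours(d):5.2f} h")
for t in observed_failures:
    print(f"failure  {t:>6} s = {hours(t):5.2f} h")

# The failures fall outside the tried settings at both ends.
fails_before_smallest = min(observed_failures) < min(tried_dtimeouts)
fails_after_largest = max(observed_failures) > max(tried_dtimeouts)
print("earliest failure precedes smallest dtimeout:", fails_before_smallest)
print("latest failure exceeds largest dtimeout:", fails_after_largest)
```

The earliest failure comes before even the smallest dtimeout would have
expired, and the latest comes well after the largest one.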

So I'm stumped as to what to try next. Can anyone think of anything I might
have missed, or hand out a clue?
