Amanda-Users

FAILED backups on different hosts each night

2006-08-27 12:10:26
Subject: FAILED backups on different hosts each night
From: "Stephen Carter" <Stephen AT retnet.co DOT uk>
To: <amanda-users AT amanda DOT org>
Date: Sun, 27 Aug 2006 16:56:03 +0100
I have 2 physical boxes I'm backing up, one called srv1 and the other called 
srv2.

srv1, which has the tape device and runs the amanda backups, is always backed 
up correctly.

srv2 is a SLES 10 server running 3 virtual SLES 10 XEN guests within it, but 
I'm treating them as separate physical boxes for the purposes of amanda.
 
On different nights, different XEN guests fail (including the host, srv2) with 
a "could not connect" error in the amanda report.

amstatus says 'wait for dumping driver: (aborted:could not connect to data 
port: Connection timed out)'.

amdump.1 reports that all estimates worked: it shows "FAILED QUEUE: empty", and 
the DONE QUEUE includes all DLEs listed in the disklist.

amdump.1 then reports the dumper runs: 2 DLEs are dumped successfully, while my 
other 4 DLEs fail with:
dumper: stream_client: connect to 192.168.0.9:12359 failed: Connection timed out
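
To see which clients are affected from night to night, the failed data-port connects can be tallied per client IP. The following is only a minimal Python sketch, assuming the log lines look like the ones quoted in this message; the function name count_failed_connects and the sample text are made up for illustration and are not part of Amanda.

```python
import re
from collections import Counter

# Matches e.g. "stream_client: connect to 192.168.0.9.12359 failed: ..."
# The amdump log writes the port after a final dot; a colon is accepted too.
# \s+ also spans line breaks, so mail-wrapped log lines still match.
FAILED_CONNECT = re.compile(
    r"stream_client:\s+connect\s+to\s+(\d{1,3}(?:\.\d{1,3}){3})[.:](\d+)\s+failed:"
)

def count_failed_connects(log_text):
    """Return a Counter mapping client IP -> number of failed data-port connects."""
    return Counter(ip for ip, _port in FAILED_CONNECT.findall(log_text))

# Sample text modelled on the amdump.1 excerpt in this message.
sample = """\
dumper: stream_client: connected to 192.168.0.6.35836
dumper: stream_client: connect to 192.168.0.9.12359 failed: Connection timed out
dumper: stream_client: connect to 192.168.0.9.21652 failed: Connection timed out
"""

for ip, failures in count_failed_connects(sample).items():
    print(ip, failures)
```

Running it over the text of each night's amdump.N file (read with open(...).read()) would give one tally per run, which can then be compared across nights.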

I allow all traffic between srv1 (my backup server) and all clients. Thinking 
it might be a throughput problem, I reduced parallel dumps to 1, but that 
hasn't helped.
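
One way to check the "all traffic is allowed" assumption is to probe a TCP port on a client from srv1 and see how the connect attempt ends. As a general rule, a timeout suggests packets are being silently dropped somewhere on the path, whereas a refusal means the host was reached but nothing is listening on that port. This is a minimal Python sketch only; probe_tcp is a hypothetical helper, and the port to probe is an assumption, since Amanda's data ports are assigned per dump.

```python
import errno
import socket

def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connect and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout: packets are probably dropped.
        return "timed out"
    except ConnectionRefusedError:
        # Host reachable, but nothing listens on this port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. no route to host or a name lookup failure.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))

# Example call (address from the log above; the port is only an illustration):
# print(probe_tcp("192.168.0.9", 12359))
```

A "refused" result for an arbitrary unused port would at least show that TCP packets reach the client and come back; a "timed out" result would point at filtering or routing between the two machines.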

The latest amstatus output and a section from my amdump.1 file are below.  
Any help would be greatly appreciated.


AMSTATUS OUTPUT:
srv1:/var/lib/amanda/DailySet1 # amstatus DailySet1
Using /var/lib/amanda/DailySet1/amdump.1 from Fri Aug 25 01:00:02 BST 2006

srv1.retnet.co.uk:md0          3    352152k finished (1:17:18)
mailscan.retnet.co.uk:hda2     0   1062300k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
srv2.retnet.co.uk:/srv/install 0  21497250k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
srv2.retnet.co.uk:md0          0   4242910k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)
web-1.retnet.co.uk:hda2        0    699770k finished (1:33:02)
web-2.retnet.co.uk:hda2        0    906355k wait for dumping driver: (aborted:could not connect to data port: Connection timed out)

SUMMARY          part      real  estimated
                           size       size
partition       :   6
estimated       :   6             28769687k
flush           :   0         0k
failed          :   0                    0k           (  0.00%)
wait for dumping:   4             27708815k           ( 96.31%)
dumping to tape :   0                    0k           (  0.00%)
dumping         :   0         0k         0k (  0.00%) (  0.00%)
dumped          :   2   1051922k   1060872k ( 99.16%) (  3.66%)
wait for writing:   0         0k         0k (  0.00%) (  0.00%)
wait to flush   :   0         0k         0k (100.00%) (  0.00%)
writing to tape :   0         0k         0k (  0.00%) (  0.00%)
failed to tape  :   0         0k         0k (  0.00%) (  0.00%)
taped           :   2   1051922k   1060872k ( 99.16%) (  3.66%)
  tape 1        :   2   1051922k   1060872k (  2.94%) DailySet1-5
1 dumper idle   : not-idle
taper idle
network free kps:      2600
holding space   :  33792000k (100.00%)
 dumper0 busy   :  0:40:08  ( 95.25%)
   taper busy   :  0:06:47  ( 16.10%)
 0 dumpers busy :  0:00:00  (  0.00%)
 1 dumper busy  :  0:42:08  (100.00%)            not-idle:  0:28:40  ( 68.07%)
                                               no-dumpers:  0:13:27  ( 31.93%)
srv1:/var/lib/amanda/DailySet1 #




AMDUMP.1 PARTIAL OUTPUT:
driver: adding holding disk 0 dir /mnt/dumps size 33792000
reserving 33792000 out of 33792000 for degraded-mode dumps
driver: flush size 0
driver: start time 812.693 inparallel 1 bandwidth 2600 diskspace 33792000 dir 
OBSOLETE datestamp 20060825 driver: drain-ends tapeq FIRST big-dumpers ttt
driver: result time 812.693 from taper: TAPER-OK
driver: send-cmd time 812.703 to dumper0: FILE-DUMP 00-00001 
/mnt/dumps/20060825/srv1.retnet.co.uk.md0.3 srv1.retnet.co.uk fffffeff9ffe0f 
md0 NODEVICE 3 2006:8:22:0:36:52 1073741824 GNUTAR 356544 
|;bsd-auth;compress-best;index;exclude-list=/usr/lib/amanda/exclude.gtar;
driver: state time 812.703 free kps: -2090 space: 33435456 taper: idle 
idle-dumpers: 0 qlen tapeq: 0 runq: 5 roomq: 0 wakeup: 86400 driver-idle: 
not-idle
driver: interface-state time 812.703 if : free -3890 if ETH0: free 800 if 
LOCAL: free 1000
driver: hdisk-state time 812.703 hdisk 0: free 33435456 dumpers 1
dumper: stream_client: connected to 192.168.0.1.51236
dumper: stream_client: our side is 0.0.0.0.51239
dumper: stream_client: connected to 192.168.0.1.51237
dumper: stream_client: our side is 0.0.0.0.51240
dumper: stream_client: connected to 192.168.0.1.51238
dumper: stream_client: our side is 0.0.0.0.51241
driver: result time 901.369 from dumper0: DONE 00-00001 441620 352152 89 [sec 
88.636 kb 352152 kps 3973.0 orig-kb 441620]
driver: finished-cmd time 901.387 dumper0 dumped srv1.retnet.co.uk:md0
driver: send-cmd time 901.387 to taper: FILE-WRITE 00-00002 
/mnt/dumps/20060825/srv1.retnet.co.uk.md0.3 srv1.retnet.co.uk fffffeff9ffe0f 
md0 3 20060825
driver: startaflush: FIRST srv1.retnet.co.uk md0 352185 35840000
driver: send-cmd time 901.387 to dumper0: FILE-DUMP 01-00003 
/mnt/dumps/20060825/web-1.retnet.co.uk.hda2.0 web-1.retnet.co.uk fffffeff9ffe7f 
hda2 NODEVICE 0 1970:1:1:0:0:0 1073741824 GNUTAR 704480 
|;bsd-auth;compress-best;index;exclude-list=/usr/lib/amanda/exclude.gtar;
driver: state time 901.388 free kps: 1774 space: 32735335 taper: writing 
idle-dumpers: 0 qlen tapeq: 0 runq: 4 roomq: 0 wakeup: 86400 driver-idle: 
not-idle
driver: interface-state time 901.388 if : free -26 if ETH0: free 800 if LOCAL: 
free 1000
driver: hdisk-state time 901.388 hdisk 0: free 32735335 dumpers 1
dumper: stream_client: connected to 192.168.0.6.35836
dumper: stream_client: our side is 0.0.0.0.51285
dumper: stream_client: connected to 192.168.0.6.60410
dumper: stream_client: our side is 0.0.0.0.51286
dumper: stream_client: connected to 192.168.0.6.58452
dumper: stream_client: our side is 0.0.0.0.51287
taper: reader-side: got label DailySet1-5 filenum 1
driver: result time 1036.302 from taper: DONE 00-00002 DailySet1-5 1 [sec 
134.914 kb 352153 kps 2610.2 {wr: writers 11006 rdwait 0.000 wrwait 132.407 
filemark 2.332}]
driver: finished-cmd time 1036.622 taper wrote srv1.retnet.co.uk:md0
driver: state time 1036.622 free kps: 1774 space: 33087520 taper: idle 
idle-dumpers: 0 qlen tapeq: 0 runq: 4 roomq: 0 wakeup: 86400 driver-idle: 
no-dumpers
driver: interface-state time 1036.622 if : free -26 if ETH0: free 800 if LOCAL: 
free 1000
driver: hdisk-state time 1036.622 hdisk 0: free 33087520 dumpers 1
driver: result time 1708.519 from dumper0: DONE 01-00003 1929800 699770 807 
[sec 807.053 kb 699770 kps 867.1 orig-kb 1929800]
driver: finished-cmd time 1708.527 dumper0 dumped web-1.retnet.co.uk:hda2
driver: send-cmd time 1708.527 to taper: FILE-WRITE 00-00004 
/mnt/dumps/20060825/web-1.retnet.co.uk.hda2.0 web-1.retnet.co.uk fffffeff9ffe7f 
hda2 0 20060825
driver: startaflush: FIRST web-1.retnet.co.uk hda2 699803 35487815
driver: send-cmd time 1708.527 to dumper0: FILE-DUMP 01-00005 
/mnt/dumps/20060825/srv2.retnet.co.uk.md0.0 srv2.retnet.co.uk fffffeff9ffe7f 
md0 NODEVICE 0 1970:1:1:0:0:0 1073741824 GNUTAR 4242976 
|;bsd-auth;compress-best;index;exclude-list=/usr/lib/amanda/exclude.gtar;
driver: state time 1708.528 free kps: -1899 space: 28849221 taper: writing 
idle-dumpers: 0 qlen tapeq: 0 runq: 3 roomq: 0 wakeup: 86400 driver-idle: 
not-idle
driver: interface-state time 1708.528 if : free -3699 if ETH0: free 800 if 
LOCAL: free 1000
driver: hdisk-state time 1708.528 hdisk 0: free 28849221 dumpers 1
dumper: stream_client: connect to 192.168.0.9.12359 failed: Connection timed out
driver: result time 1897.780 from dumper0: TRY-AGAIN 01-00005 could not 
connect to data port: Connection timed out
rename_tmp_holding: /mnt/dumps/20060825/srv2.retnet.co.uk.md0.0.tmp: empty file?
unlink_holding_files: open of /mnt/dumps/20060825/srv2.retnet.co.uk.md0.0 
failed: No such file or directory
driver: send-cmd time 1912.782 to dumper0: FILE-DUMP 01-00006 
/mnt/dumps/20060825/srv2.retnet.co.uk.md0.0 srv2.retnet.co.uk fffffeff9ffe7f 
md0 NODEVICE 0 1970:1:1:0:0:0 1073741824 GNUTAR 4242976 
|;bsd-auth;compress-best;index;exclude-list=/usr/lib/amanda/exclude.gtar;
driver: state time 1912.782 free kps: -1899 space: 28849221 taper: writing 
idle-dumpers: 0 qlen tapeq: 0 runq: 3 roomq: 0 wakeup: 86400 driver-idle: 
not-idle
driver: interface-state time 1912.782 if : free -3699 if ETH0: free 800 if 
LOCAL: free 1000
driver: hdisk-state time 1912.782 hdisk 0: free 28849221 dumpers 1
taper: reader-side: got label DailySet1-5 filenum 2
driver: result time 1980.713 from taper: DONE 00-00004 DailySet1-5 2 [sec 
272.185 kb 699771 kps 2570.9 {wr: writers 21869 rdwait 0.000 wrwait 269.520 
filemark 2.366}]
driver: finished-cmd time 1981.371 taper wrote web-1.retnet.co.uk:hda2
driver: state time 1981.371 free kps: -1899 space: 29549024 taper: idle 
idle-dumpers: 0 qlen tapeq: 0 runq: 3 roomq: 0 wakeup: 86400 driver-idle: 
no-dumpers
driver: interface-state time 1981.371 if : free -3699 if ETH0: free 800 if 
LOCAL: free 1000
driver: hdisk-state time 1981.371 hdisk 0: free 29549024 dumpers 1
dumper: stream_client: connect to 192.168.0.9.21652 failed: Connection timed out
driver: result time 2101.767 from dumper0: TRY-AGAIN 01-00006 could not 
connect to data port: Connection timed out
rename_tmp_holding: /mnt/dumps/20060825/srv2.retnet.co.uk.md0.0.tmp: empty file?
unlink_holding_files: open of /mnt/dumps/20060825/srv2.retnet.co.uk.md0.0 
failed: No such file or directory
driver: send-cmd time 2116.769 to dumper0: FILE-DUMP 00-00007 
/mnt/dumps/20060825/srv2.retnet.co.uk._srv_install.0 srv2.retnet.co.uk 
fffffeff9ffe7f /srv/install NODEVICE 0 1970:1:1:0:0:0 1073741824 GNUTAR 
21497344 |;bsd-auth;index;exclude-list=;


Stephen Carter
Retrac Networking Limited
www: http://www.retnet.co.uk
Ph: +44 (0)7870 218 693
Fax: +44 (0)870 7060 056
CNA, CNE 6, CNS, CCNA, MCSE 2003


