Amanda-Users

some filesystems fail on a host

2005-07-06 18:10:59
Subject: some filesystems fail on a host
From: Oscar Ricardo Silva <osilva AT scuff.cc.utexas DOT edu>
To: amanda-users AT amanda DOT org
Date: Wed, 06 Jul 2005 16:53:55 -0500
I have a few hosts where some filesystems fail to be backed up. I thought it might be a firewall/iptables issue, but the fact that at least one filesystem on the host is backed up successfully seems to rule that out. Also, on the amanda server, one of the hosts having problems is completely trusted on all ports. Server and clients are running amanda 2.4.5, with one client configured with:

CONFIGURE_COMMAND="'./configure' '--prefix=/usr/local/amanda' '--without-server' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-includes=/usr/include' '--with-libraries=/usr/lib' '--with-user=amanda' '--with-group=disk' '--with-config=daily' '--with-gnutar=/bin/tar' '--with-gnutar-listdir=/usr/local/amanda/gnutar-lists' '--with-debugging' '--with-debug-days=10' '--with-index-server=amanda.tn.utexas.edu' '--with-tape-server=amanda.tn.utexas.edu' '--with-rundump' '--with-fqdn' '--with-tcpportrange=32820,32880' '--with-udpportrange=900,960' '--with-dump-honor-nodump' '--with-buffered-dump'"
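The two options in that configure line that matter most for a firewall question are --with-tcpportrange and --with-udpportrange, since they restrict which ports the client uses. As a minimal sketch (Python; the helper name port_ranges is hypothetical and not part of Amanda), the ranges can be pulled out of a CONFIGURE_COMMAND string so they can be compared against the iptables rules by hand:

```python
import shlex


def port_ranges(configure_command):
    """Return {'tcp': (low, high), 'udp': (low, high)} for the port-range
    options found in an Amanda CONFIGURE_COMMAND string.

    Hypothetical helper for illustration; it only parses text.
    """
    ranges = {}
    # shlex.split handles the single-quoted arguments in the string.
    for arg in shlex.split(configure_command):
        for proto in ("tcp", "udp"):
            prefix = "--with-%sportrange=" % proto
            if arg.startswith(prefix):
                low, high = arg[len(prefix):].split(",")
                ranges[proto] = (int(low), int(high))
    return ranges


# Shortened copy of the configure line quoted above.
cmd = ("'./configure' '--without-server' "
       "'--with-tcpportrange=32820,32880' '--with-udpportrange=900,960'")
print(port_ranges(cmd))
```

Whether the firewall actually permits these ranges in both directions still has to be checked against the rule set itself; the sketch only extracts the numbers.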

Take one host as an example: scuff.cc.utexas.edu (Red Hat Enterprise AS release 3), with filesystems:

/                       /dev/hda2
/home                   /dev/hda8
/usr                    /dev/hda5
/var                    /dev/hda6


The backup report for this host shows:

scuff.cc.utexas.edu /     1    2030   2030   --   0:01 1401.2   0:02  838.6
scuff.cc.utexas.edu /home 0  FAILED ---------------------------------------------------
scuff.cc.utexas.edu /usr  0  FAILED ---------------------------------------------------
scuff.cc.utexas.edu /var  0  FAILED ---------------------------------------------------

Here's a portion of the sendsize:

sendsize[29384]: time 761.392: /bin/tar: ./lib/mysql/mysql.sock: socket ignored
sendsize[29384]: time 761.653: Total bytes written: 480634880 (458MB, ?B/s)
sendsize[29384]: time 761.653: .....
sendsize[29384]: estimate time for /var level 2: 0.319
sendsize[29384]: estimate size for /var level 2: 469370 KB
sendsize[29384]: time 761.653: waiting for /bin/tar "/var" child
sendsize[29384]: time 761.653: after /bin/tar "/var" wait
sendsize[29384]: time 761.653: done with amname '/var', dirname '/var', spindle -1
sendsize[29141]: time 761.654: child 29384 terminated normally
sendsize[29141]: time 761.654: waiting for any estimate child: 1 running
sendsize[29163]: time 812.762: /bin/tar: ./local/amanda/gnutar-lists/scuff.cc.utexas.edu_home_0.new: Warning: Cannot stat: No such file or directory
sendsize[29163]: time 819.585: Total bytes written: 4542709760 (4.2GB, 6.0MB/s)
sendsize[29163]: time 819.586: .....
sendsize[29163]: estimate time for /usr level 0: 725.002
sendsize[29163]: estimate size for /usr level 0: 4436240 KB
sendsize[29163]: time 819.586: waiting for /bin/tar "/usr" child
sendsize[29163]: time 819.586: after /bin/tar "/usr" wait
sendsize[29163]: time 819.587: done with amname '/usr', dirname '/usr', spindle -1
sendsize[29141]: time 819.587: child 29163 terminated normally
sendsize: time 819.587: pid 29141 finish time Mon Jul  4 21:13:52 2005

In amanda.conf I have etimeout set to -5400, so the roughly 820 seconds that sendsize took is well under this timeout.
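For reference, the amanda.conf man page describes a positive etimeout as a per-disk allowance and a negative one as a total allowance per client. A minimal sketch of that arithmetic follows (Python; effective_etimeout is a hypothetical helper, and the disk count of 4 is taken from the filesystem list above):

```python
def effective_etimeout(etimeout, disk_count):
    """Total seconds the planner would wait for one client's estimates.

    Assumes the documented rule: positive values are per disk,
    negative values are a total per client.
    """
    if etimeout < 0:
        return -etimeout
    return etimeout * disk_count


total = effective_etimeout(-5400, 4)  # 4 filesystems on scuff
elapsed = 819.587                     # sendsize finish time from the log
print(total, elapsed < total)
```

Under that reading the estimates finished with plenty of time to spare, so an estimate timeout by itself would not explain the failures.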



But in one of the amanda debug files, amandad.20050704210012.debug, I do see a problem, and I have no idea where it's coming from.


amandad: time 819.606: sending PREP packet:
----
Amanda 2.4 PREP HANDLE 016-38D50D09 SEQ 1120528857
OPTIONS features=fffffeff9ffe7f;
/ 0 SIZE 318970
/ 1 SIZE 2030
/home 0 SIZE 29349730
/home 1 SIZE 29349740
/var 0 SIZE 469370
/var 1 SIZE 469370
/var 2 SIZE 469370
/usr 0 SIZE 4436240
----

amandad: time 819.606: sending REP packet:
----
Amanda 2.4 REP HANDLE 016-38D50D09 SEQ 1120528857
OPTIONS features=fffffeff9ffe7f;
/ 0 SIZE 318970
/ 1 SIZE 2030
/home 0 SIZE 29349730
/home 1 SIZE 29349740
/var 0 SIZE 469370
/var 1 SIZE 469370
/var 2 SIZE 469370
/usr 0 SIZE 4436240
----

amandad: time 829.600: dgram_recv: timeout after 10 seconds
amandad: time 829.600: waiting for ack: timeout, retrying
amandad: time 839.600: dgram_recv: timeout after 10 seconds
amandad: time 839.600: waiting for ack: timeout, retrying
amandad: time 849.600: dgram_recv: timeout after 10 seconds
amandad: time 849.600: waiting for ack: timeout, retrying
amandad: time 859.600: dgram_recv: timeout after 10 seconds
amandad: time 859.600: waiting for ack: timeout, retrying
amandad: time 869.600: dgram_recv: timeout after 10 seconds
amandad: time 869.600: waiting for ack: timeout, giving up!
amandad: time 869.600: pid 29140 finish time Mon Jul  4 21:14:42 2005
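The excerpt above shows amandad sending its REP packet and then waiting in 10-second steps for an ACK that never arrives. A small sketch for spotting this pattern in a debug file (Python; summarize_ack_timeouts is a hypothetical helper, and the regular expression is matched only against the line format shown in this log, not a documented format):

```python
import re

# Matches lines such as:
#   amandad: time 829.600: waiting for ack: timeout, retrying
ACK_TIMEOUT = re.compile(
    r"amandad: time (\d+\.\d+): waiting for ack: timeout, (retrying|giving up!)"
)


def summarize_ack_timeouts(log_text):
    """Return (number_of_retries, time_given_up_or_None)."""
    retries = 0
    gave_up_at = None
    for line in log_text.splitlines():
        match = ACK_TIMEOUT.search(line)
        if match is None:
            continue
        if match.group(2) == "retrying":
            retries += 1
        else:
            gave_up_at = float(match.group(1))
    return retries, gave_up_at


SAMPLE = """\
amandad: time 829.600: dgram_recv: timeout after 10 seconds
amandad: time 829.600: waiting for ack: timeout, retrying
amandad: time 839.600: waiting for ack: timeout, retrying
amandad: time 849.600: waiting for ack: timeout, retrying
amandad: time 859.600: waiting for ack: timeout, retrying
amandad: time 869.600: waiting for ack: timeout, giving up!
"""

print(summarize_ack_timeouts(SAMPLE))
```

A "giving up" entry only says that no ACK reached the client. It does not show whether the REP packet was lost on the way to the server or the ACK was lost on the way back.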


Am I wrong, and could it be some restriction between client and server after all?


Oscar





