Amanda-Users

Re: RHEL5 client problems? (2.5.0p2)

2007-09-17 11:18:20
Subject: Re: RHEL5 client problems? (2.5.0p2)
From: JB Segal <jb AT smarterliving DOT com>
To: amanda-users AT amanda DOT org
Date: Mon, 17 Sep 2007 11:15:11 -0400
(I'm betting this never got through due to the corporate domain change
and that I'm not on the list at the address in my .sig. <sigh>
Ah well. Let's try this. There are 2 messages here, top-posted, with my
original problems below my current problems. JB)

Sadly, I changed my RHEL5 clients over to 2.5.2 from zmanda.org and that
didn't fix most of my problems.

I've got only one host reporting 'MISSING' now (and the one that isn't
MISSING _is_ working), but still 2 'FAILED'.

The most frustrating thing is that I have all but one (the remaining
'MISSING' one) passing amcheck.

One of the failed ones has a fine looking debug file, up until it fails:

# cat sendbackup.20070917010550.debug
sendbackup: debug 1 pid 16368 ruid 628 euid 628: start at Mon Sep 17 01:05:50 2007
sendbackup: version 2.5.2p1
Could not open conf file "/etc/amanda/amanda-client.conf": No such file or directory
Could not open conf file "/etc/amanda/smarterliving/amanda-client.conf": No such file or directory
sendbackup: debug 1 pid 16368 ruid 628 euid 628: rename at Mon Sep 17 01:05:50 2007
  sendbackup req: <DUMP /usr  0 1970:1:1:0:0:0 OPTIONS |;auth=BSD;index;>
  parsed request as: program `DUMP'
                     disk `/usr'
                     device `/usr'
                     level 0
                     since 1970:1:1:0:0:0
                     options `|;auth=BSD;index;'
sendbackup: start: FAILEDHost:/usr lev 0
sendbackup: time 0.001: dumping device '/dev/mapper/VolGroup00-LogVol03' with 'ext3'
sendbackup: time 0.002: started index creator: "/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[        ]*[0-9]*[       ]*\.//
t
/^dir[  ]/ {
s/^dir[         ]*[0-9]*[       ]*\.//
s%$%/%
t
}
d
'"
sendbackup: time 0.002: spawning /sbin/dump in pipeline
sendbackup: time 0.002: argument list: dump 0usf 1048576 - /dev/mapper/VolGroup00-LogVol03
sendbackup: time 0.003: started backup
sendbackup: time 0.011:  91:  normal(|):   DUMP: Date of this level 0 dump: Mon Sep 17 01:05:50 2007
sendbackup: time 0.012:  91:  normal(|):   DUMP: Dumping /dev/mapper/VolGroup00-LogVol03 (/usr) to standard output
sendbackup: time 6.096:  91:  normal(|):   DUMP: Label: none
sendbackup: time 6.097:  91:  normal(|):   DUMP: Writing 10 Kilobyte records
sendbackup: time 6.097:  91:  normal(|):   DUMP: mapping (Pass I) [regular files]
sendbackup: time 6.655:  91:  normal(|):   DUMP: mapping (Pass II) [directories]
sendbackup: time 6.659:  91:  normal(|):   DUMP: estimated 3419329 blocks.
sendbackup: time 6.659:  91:  normal(|):   DUMP: Volume 1 started with block 1 at: Mon Sep 17 01:05:57 2007
sendbackup: time 90.017: index tee cannot write [Broken pipe]
sendbackup: time 90.017: pid 16370 finish time Mon Sep 17 01:07:20 2007

while the remaining 'MISSING' host looks pretty similar as it fails
amcheck:

# cat amandad.20070917105118.debug
amandad: debug 1 pid 31792 ruid 628 euid 628: start at Mon Sep 17 10:51:18 2007
Could not open conf file "/etc/amanda/amanda-client.conf": No such file or directory
amandad: time 0.000: security_getdriver(name=BSD) returns 0x45abe0
amandad: version 2.5.2p1
amandad: time 0.000: build: VERSION="Amanda-2.5.2p1"
amandad: time 0.000:        BUILT_DATE="Wed Jun 6 21:18:37 PDT 2007"
amandad: time 0.000:        BUILT_MACH="Linux rhel5rc-build 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:21 EST 2007 i686 athlon i386 GNU/Linux"
amandad: time 0.000:        CC="gcc"
amandad: time 0.000:        CONFIGURE_COMMAND="'./configure'
'--build=i386-redhat-linux' '--prefix=/usr' '--bindir=/usr/bin'
'--sbindir=/usr/sbin' '--libexecdir=/usr/lib/amanda'
'--datadir=/usr/share' '--sysconfdir=/etc'
'--sharedstatedir=/var/lib/amanda' '--localstatedir=/var/lib/amanda'
'--libdir=/usr/lib' '--includedir=/usr/include' '--infodir=/usr/info'
'--mandir=/usr/share/man' '--with-gnutar=/bin/tar'
'--with-gnutar-listdir=/var/lib/amanda/gnutar-lists'
'--with-dumperdir=/usr/lib/amanda' '--with-index-server=localhost'
'--with-tape-server=localhost' '--with-user=amandabackup'
'--with-group=disk' '--with-owner=paddy' '--with-fqdn'
'--with-bsd-security' '--with-bsdtcp-security' '--with-bsdudp-security'
'--with-ssh-security' '--with-assertions'"
amandad: time 0.000: paths: bindir="/usr/bin" sbindir="/usr/sbin"
amandad: time 0.000:        libexecdir="/usr/lib/amanda" mandir="/usr/share/man"
amandad: time 0.000:        AMANDA_TMPDIR="/tmp/amanda" AMANDA_DBGDIR="/tmp/amanda"
amandad: time 0.000:        CONFIG_DIR="/etc/amanda" DEV_PREFIX="/dev/"
amandad: time 0.000:        RDEV_PREFIX="/dev/r" DUMP="/sbin/dump"
amandad: time 0.000:        RESTORE="/sbin/restore" VDUMP=UNDEF VRESTORE=UNDEF
amandad: time 0.000:        XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF VXRESTORE=UNDEF
amandad: time 0.000:        SAMBA_CLIENT="/usr/bin/smbclient" GNUTAR="/bin/tar"
amandad: time 0.000:        COMPRESS_PATH="/bin/gzip" UNCOMPRESS_PATH="/bin/gzip"
amandad: time 0.000:        LPRCMD="/usr/bin/lpr" MAILER="/usr/bin/Mail"
amandad: time 0.000:    listed_incr_dir="/var/lib/amanda/gnutar-lists"
amandad: time 0.000: defs:  DEFAULT_SERVER="localhost" DEFAULT_CONFIG="DailySet1"
amandad: time 0.000:        DEFAULT_TAPE_SERVER="localhost" HAVE_MMAP NEED_STRSTR
amandad: time 0.000:        HAVE_SYSVSHM LOCKING=POSIX_FCNTL SETPGRP_VOID ASSERTIONS
amandad: time 0.000:        DEBUG_CODE AMANDA_DEBUG_DAYS=4 BSD_SECURITY RSH_SECURITY
amandad: time 0.000:        USE_AMANDAHOSTS CLIENT_LOGIN="amandabackup" FORCE_USERID
amandad: time 0.000:        HAVE_GZIP COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
amandad: time 0.000:        COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
amandad: time 0.000: dgram_recv(dgram=0x45cb84, timeout=0, fromaddr=0x46cb70)
amandad: time 0.000: (sockaddr_in *)0x46cb70 = { 2, 847, 69.25.205.8 }
amandad: time 0.000: security_handleinit(handle=0x8053230, driver=0x45abe0 (BSD))
amandad: time 0.000: accept recv REQ pkt:
<<<<<
SERVICE noop
OPTIONS features=fffffeff9ffeffffff7f;
>>>>>
amandad: time 0.000: creating new service: noop
OPTIONS features=fffffeff9ffeffffff7f;

amandad: time 0.001: sending ACK pkt:
<<<<<
>>>>>
amandad: time 0.001: dgram_send_addr(addr=0x8053250, dgram=0x45cb84)
amandad: time 0.001: (sockaddr_in *)0x8053250 = { 2, 847, 69.25.205.8 }
amandad: time 0.001: dgram_send_addr: 0x45cb84->socket = 0
amandad: time 0.002: sending REP pkt:
<<<<<
OPTIONS features=ffffffff9ffeffffffff00;
>>>>>

 . . . which continues until

amandad: time 49.243: timeout
amandad: time 49.243: timeout waiting for ACK for our REP
amandad: time 49.243: security_close(handle=0x8053230, driver=0x45abe0 (BSD))
amandad: time 59.244: pid 31792 finish time Mon Sep 17 10:52:18 2007

Anything here look familiar to anyone?

Thanks!
JB
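
For anyone comparing debug files across several clients, here is a rough sketch of pulling out just the lines where things go wrong. The two marker strings are copied from the logs above; the function names (find_failures, scan_debug_dir) are made up for illustration and are not part of Amanda, and /tmp/amanda is only assumed because it is the AMANDA_DBGDIR shown in the build info.

```python
import glob
import os
import re

# Failure markers copied from the two debug logs above. This list is
# illustrative only, not a catalogue of every error Amanda can emit.
MARKERS = (
    "index tee cannot write",
    "timeout waiting for ACK",
)

# Matches the "time 90.017:" stamp that sendbackup/amandad put on each line.
TIME_RE = re.compile(r"time (\d+\.\d+):")


def find_failures(lines):
    """Return a list of (seconds, marker, line) for lines containing a marker.

    The time stamp used is the last one that appears before the marker, so a
    line that got fused with the previous one by mail wrapping still reports
    the right time. seconds is None when no stamp precedes the marker.
    """
    hits = []
    for line in lines:
        for marker in MARKERS:
            idx = line.find(marker)
            if idx == -1:
                continue
            stamps = TIME_RE.findall(line[:idx])
            seconds = float(stamps[-1]) if stamps else None
            hits.append((seconds, marker, line.strip()))
    return hits


def scan_debug_dir(dirpath):
    """Print every marker hit found in the *.debug files under dirpath."""
    for path in sorted(glob.glob(os.path.join(dirpath, "*.debug"))):
        with open(path, errors="replace") as fh:
            for seconds, marker, _line in find_failures(fh):
                print(f"{path}: t={seconds}s: {marker}")
```

Calling scan_debug_dir("/tmp/amanda") on each client would then give a one-line-per-failure view, which makes it easier to see whether the hosts are failing at the same point (index pipe closing vs. no ACK for the REP).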

Quoth JB Segal (jb AT smartertravelmedia DOT com):
> Has anyone else had problems getting the rhel5-distributed version of
> amanda to work?
> 
> My server is a rhel4 box, using the amanda-packaged
> amanda-backup_server-2.5.1p1-1.rhel4.
> 
> I have many clients using RH packaged RHEL4 rpms (which exist as
> amanda-client-2.4.4p3-1 and amanda-2.4.4p3-1) and some older (RH9,
> 2.4.3 boxes) and they all work just fine.
> 
> I have _1_ RHEL5 box that works fine, too, and honestly, I can't tell
> what's different about it, from the 4-5 that don't.
> 
> All clients have the same .amandahosts entries.
> 
> All the rhel5 clients have iptables off at this time.
> 
> They're all behind the same firewall, and none of them have any special
> case entries in said FW. The server's back there, too.
> 
> I managed to get 2 of them amchecking cleanly, eventually, but that
> seems to only be working occasionally now. This morning's automated
> amcheck failed on all 4 of the problematic rhel5 clients. The one I
> manually ran while writing this succeeded for 2, failed for 2.
> 
> On said clients, running amcheck correctly talks to xinetd and xinetd
> correctly launches amandad.
> 
> The debug files for amandad look like:
> amandad: debug 1 pid 22449 ruid 33 euid 33: start at Thu Sep 13 12:11:40 2007
> amandad: version 2.5.0p2
> amandad: build: VERSION="Amanda-2.5.0p2"
>  (plus 4 lines of build info - date/mach/CC/Configure_Command)
> amandad: paths: bindir="/usr/bin" sbindir="/usr/sbin"
>  (plus many more lines of paths)
> amandad: defs:  DEFAULT_SERVER="amandahost" DEFAULT_CONFIG="DailySet1"
>  (plus many more)
> amandad: time 30.057: pid 22449 finish time Thu Sep 13 12:12:10 2007
> 
> But most problematic, of course, is that nothing's getting backed up
> on any of the 4. The run output says:
> 
> FAILURE AND STRANGE DUMP SUMMARY:
>   problemhost1  /var   lev 0  FAILED [cannot read header: got 0 instead of 32768]
>   problemhost2  /      lev 0  FAILED [cannot read header: got 0 instead of 32768]
>   problemhost1  /usr   lev 0  FAILED [cannot read header: got 0 instead of 32768]
>   problemhost2  /boot  lev 0  FAILED [cannot read header: got 0 instead of 32768]
>   problemhost1  /usr   lev 0  FAILED [too many dumper retry: "[could not connect DATA stream: can't connect stream to problemhost1.domain.com port -13818: Connection timed out]"]
>   problemhost1  /usr   lev 0  FAILED [cannot read header: got 0 instead of 32768]
>   problemhost2  /boot  lev 0  FAILED [too many dumper retry: "[could not connect DATA stream: can't connect stream to problemhost2.domain.com port -30149: Connection timed out]"]
> 
> . . . and so on for every DLE, and also
> 
>   problemhost3  /boot  RESULTS MISSING
>   problemhost3  /      RESULTS MISSING
>   problemhost3  /var   RESULTS MISSING
>   problemhost3  /usr   RESULTS MISSING
>   problemhost3  /home  RESULTS MISSING
>   problemhost4  /boot  RESULTS MISSING
>   problemhost4  /      RESULTS MISSING
>   problemhost4  /usr   RESULTS MISSING
>   problemhost4  /var   RESULTS MISSING
>   problemhost4  /home  RESULTS MISSING
>   planner: ERROR Request to problemhost4 failed: timeout waiting for ACK
>   planner: ERROR Request to problemhost3 failed: timeout waiting for ACK
> 
> The summary gives 'FAILED' for 1 and 2, and 'MISSING' for 3 and 4.
> 
> I swear, all 4 hosts are configured the same, and all are configured
> the same as the working 2.5.0 host (which is REALLY weird) and as the
> working 2.4.x hosts.
> 
> What am I missing? I utterly expect PEBCAK, but I can't see what it is.
> 
> Help? Please??
> Thanks!
> JB
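
A side note on the negative port numbers in the report quoted above (-13818 and -30149): those can't be real TCP ports. One plausible reading, and this is only an assumption, is that a 16-bit port number got printed through a signed type, in which case the real port is the same bit pattern read as unsigned. A minimal sketch of that conversion (unsigned_port is a made-up helper name, not anything from Amanda):

```python
def unsigned_port(reported):
    """Reinterpret a port printed as a signed 16-bit value as 0..65535.

    Assumes the negative number is just the same 16 bits shown as signed;
    values that already fit 0..65535 are returned unchanged.
    """
    if not -32768 <= reported <= 65535:
        raise ValueError(f"not a 16-bit port value: {reported}")
    return reported & 0xFFFF


# The two values from the FAILED lines in the report.
for reported in (-13818, -30149):
    print(reported, "->", unsigned_port(reported))
```

If that reading is right, the DATA stream connections were aimed at high ports on the clients (51718 and 35387), which would be the ports to look for when checking whether anything between server and client is dropping TCP traffic.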
--
JB Segal                 617-886-5575            www.smartertravel.com
Systems/Network Admin.   465 Medford St. Ste 400 www.bookingbuddy.com
Smarter Travel Media LLC Boston, MA 02129        www.tripmania.com
