Amanda-Users

Re: Coredumps with 2.5.1p3

2007-03-22 19:21:54
Subject: Re: Coredumps with 2.5.1p3
From: Jean-Louis Martineau <martineau AT zmanda DOT com>
To: "Douglas K. Rand" <rand AT meridian-enviro DOT com>
Date: Thu, 22 Mar 2007 18:40:43 -0400
Douglas,

Could you try the attached patch? It might fix the coredump.

What do you have in the amanda.<timestamps>.debug files? It should show an error message.

Jean-Louis

Douglas K. Rand wrote:
I'm running an Amanda client on a pair of FreeBSD 5.4 hosts, and after
upgrading from 2.4.5p1 to 2.5.1p3 I'm getting core dumps from amandad. I
rebuilt the client with debugging and am getting this backtrace:

Core was generated by `amandad'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/libamandad-2.5.1p3.so...done.
Loaded symbols for /usr/local/lib/libamandad-2.5.1p3.so
Reading symbols from /usr/local/lib/libamanda-2.5.1p3.so...done.
Loaded symbols for /usr/local/lib/libamanda-2.5.1p3.so
Reading symbols from /lib/libm.so.3...done.
Loaded symbols for /lib/libm.so.3
Reading symbols from /lib/libreadline.so.5...done.
Loaded symbols for /lib/libreadline.so.5
Reading symbols from /lib/libncurses.so.5...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libc.so.5...done.
Loaded symbols for /lib/libc.so.5
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0804ac8d in s_ackwait (as=0x805b000, action=A_RECVPKT, pkt=0x280bb000) at amandad.c:1068
1068            security_stream_read(dh->netfd, process_writenetfd, dh);
(gdb) print dh
$1 = (struct datafd_handle *) 0x805b048
(gdb) print *dh
$2 = {fd_read = 7, fd_write = 11, ev_read = 0x804e120, ev_write = 0x0, netfd = 0x0,
  as = 0x805b000}
(gdb) print process_writenetfd
$3 = {void (void *, void *, ssize_t)} 0x804b078 <process_writenetfd>
(gdb) bt
#0  0x0804ac8d in s_ackwait (as=0x805b000, action=A_RECVPKT, pkt=0x280bb000) at amandad.c:1068
#1  0x0804a168 in state_machine (as=0x805b000, action=A_RECVPKT, pkt=0x280bb000) at amandad.c:696
#2  0x0804ae43 in protocol_recv (cookie=0x805b000, pkt=0x280bb000, status=S_OK) at amandad.c:1154
#3  0x28099735 in udp_recvpkt_callback (cookie=0x8059080) at security-util.c:1231
#4  0x2808880d in event_wakeup (id=1) at event.c:212
#5  0x28099b86 in udp_netfd_read_callback (cookie=0x280ab000) at security-util.c:1406
#6  0x2808902f in event_loop_wait (wait_eh=0x0, dontblock=0) at event.c:489
#7  0x2808883b in event_loop (dontblock=0) at event.c:229
#8  0x080499c0 in main (argc=1, argv=0xbfbfed28) at amandad.c:468

I get the segfault every time I try to do an amdump of any
filesystem on these two hosts. amcheck works fine.


Anybody have any ideas before I simply downgrade back to 2.4.5?

These hosts are behind a firewall, but I checked that the
AMANDA_PORTRANGE=50001,50099 and AMANDA_UDPPORTRANGE=801,899 knobs
from /etc/make.conf are correctly finding their way into
config.status: '--with-udpportrange=801,899' '--with-portrange=50001,50099'


diff -u -r --show-c-function --new-file --exclude-from=/home/martinea/src.orig/amanda.diff --ignore-matching-lines='$Id:' amanda-2.5.1p3/amandad-src/amandad.c amanda-2.5.1p3.netfd/amandad-src/amandad.c
--- amanda-2.5.1p3/amandad-src/amandad.c        2007-01-10 11:26:57.000000000 -0500
+++ amanda-2.5.1p3.netfd/amandad-src/amandad.c  2007-03-22 18:38:18.000000000 -0400
@@ -1060,6 +1060,7 @@ s_ackwait(
                dh - &as->data[0], security_geterror(as->security_handle)));
            security_stream_close(dh->netfd);
            dh->netfd = NULL;
+           continue;
        }
        /* setup an event for reads from it */
        dh->ev_read = event_register((event_id_t)dh->fd_read, EV_READFD,
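Why the one-line patch should help: on the error path in the hunk, the stream is closed and dh->netfd is set to NULL. Without the continue, the loop falls through to event_register() and then to security_stream_read(dh->netfd, ...) at amandad.c:1068. That is exactly where the backtrace stops, with netfd = 0x0. Below is a minimal, self-contained C sketch of the loop pattern. It is not the real amandad.c code. struct stream, open_stream(), stream_read(), setup_handles(), NDATAFDS and the registered flag are all hypothetical stand-ins for Amanda's security stream API and event registration. Handle 1 is made to fail on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NDATAFDS 3

/* Hypothetical stand-ins; NOT the real Amanda types. */
struct stream { int id; };

struct datafd_handle {
    struct stream *netfd;   /* NULL when stream setup failed */
    int registered;         /* stands in for dh->ev_read being set */
};

static struct stream streams[NDATAFDS] = { {0}, {1}, {2} };

/* Hypothetical: pretend that setting up stream 1 fails. */
static struct stream *open_stream(int i)
{
    return (i == 1) ? NULL : &streams[i];
}

/* Stand-in for security_stream_read(): it uses its stream argument, so
 * passing NULL (what the unpatched loop did) is invalid. The assert makes
 * that precondition explicit instead of segfaulting. */
static void stream_read(struct stream *s)
{
    assert(s != NULL);
    printf("reading from stream %d\n", s->id);
}

/* Sets up each handle and returns how many were set up successfully. */
static int setup_handles(struct datafd_handle *dh, int n)
{
    int ok = 0;
    for (int i = 0; i < n; i++, dh++) {
        dh->netfd = open_stream(i);
        dh->registered = 0;
        if (dh->netfd == NULL) {
            fprintf(stderr, "stream %d setup failed\n", i);
            continue;   /* the fix: skip registration and reading */
        }
        dh->registered = 1;
        stream_read(dh->netfd);
        ok++;
    }
    return ok;
}
```

Calling setup_handles() on an array of NDATAFDS handles returns 2. Handles 0 and 2 are read, while handles[1].netfd stays NULL and is never passed to stream_read(). If you delete the continue line, the NULL stream reaches stream_read(). That is the same mistake the unpatched s_ackwait() makes.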