Re: [Bacula-users] [Bacula-devel] Storage Daemon crash backtrace

2010-06-24 01:37:31
Subject: Re: [Bacula-users] [Bacula-devel] Storage Daemon crash backtrace
From: Robert LeBlanc <robert AT leblancnet DOT us>
To: Kern Sibbald <kern AT sibbald DOT com>
Date: Wed, 23 Jun 2010 23:08:06 -0600

On Wed, Jun 23, 2010 at 1:59 PM, Kern Sibbald <kern AT sibbald DOT com> wrote:
> On Wednesday 23 June 2010 21:24:20 Robert LeBlanc wrote:
> > On Wed, Jun 23, 2010 at 1:17 PM, Kern Sibbald <kern AT sibbald DOT com> wrote:
> > > Yes, as Martin says, SIGUSR2 is something that should be ignored.  We use
> > > it internally to signal between threads, and when you are running the
> > > debugger on Bacula, you need to tell the debugger to ignore it -- as
> > > Martin indicates, or most often, when I am manually debugging, I start it
> > > with "run -s -f ...". For more information, see the Kaboom chapter of the
> > > "Problems" manual.
> > >
> > > Of course, if Bacula sent you the traceback (or put it in your working
> > > directory), you should open a bug report and post it there, and we will
> > > look at it.
> >
> > I had to run gdb manually (the e-mail report kept coming back empty)
> > and followed the notes in the manual. I did 'run -s -f ...' as the
> > manual said. I'll ignore SIGUSR2 and get it to crash again.
>
> Well, the -s should cause gdb to ignore the signal and just pass it to Bacula,
> which in turn ignores it.
>
> If you are running Bacula 5.0.2, in 99% of the cases, you will find the
> traceback and the bactrace files in your working directory when Bacula is not
> run under the debugger and it crashes.
>
> If you are running a 3.0.x or older version, you will need a support contract
> if you want us to look at the problem ...
>
> Kern

OK, with SIGUSR2 ignored, I got a SIGPIPE; the backtrace from the point where gdb paused is below. We are running 5.0.2, and I have looked in the working directory for traceback files. There are some, but they contain only:

ptrace: No such process.
/var/lib/bacula/26433: No such file or directory.

I thought that recompiling the Debian package with debug symbols would resolve this, but the two lines above are from the most recent traceback file written.

I hope this is helpful. 

root@lsbacsd0:/usr/sbin# gdb ./bacula-sd 
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /usr/sbin/bacula-sd...done.
(gdb) handle SIGUSR2 nostop noprint pass
Signal        Stop Print Pass to program Description
SIGUSR2       No No Yes User defined signal 2
(gdb) run -s -f -c /etc/bacula/bacula-sd.conf
Starting program: /usr/sbin/bacula-sd -s -f -c /etc/bacula/bacula-sd.conf
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff509c710 (LWP 28087)]
[New Thread 0x7ffff489b710 (LWP 28088)]
[New Thread 0x7ffff409a710 (LWP 28098)]
[New Thread 0x7ffff3899710 (LWP 28099)]
[Thread 0x7ffff509c710 (LWP 28087) exited]
[New Thread 0x7ffff509c710 (LWP 28187)]
[New Thread 0x7ffff2e8c710 (LWP 28188)]
[New Thread 0x7ffff268b710 (LWP 28189)]
[New Thread 0x7ffff1e8a710 (LWP 28190)]
[Thread 0x7ffff3899710 (LWP 28099) exited]
[Thread 0x7ffff409a710 (LWP 28098) exited]
[New Thread 0x7ffff409a710 (LWP 28191)]
[New Thread 0x7ffff3899710 (LWP 28192)]
[New Thread 0x7ffff1689710 (LWP 28193)]
[New Thread 0x7ffff0e88710 (LWP 28194)]
[New Thread 0x7fffebfff710 (LWP 28195)]
[Thread 0x7ffff1689710 (LWP 28193) exited]
[Thread 0x7ffff409a710 (LWP 28191) exited]
[Thread 0x7ffff3899710 (LWP 28192) exited]
[Thread 0x7fffebfff710 (LWP 28195) exited]
[New Thread 0x7fffebfff710 (LWP 28196)]
[New Thread 0x7ffff3899710 (LWP 28197)]
[New Thread 0x7ffff409a710 (LWP 28198)]
[New Thread 0x7ffff1689710 (LWP 28199)]
[New Thread 0x7fffeb7fe710 (LWP 28200)]
[New Thread 0x7fffeaffd710 (LWP 28201)]
[Thread 0x7fffeb7fe710 (LWP 28200) exited]
[Thread 0x7ffff1689710 (LWP 28199) exited]
[Thread 0x7fffeaffd710 (LWP 28201) exited]
[New Thread 0x7fffeaffd710 (LWP 28203)]
[New Thread 0x7ffff1689710 (LWP 28204)]
[Thread 0x7ffff268b710 (LWP 28189) exited]
[New Thread 0x7ffff268b710 (LWP 28205)]
[Thread 0x7ffff1689710 (LWP 28204) exited]
[Thread 0x7ffff2e8c710 (LWP 28188) exited]
[New Thread 0x7ffff2e8c710 (LWP 28206)]
[Thread 0x7ffff2e8c710 (LWP 28206) exited]
[New Thread 0x7ffff2e8c710 (LWP 28208)]
[New Thread 0x7ffff1689710 (LWP 28209)]
[Thread 0x7ffff1689710 (LWP 28209) exited]
[Thread 0x7fffeaffd710 (LWP 28203) exited]
[New Thread 0x7fffeaffd710 (LWP 28210)]
[Thread 0x7fffeaffd710 (LWP 28210) exited]
[New Thread 0x7fffeaffd710 (LWP 28211)]
[Thread 0x7ffff0e88710 (LWP 28194) exited]
[New Thread 0x7ffff0e88710 (LWP 28212)]
[Thread 0x7ffff0e88710 (LWP 28212) exited]
[New Thread 0x7ffff0e88710 (LWP 28213)]
[New Thread 0x7ffff1689710 (LWP 28214)]
[Thread 0x7ffff1689710 (LWP 28214) exited]
[New Thread 0x7ffff1689710 (LWP 28217)]
[Thread 0x7fffebfff710 (LWP 28196) exited]
[New Thread 0x7fffebfff710 (LWP 28218)]
[New Thread 0x7fffeb7fe710 (LWP 28219)]
[Thread 0x7ffff268b710 (LWP 28205) exited]
[Thread 0x7fffeb7fe710 (LWP 28219) exited]
[New Thread 0x7fffeb7fe710 (LWP 28220)]
[New Thread 0x7ffff268b710 (LWP 28221)]
[Thread 0x7ffff268b710 (LWP 28221) exited]
[Thread 0x7fffeaffd710 (LWP 28211) exited]
[New Thread 0x7fffeaffd710 (LWP 28222)]
[Thread 0x7ffff0e88710 (LWP 28213) exited]
[New Thread 0x7ffff0e88710 (LWP 28223)]
[Thread 0x7ffff0e88710 (LWP 28223) exited]
[New Thread 0x7ffff0e88710 (LWP 28224)]
[New Thread 0x7ffff268b710 (LWP 28225)]
[New Thread 0x7fffea7fc710 (LWP 28226)]
[Thread 0x7ffff268b710 (LWP 28225) exited]
[Thread 0x7ffff0e88710 (LWP 28224) exited]
[New Thread 0x7ffff0e88710 (LWP 28227)]
[New Thread 0x7ffff268b710 (LWP 28228)]
[New Thread 0x7fffe9ffb710 (LWP 28229)]
[Thread 0x7ffff0e88710 (LWP 28227) exited]
[Thread 0x7fffea7fc710 (LWP 28226) exited]
[Thread 0x7fffe9ffb710 (LWP 28229) exited]

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7ffff1e8a710 (LWP 28190)]
0x00007ffff67d305d in write () from /lib/libpthread.so.0
(gdb) thread apply all bt

Thread 43 (Thread 0x7ffff268b710 (LWP 28228)):
#0  0x00007ffff67d30bd in read () from /lib/libpthread.so.0
#1  0x00007ffff7172d56 in read_nbytes (bsock=0x1015dc8, ptr=0x7ffff268a77c "\377\377\377\377", nbytes=4) at bnet.c:80
#2  0x00007ffff7175bd7 in BSOCK::recv (this=0x1015dc8) at bsock.c:451
#3  0x00007ffff7171fda in bget_msg (sock=0x1015dc8) at bget_msg.c:60
#4  0x00000000004104b9 in do_append_data (jcr=0x6aa558) at append.c:151
#5  0x00000000004249fb in append_data_cmd (jcr=0x6aa558) at fd_cmds.c:203
#6  0x00000000004241bb in do_fd_commands (jcr=0x6aa558) at fd_cmds.c:162
#7  0x0000000000424b7a in run_job (jcr=0x6aa558) at fd_cmds.c:124
#8  0x000000000042541b in run_cmd (jcr=0x6aa558) at job.c:225
#9  0x000000000042169f in handle_connection_request (arg=<value optimized out>) at dircmd.c:233
#10 0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#11 0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#12 0x00007ffff538401d in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 37 (Thread 0x7fffeaffd710 (LWP 28222)):
#0  0x00007ffff67d04d9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042547c in run_cmd (jcr=0x76ed88) at job.c:212
#2  0x000000000042169f in handle_connection_request (arg=<value optimized out>) at dircmd.c:233
#3  0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#4  0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#5  0x00007ffff538401d in clone () from /lib/libc.so.6
#6  0x0000000000000000 in ?? ()

Thread 35 (Thread 0x7fffeb7fe710 (LWP 28220)):
#0  0x00007ffff67d30bd in read () from /lib/libpthread.so.0
#1  0x00007ffff7172d56 in read_nbytes (bsock=0xa13668, 
    ptr=0x7fffec188288 "serJet 1300 PCL 5PrintersHP{4d36e979-e325-11ce-bfc1-08002be10318@\\0001", nbytes=65536) at bnet.c:80
#2  0x00007ffff7175e17 in BSOCK::recv (this=0xa13668) at bsock.c:509
#3  0x00007ffff7171fda in bget_msg (sock=0xa13668) at bget_msg.c:60
#4  0x0000000000410538 in do_append_data (jcr=0x9fcbd8) at append.c:183
#5  0x00000000004249fb in append_data_cmd (jcr=0x9fcbd8) at fd_cmds.c:203
#6  0x00000000004241bb in do_fd_commands (jcr=0x9fcbd8) at fd_cmds.c:162
#7  0x0000000000424b7a in run_job (jcr=0x9fcbd8) at fd_cmds.c:124
#8  0x000000000042541b in run_cmd (jcr=0x9fcbd8) at job.c:225
#9  0x000000000042169f in handle_connection_request (arg=<value optimized out>) at dircmd.c:233
#10 0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#11 0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#12 0x00007ffff538401d in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 33 (Thread 0x7fffebfff710 (LWP 28218)):
#0  0x00007ffff67d30bd in read () from /lib/libpthread.so.0
#1  0x00007ffff7172d56 in read_nbytes (bsock=0x76e498, ptr=0x7fffebffe77c "", nbytes=4) at bnet.c:80
#2  0x00007ffff7175bd7 in BSOCK::recv (this=0x76e498) at bsock.c:451
#3  0x00007ffff7171fda in bget_msg (sock=0x76e498) at bget_msg.c:60
#4  0x0000000000410538 in do_append_data (jcr=0x75b468) at append.c:183
#5  0x00000000004249fb in append_data_cmd (jcr=0x75b468) at fd_cmds.c:203
#6  0x00000000004241bb in do_fd_commands (jcr=0x75b468) at fd_cmds.c:162
#7  0x0000000000424b7a in run_job (jcr=0x75b468) at fd_cmds.c:124
#8  0x000000000042541b in run_cmd (jcr=0x75b468) at job.c:225
#9  0x000000000042169f in handle_connection_request (arg=<value optimized out>) at dircmd.c:233
#10 0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#11 0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#12 0x00007ffff538401d in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 32 (Thread 0x7ffff1689710 (LWP 28217)):
#0  0x00007ffff67d30bd in read () from /lib/libpthread.so.0
#1  0x00007ffff5e93fd1 in ?? () from /usr/lib/libcrypto.so.0.9.8
#2  0x00007ffff5e92279 in BIO_read () from /usr/lib/libcrypto.so.0.9.8
#3  0x00007ffff6189ffd in ssl3_read_n () from /usr/lib/libssl.so.0.9.8
#4  0x00007ffff618a443 in ssl3_read_bytes () from /usr/lib/libssl.so.0.9.8
#5  0x00007ffff6186fbc in ssl3_shutdown () from /usr/lib/libssl.so.0.9.8
#6  0x00007ffff71922ac in tls_bsock_shutdown (bsock=0xb40448) at tls.c:578
#7  0x00007ffff717531f in BSOCK::close (this=0xb40448) at bsock.c:889
#8  0x00007ffff717f883 in free_common_jcr (file=<value optimized out>, line=<value optimized out>, jcr=0xb40808) at jcr.c:443
#9  b_free_jcr (file=<value optimized out>, line=<value optimized out>, jcr=0xb40808) at jcr.c:570
#10 0x000000000041f711 in cancel_cmd (cjcr=<value optimized out>) at dircmd.c:337
#11 0x000000000042169f in handle_connection_request (arg=<value optimized out>) at dircmd.c:233
#12 0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#13 0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#14 0x00007ffff538401d in clone () from /lib/libc.so.6
#15 0x0000000000000000 in ?? ()

Thread 25 (Thread 0x7ffff2e8c710 (LWP 28208)):
#0  0x00007ffff67d30bd in read () from /lib/libpthread.so.0
#1  0x00007ffff7172d56 in read_nbytes (bsock=0x774378, ptr=0x7ffff2e8b77c "\377\377\377\377P", <incomplete sequence \366\274>, 
    nbytes=4) at bnet.c:80
#2  0x00007ffff7175bd7 in BSOCK::recv (this=0x774378) at bsock.c:451
#3  0x00007ffff7171fda in bget_msg (sock=0x774378) at bget_msg.c:60
#4  0x00000000004104b9 in do_append_data (jcr=0x745ec8) at append.c:151
#5  0x00000000004249fb in append_data_cmd (jcr=0x745ec8) at fd_cmds.c:203
#6  0x00000000004241bb in do_fd_commands (jcr=0x745ec8) at fd_cmds.c:162
#7  0x0000000000424b7a in run_job (jcr=0x745ec8) at fd_cmds.c:124
#8  0x000000000042541b in run_cmd (jcr=0x745ec8) at job.c:225
#9  0x000000000042169f in handle_connection_request (arg=<value optimized out>) at dircmd.c:233
#10 0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#11 0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#12 0x00007ffff538401d in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 17 (Thread 0x7ffff409a710 (LWP 28198)):
#0  0x00007ffff537d8b3 in select () from /lib/libc.so.6
#1  0x00007ffff71921df in openssl_bsock_readwrite (bsock=0xbf4008, ptr=0x7ffff409977c "\263\001", nbytes=4) at tls.c:642
#2  tls_bsock_readn (bsock=0xbf4008, ptr=0x7ffff409977c "\263\001", nbytes=4) at tls.c:693
#3  0x00007ffff7175bd7 in BSOCK::recv (this=0xbf4008) at bsock.c:451
#4  0x00007ffff7171fda in bget_msg (sock=0xbf4008) at bget_msg.c:60
#5  0x0000000000410538 in do_append_data (jcr=0xbf3668) at append.c:183
#6  0x00000000004249fb in append_data_cmd (jcr=0xbf3668) at fd_cmds.c:203
#7  0x00000000004241bb in do_fd_commands (jcr=0xbf3668) at fd_cmds.c:162
#8  0x0000000000424b7a in run_job (jcr=0xbf3668) at fd_cmds.c:124
#9  0x000000000042541b in run_cmd (jcr=0xbf3668) at job.c:225
#10 0x000000000042169f in handle_connection_request (arg=<value optimized out>) at dircmd.c:233
#11 0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#12 0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#13 0x00007ffff538401d in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 16 (Thread 0x7ffff3899710 (LWP 28197)):
#0  0x00007ffff537d8b3 in select () from /lib/libc.so.6
#1  0x00007ffff71921df in openssl_bsock_readwrite (bsock=0xc03938, ptr=0x7fffec17110c "\f\211P\b\213Eԁx\004\377\001", 
    nbytes=49156) at tls.c:642
#2  tls_bsock_readn (bsock=0xc03938, ptr=0x7fffec17110c "\f\211P\b\213Eԁx\004\377\001", nbytes=49156) at tls.c:693
#3  0x00007ffff7175e17 in BSOCK::recv (this=0xc03938) at bsock.c:509
#4  0x00007ffff7171fda in bget_msg (sock=0xc03938) at bget_msg.c:60
#5  0x0000000000410538 in do_append_data (jcr=0xbf2428) at append.c:183
#6  0x00000000004249fb in append_data_cmd (jcr=0xbf2428) at fd_cmds.c:203
#7  0x00000000004241bb in do_fd_commands (jcr=0xbf2428) at fd_cmds.c:162
#8  0x0000000000424b7a in run_job (jcr=0xbf2428) at fd_cmds.c:124
#9  0x000000000042541b in run_cmd (jcr=0xbf2428) at job.c:225
#10 0x000000000042169f in handle_connection_request (arg=<value optimized out>) at dircmd.c:233
#11 0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#12 0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#13 0x00007ffff538401d in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7ffff1e8a710 (LWP 28190)):
#0  0x00007ffff67d305d in write () from /lib/libpthread.so.0
#1  0x00007ffff5e94065 in ?? () from /usr/lib/libcrypto.so.0.9.8
#2  0x00007ffff5e92167 in BIO_write () from /usr/lib/libcrypto.so.0.9.8
#3  0x00007ffff61896a8 in ssl3_write_pending () from /usr/lib/libssl.so.0.9.8
#4  0x00007ffff6189bf3 in ssl3_dispatch_alert () from /usr/lib/libssl.so.0.9.8
#5  0x00007ffff6186f94 in ssl3_shutdown () from /usr/lib/libssl.so.0.9.8
#6  0x00007ffff7192227 in tls_bsock_shutdown (bsock=0x74b9c8) at tls.c:573
#7  0x00007ffff717531f in BSOCK::close (this=0x74b9c8) at bsock.c:889
#8  0x00007ffff717f883 in free_common_jcr (file=<value optimized out>, line=<value optimized out>, jcr=0x74c158) at jcr.c:443
#9  b_free_jcr (file=<value optimized out>, line=<value optimized out>, jcr=0x74c158) at jcr.c:570
#10 0x00000000004215b8 in handle_connection_request (arg=<value optimized out>) at dircmd.c:252
#11 0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#12 0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#13 0x00007ffff538401d in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7ffff509c710 (LWP 28187)):
#0  0x00007ffff67d30bd in read () from /lib/libpthread.so.0
#1  0x00007ffff5e93fd1 in ?? () from /usr/lib/libcrypto.so.0.9.8
#2  0x00007ffff5e92279 in BIO_read () from /usr/lib/libcrypto.so.0.9.8
#3  0x00007ffff6189ffd in ssl3_read_n () from /usr/lib/libssl.so.0.9.8
#4  0x00007ffff618a5bf in ssl3_read_bytes () from /usr/lib/libssl.so.0.9.8
#5  0x00007ffff6186fbc in ssl3_shutdown () from /usr/lib/libssl.so.0.9.8
#6  0x00007ffff71922ac in tls_bsock_shutdown (bsock=0x75c0f8) at tls.c:578
#7  0x00007ffff717531f in BSOCK::close (this=0x75c0f8) at bsock.c:889
#8  0x0000000000424c45 in stored_free_jcr (jcr=0x679688) at job.c:367
#9  0x00007ffff717f78f in b_free_jcr (file=<value optimized out>, line=<value optimized out>, jcr=0x679688) at jcr.c:567
#10 0x00000000004215b8 in handle_connection_request (arg=<value optimized out>) at dircmd.c:252
#11 0x00007ffff719a619 in workq_server (arg=<value optimized out>) at workq.c:346
#12 0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#13 0x00007ffff538401d in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7ffff489b710 (LWP 28088)):
#0  0x00007ffff67d04d9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00007ffff719a18c in watchdog_thread (arg=<value optimized out>) at watchdog.c:308
#2  0x00007ffff67cb8ba in start_thread () from /lib/libpthread.so.0
#3  0x00007ffff538401d in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fe6720 (LWP 28083)):
#0  0x00007ffff537d8b3 in select () from /lib/libc.so.6
#1  0x00007ffff71739e1 in bnet_thread_server (addrs=<value optimized out>, max_clients=<value optimized out>, 
    client_wq=<value optimized out>, handle_client_request=<value optimized out>) at bnet_server.c:161
#2  0x0000000000408ac2 in main (argc=<value optimized out>, argv=<value optimized out>) at stored.c:312
(gdb) 
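
For anyone reading the archive who is unfamiliar with SIGPIPE: thread 9 above is inside write(), called from ssl3_shutdown. That suggests (this is an inference from the backtrace, not something confirmed in this thread) that the peer had already closed the connection when the shutdown alert was written. Below is a minimal sketch of the general mechanism in Python. It assumes a plain pipe in place of the TLS socket, and the function name write_to_closed_pipe is made up for illustration; it is not Bacula code.

```python
import errno
import os


def write_to_closed_pipe() -> int:
    """Write to a pipe whose read end is already closed; return the errno seen."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the "peer" goes away before we write
    try:
        os.write(write_fd, b"bye")
    except BrokenPipeError as exc:
        # CPython sets SIGPIPE to "ignore" at startup, so the kernel reports
        # EPIPE through the failed write instead of terminating the process.
        return exc.errno
    finally:
        os.close(write_fd)
    return 0  # not expected: the write should always fail here


if __name__ == "__main__":
    code = write_to_closed_pipe()
    print(errno.errorcode.get(code, code))
```

With the default SIGPIPE disposition, the same write would terminate the process instead of returning an error. gdb also stops on SIGPIPE by default, so a stop like the one above does not by itself show what the daemon would have done outside the debugger.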

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University