Re: [Bacula-users] bacula-fd crashes on FreeBSD 9.2

2013-10-27 14:35:09
Subject: Re: [Bacula-users] bacula-fd crashes on FreeBSD 9.2
From: Dan Langille <dan AT langille DOT org>
To: David Newman <dnewman AT networktest DOT com>
Date: Sun, 27 Oct 2013 14:31:51 -0400
On Oct 22, 2013, at 3:00 PM, David Newman wrote:

> 
> 
> On 10/19/13 11:40 PM, Kern Sibbald wrote:
>> Hello,
>> 
>> From what I can see -- first the "signal 0", and second this
>> traceback -- this looks a lot like a FreeBSD pthreads bug.
>> 
>> First, there is no such thing, at least in userland, as
>> signal number 0, which I saw in an earlier email.  Second,
>> as the traceback below shows, Bacula is waiting in
>> pthread_cond_timedwait(), which is a "system" subroutine, and
>> while inside it, it calls pthread_cond_signal(), which is
>> probably no problem, followed by pthread_kill().  That seems
>> odd to me; perhaps it is how FreeBSD does it, but the net
>> result is that it is killing Bacula.
>> 
>> Obviously, this could be a Bacula bug, but it is not occurring
>> elsewhere, and it looks very suspicious to me.
>> 
>> You can get more information by compiling with
>> #define DEVELOPER 1
>> in <bacula>/src/version.h  and ensuring that the -g
>> option is on the compile and that the binaries are not
>> stripped (default for Bacula Makefiles, but not for the
>> FreeBSD ports system).
>> 
>> Then if you get another traceback, it may be clearer what
>> is going on.  Since this is relatively serious, I would recommend
>> running Bacula under the debugger directly, see the manual on
>> the details of how, then when the debugger gets control after
>> the signal, manually do the "thread apply all bt" command.
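For reference, the debugger run described above could be prepared as a gdb command file. This is only a sketch under assumptions: gdb_cmds is a hypothetical helper name, the -f (foreground) option is assumed from the manual section on running Bacula under the debugger, the config path is the one used by the startup script, and the breakpoint on signal_handler follows Martin's earlier suggestion.

```shell
#!/bin/sh
# Sketch only: print gdb commands for running bacula-fd in the foreground.
# gdb_cmds is a hypothetical helper, not part of Bacula.
gdb_cmds() {
    conf=$1
    printf '%s\n' \
        'set breakpoint pending on' \
        'break signal_handler' \
        "run -f -c $conf" \
        'thread apply all bt'
}

# Print the commands for the config file named in the startup script.
gdb_cmds /usr/local/etc/bacula/bacula-fd.conf
```

Redirected to a file such as bacula-fd.gdb, the output would be used roughly as "gdb -x bacula-fd.gdb /usr/local/sbin/bacula-fd". The "set breakpoint pending on" line is there because signal_handler lives in a shared library that is not loaded until the program starts.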
> 
> FreeBSD gurus, a little help?

That's not me.

> I don't see version.h under the bacula-client port directory.

Try this (the last line is the output of find):

make clean
make patch
find . -name version.h
./bacula-5.2.12/src/version.h
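Once version.h is found, the DEVELOPER define Kern mentioned could be switched on with something like the sketch below. enable_developer is a hypothetical helper, not part of Bacula or the port, and appending the define at the end of the header (rather than editing an existing line) is an assumption made to keep the example short.

```shell
#!/bin/sh
# Sketch only: make sure version.h has an active "#define DEVELOPER 1".
# enable_developer is a hypothetical helper name.
enable_developer() {
    header=$1
    # An active definition starts the line with "#define"; a definition
    # wrapped in a /* ... */ comment does not match this pattern.
    if grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+DEVELOPER([[:space:]]|$)' "$header"; then
        return 0
    fi
    # No active definition found, so append one.
    printf '\n#define DEVELOPER 1\n' >> "$header"
}

# Usage (path as printed by find above):
#   enable_developer ./bacula-5.2.12/src/version.h
```

The function does nothing if the define is already active, so it is safe to run twice. The port would still need to be rebuilt with -g and without stripping; the ports WITH_DEBUG knob is meant for that, but whether this port honors it is an assumption.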

> 
> Also, I do have bacula-fd running fine on other FreeBSD 9.2 systems. The
> only delta AFAIK is that this is an i386 system and the others are amd64.
> 
> To review:
> 
> 1. Backup jobs complete when manually starting bacula-fd.

What command are you entering?

> 
> 2. Backup jobs do not complete when launching bacula-fd via the startup
> script in /usr/local/etc/rc.d/bacula-fd.

For example: /usr/local/etc/rc.d/bacula-fd start ?
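To compare the two ways of starting the daemon, it may help to write out the command line the startup script would use with its defaults. A minimal sketch, assuming the default flags from the startup script quoted later in this message and no overrides in rc.conf; rc_equivalent is a hypothetical helper that only prints the command and does not run it:

```shell
#!/bin/sh
# Sketch only: print the bacula-fd command line implied by the rc.d defaults.
# Values are copied from the startup script quoted in this thread.
command=/usr/local/sbin/bacula-fd
bacula_fd_flags="-u root -g wheel -v -c /usr/local/etc/bacula/bacula-fd.conf"

# rc_equivalent is a hypothetical helper; $1 holds optional extra flags.
rc_equivalent() {
    echo "$command $bacula_fd_flags${1:+ $1}"
}

# Command line as the startup script would build it from its defaults.
rc_equivalent
# Same flags plus foreground mode and a debug level, for a manual run.
rc_equivalent "-f -d 100"
```

If the manual start uses different flags from the first line printed here, that difference is worth testing first. Note that rc.subr may also change the environment and resource limits before starting the daemon, which this sketch does not reproduce.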

> 
> Thanks in advance for further debugging clues.
> 
> dn
> 
> 
> 
>> 
>> If any of you are FreeBSD system gurus you might compare the
>> last known working version of the OS with 9.2, particularly the
>> pthreads routines.  Perhaps they are using a signal 0 internally,
>> and somehow that leaked back to Bacula.
>> 
>> Best regards,
>> Kern
>> 
>> On 10/18/2013 01:29 AM, David Newman wrote:
>>> On 10/17/13 5:33 AM, Martin Simmons wrote:
>>>>>>>>> On Wed, 16 Oct 2013 12:13:26 -0700, David Newman said:
>>>>> On 10/14/13 2:44 AM, Martin Simmons wrote:
>>>>>>>>>>> On Sun, 13 Oct 2013 18:25:07 -0700, David Newman said:
>>>>>>> On 10/9/13 4:41 PM, David Newman wrote:
>>>>>>>> FreeBSD 9.2-RELEASE, bacula-client-5.2.12_3 installed from ports
>>>>>>>> 
>>>>>>>> Ever since upgrading this host to FreeBSD 9.2, bacula-fd crashes as soon
>>>>>>>> as bacula-dir starts a backup job. The entry in /var/log/messages is:
>>>>>>>> 
>>>>>>>> Oct  9 16:25:50 o bacula-fd: Bacula interrupted by signal 0: UNKNOWN SIGNAL
>>>>>>>> 
>>>>>>>> Backups worked fine on this host running FreeBSD 9.1 and other hosts
>>>>>>>> upgraded to FreeBSD 9.2 run backups OK.
>>>>>>>> 
>>>>>>>> I've done the uninstall/reinstall thing with the bacula-client port, but
>>>>>>>> that made no difference.
>>>>>>>> 
>>>>>>>> Thanks in advance for troubleshooting clues.
>>>>>>>> 
>>>>>>>> dn
>>>>>>> Is there a Wireshark decode for Bacula?
>>>>>>> 
>>>>>>> I'm still stuck on this problem, and need more info on what's causing
>>>>>>> that UNKNOWN SIGNAL error. Wireshark 1.8.6 just shows strings of bytes
>>>>>>> for the Bacula stuff.
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> dn
>>>>>> A Wireshark decode won't help much here because problems like this
>>>>>> must be in the fd itself.
>>>>>> 
>>>>>> Try attaching gdb to the bacula-fd process and see if it catches the
>>>>>> mysterious signal (see
>>>>>> http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION00640000000000000000).
>>>>>> 
>>>>> No luck with this. Per that URL, I've put the btraceback.gdb file in the
>>>>> same directory as the bacula-fd executable on the client (in this case,
>>>>> /usr/local/sbin) and made the .gdb file executable.
>>>>> 
>>>>> At run time it produces this error:
>>>>> 
>>>>> /usr/local/sbin/btraceback.gdb:1: Error in sourced command file:
>>>>> No symbol table is loaded.  Use the "file" command.
>>>>> 
>>>>> That's problem 1. Problem 2 is that the syntax given for capturing
>>>>> STDERR and STDOUT -- 2>\&1 -- doesn't work on either csh (root's default
>>>>> on FreeBSD) or bash.
>>>>> 
>>>>> Any ideas on remedying either issue?
>>>> It looks like you missed the part after the # in the URL -- you don't
>>>> need the btraceback.gdb file.
>>>> 
>>>> The section I meant is called "Manually Running Bacula Under The
>>>> Debugger" on that page (you'll have to adapt it for the bacula-fd).
>>> Sorry for missing that.
>>> 
>>> The backup runs fine under the debugger, including the backup job
>>> beforehand, but not with the FreeBSD startup script in
>>> /usr/local/etc/rc.d.
>>> 
>>> I've pasted below the debugger output and the startup script.
>>> 
>>> Thanks in advance for further troubleshooting clues.
>>> 
>>> dn
>>> 
>>> 
>>> ==========
>>> 
>>> Successful run of /usr/local/sbin/bacula-fd under gdb:
>>> 
>>> (gdb) thread apply all bt
>>> Thread 5 (Thread 28c08b00 (LWP 100213/bacula-fd)):
>>> #0  0x282302b3 in pthread_kill () from /lib/libthr.so.3
>>> #1  0x2822f9b2 in pthread_kill () from /lib/libthr.so.3
>>> #2  0x282328f9 in pthread_cond_signal () from /lib/libthr.so.3
>>> #3  0x281f5d20 in bthread_cond_timedwait_p () from /usr/local/lib/libbac.so.5
>>> #4  0x281ef9b0 in watchdog_thread () from /usr/local/lib/libbac.so.5
>>> #5  0x281f7167 in lmgr_thread_launcher () from /usr/local/lib/libbac.so.5
>>> #6  0x28227f3a in pthread_getprio () from /lib/libthr.so.3
>>> #7  0x00000000 in ?? ()
>>> 
>>> Thread 3 (Thread 28805e00 (LWP 100211/bacula-fd)):
>>> #0  0x28624323 in nanosleep () from /lib/libc.so.7
>>> #1  0x2822ad8b in nanosleep () from /lib/libthr.so.3
>>> #2  0x281c1a90 in bmicrosleep () from /usr/local/lib/libbac.so.5
>>> #3  0x281f7349 in check_deadlock () from /usr/local/lib/libbac.so.5
>>> #4  0x28227f3a in pthread_getprio () from /lib/libthr.so.3
>>> #5  0x00000000 in ?? ()
>>> 
>>> Thread 2 (Thread 28804300 (LWP 100133/bacula-fd)):
>>> #0  0x28646103 in select () from /lib/libc.so.7
>>> #1  0x2822a960 in select () from /lib/libthr.so.3
>>> #2  0x281c45a8 in bnet_thread_server () from /usr/local/lib/libbac.so.5
>>> #3  0x0804f5c6 in main ()
>>> #0  0x282302b3 in pthread_kill () from /lib/libthr.so.3
>>> 
>>> ==========
>>> 
>>> FreeBSD startup script:
>>> 
>>> #!/bin/sh
>>> #
>>> # $FreeBSD: sysutils/bacula-server/files/bacula-fd.in 323275 2013-07-19 09:44:58Z rm $
>>> #
>>> # PROVIDE: bacula_fd
>>> # REQUIRE: DAEMON
>>> # KEYWORD: shutdown
>>> #
>>> # Add the following lines to /etc/rc.conf.local or /etc/rc.conf
>>> # to enable this service:
>>> #
>>> # bacula_fd_enable  (bool):  Set to NO by default.
>>> #               Set it to YES to enable bacula_fd.
>>> # bacula_fd_flags (params):  Set params used to start bacula_fd.
>>> #
>>> 
>>> . /etc/rc.subr
>>> 
>>> name="bacula_fd"
>>> rcvar=${name}_enable
>>> command=/usr/local/sbin/bacula-fd
>>> 
>>> load_rc_config $name
>>> 
>>> : ${bacula_fd_enable="NO"}
>>> : ${bacula_fd_flags=" -u root -g wheel -v -c /usr/local/etc/bacula/bacula-fd.conf"}
>>> : ${bacula_fd_pidfile="/var/run/bacula-fd.9102.pid"}
>>> 
>>> pidfile="${bacula_fd_pidfile}"
>>> 
>>> run_rc_command "$1"
>>> 
>>> ==========
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>>> Thanks.
>>>>> 
>>>>> dn
>>>>> 
>>>>> 
>>>>> 
>>>>>> If that doesn't catch it, then try the gdb command
>>>>>> 
>>>>>> break signal_handler
>>>>>> 
>>>>>> (signal_handler prints the "Bacula interrupted by signal" message).
>>>>>> 
>>>>>> __Martin
>>>> 
>>>> __Martin
>>>> 
>>>> ------------------------------------------------------------------------------
>>>> 
>>>> October Webinars: Code for Performance
>>>> Free Intel webinars can help you accelerate application performance.
>>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the
>>>> most from
>>>> the latest Intel processors and coprocessors. See abstracts and
>>>> register >
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>>>> 
>>>> _______________________________________________
>>>> Bacula-users mailing list
>>>> Bacula-users AT lists.sourceforge DOT net
>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>>> 
>> 
> 

-- 
Dan Langille - http://langille.org

