Bacula-users

Re: [Bacula-users] Segmentation fault of Storage Daemon when client is not available

2011-09-21 11:26:40
From: Thomas Lohman <thomasl AT mtl.mit DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 21 Sep 2011 11:23:17 -0400
Just to follow up on this in case others run into the same issue: I was 
able to rebuild Bacula with the -g compiler option to get some debugging 
information.  The scenario that causes the SD to crash with a SEGFAULT 
is not consistently reproducible, which makes me suspect some kind of 
race condition.  In any event, I was finally able to get a backtrace in 
gdb, and the crash occurs in the same spot that others have reported in 
the URLs referenced below, namely in the zlib deflate routine being 
called from OpenSSL.  The workaround, I'm hoping, if you're using TLS, is 
to turn TLS off for communication between the Director and the Storage 
Daemon.  To do this, comment out all of the TLS options in every Storage 
definition in the Director configuration, and in just the Director 
definition in the SD configuration.  In addition, I was able to set up 
the Director so that if the SD does die, it takes care of restarting it 
and any failed jobs are re-queued (using the Reschedule On Error 
options).
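
For anyone wanting to try the same thing, here is a rough sketch of what 
those changes might look like.  It is illustrative only, not the actual 
configuration from this site: every resource name, address, password, 
certificate path and interval below is a made-up placeholder, and the 
sketch does not cover how the SD itself gets restarted.

```
# --- bacula-dir.conf (Director side) ---
# Storage resource with its TLS directives commented out, so the
# Director connects to the SD without TLS.  All values are placeholders.
Storage {
  Name = ExampleStorage             # hypothetical name
  Address = sd.example.org          # hypothetical host
  SDPort = 9103
  Password = "example-sd-password"  # placeholder
  Device = ExampleDevice            # hypothetical device
  Media Type = File
  # TLS Enable = yes
  # TLS Require = yes
  # TLS CA Certificate File = /path/to/ca.pem
  # TLS Certificate = /path/to/director-cert.pem
  # TLS Key = /path/to/director-key.pem
}

# JobDefs (or individual Job) resource: re-queue jobs that end in error,
# e.g. because the SD died mid-backup.  Interval and count are guesses
# to be tuned for your site.
JobDefs {
  Name = "ExampleDefaults"          # hypothetical name
  Type = Backup
  Reschedule On Error = yes
  Reschedule Interval = 30 minutes
  Reschedule Times = 3
}

# --- bacula-sd.conf (Storage Daemon side) ---
# Only the Director resource is changed; the SD's other TLS settings
# (e.g. those used for client connections) are left untouched.
Director {
  Name = example-dir                # hypothetical name
  Password = "example-sd-password"  # placeholder, must match the above
  # TLS Enable = yes
  # TLS Require = yes
  # TLS CA Certificate File = /path/to/ca.pem
  # TLS Certificate = /path/to/sd-cert.pem
  # TLS Key = /path/to/sd-key.pem
}
```

Both daemons need to be restarted (or the Director reloaded) after the 
edit, and both ends have to agree: if one side still requires TLS, the 
Director-to-SD connection will be refused.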

thanks again,


--tom


> Hi,
>>
>> We've been seeing our Bacula Storage Daemon die with a segmentation
>> fault when a client can't be reached for backup.  We have two servers
>> and have observed this behavior on both of them.  Some searching has
>> revealed that others seem to have (or had) this same issue.
>>
>> https://bugs.launchpad.net/ubuntu/+source/bacula/+bug/622742
>
> That looks similar to some existing bacula bug reports:
>
> http://bugs.bacula.org/view.php?id=1568
> http://bugs.bacula.org/view.php?id=1343
>
>
>> The behavior is not consistent i.e. sometimes it continues on working
>> normally if a client can't be contacted but eventually it'll snag on one
>> and die.  In addition, I've now had one of our storage daemons running
>> in the foreground with debugging set to the max and of course, that one
>> has now gone two days without seg faulting even though there have been
>> half a dozen non-responsive clients.
>>
>> We're currently running 5.0.3 built from source for both clients and
>> servers.  I'm wondering if anyone else here has experienced this problem
>> and/or has any pointers to a work around.  While things can be set up to
>> automatically restart the storage daemon if it dies, the main problem is
>> that any backups Bacula was in the middle of doing end with an error and
>> have to be manually rescheduled/run or just wait until the next time
>> their job comes up to run.
