Just to follow up on this in case others run into the same issue: I was
able to rebuild Bacula with the -g compiler option to get some debugging
information. The scenario that causes the SD to crash with a segmentation
fault is not consistently reproducible, which makes me suspect some kind
of race condition. In any event, I finally got a trace in gdb, and the
crash occurs in the same spot that others have reported in the URLs
referenced below, namely in zlib's deflate routine as called from OpenSSL.

The solution, I'm hoping, if you're using TLS, is to turn TLS off for
communication between the Director and the Storage Daemon. To do this,
comment out all of the TLS options in any Storage definitions in the
Director configuration, and in just the Director definition in the SD
configuration. In addition, I was able to set up the Director so that if
the SD does die, it takes care of restarting it, and any failed jobs are
re-queued (using the Reschedule On Error options).
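
For anyone wanting to try the same thing, here is a rough sketch of what
the changes might look like. This is not copied from my actual
configuration: the resource names, host name, passwords, certificate
paths, and the reschedule interval and count are all made-up
placeholders, and the Job resource is abbreviated, so adapt everything
to your own setup.

```
# bacula-dir.conf (Director): Storage resource with TLS commented out.
# Name, Address, Password, Device, and paths are hypothetical.
Storage {
  Name = File
  Address = sd.example.org
  SDPort = 9103
  Password = "changeme"
  Device = FileStorage
  Media Type = File
  # TLS Enable = yes
  # TLS Require = yes
  # TLS CA Certificate File = /etc/bacula/ssl/ca.pem
  # TLS Certificate = /etc/bacula/ssl/dir.pem
  # TLS Key = /etc/bacula/ssl/dir.key
}

# bacula-dir.conf (Director): re-queue jobs that end in error.
# Only the reschedule directives are shown; the other directives a
# Job needs (Type, Client, FileSet, and so on) are omitted here.
# The interval and count are example values, not recommendations.
Job {
  Name = "client1-backup"
  Reschedule On Error = yes
  Reschedule Interval = 30 minutes
  Reschedule Times = 3
}

# bacula-sd.conf (Storage Daemon): Director resource with TLS
# commented out. Name, Password, and paths are hypothetical.
Director {
  Name = backup-dir
  Password = "changeme"
  # TLS Enable = yes
  # TLS Require = yes
  # TLS CA Certificate File = /etc/bacula/ssl/ca.pem
  # TLS Certificate = /etc/bacula/ssl/sd.pem
  # TLS Key = /etc/bacula/ssl/sd.key
}
```

Both daemons need to re-read their configuration after the change, and
the TLS settings on the two ends have to agree, otherwise the Director
will not be able to connect to the SD.
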
thanks again,
--tom
>> Hi,
>>
>> We've been seeing our Bacula Storage Daemon die with a segmentation
>> fault when a client can't be reached for backup. We have two servers
>> and have observed this behavior on both of them. Some searching has
>> revealed that others seem to have (or had) this same issue.
>>
>> https://bugs.launchpad.net/ubuntu/+source/bacula/+bug/622742
>
> That looks similar to some existing bacula bug reports:
>
> http://bugs.bacula.org/view.php?id=1568
> http://bugs.bacula.org/view.php?id=1343
>
>
>> The behavior is not consistent, i.e., sometimes it carries on working
>> normally if a client can't be contacted, but eventually it'll snag on one
>> and die. In addition, I've now had one of our storage daemons running
>> in the foreground with debugging set to the max and, of course, that one
>> has now gone two days without segfaulting even though there have been
>> half a dozen non-responsive clients.
>>
>> We're currently running 5.0.3 built from source for both clients and
>> servers. I'm wondering if anyone else here has experienced this problem
>> and/or has any pointers to a workaround. While things can be set up to
>> automatically restart the storage daemon if it dies, the main problem is
>> that any backups Bacula was in the middle of doing end with an error and
>> have to be manually rescheduled/run or just wait until the next time
>> their job comes up to run.
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users