Bacula-users

Re: [Bacula-users] [Bacula-devel] Storage Daemon crash backtrace

2010-07-02 00:05:42
From: Robert LeBlanc <robert AT leblancnet DOT us>
To: Kern Sibbald <kern AT sibbald DOT com>
Date: Thu, 1 Jul 2010 22:02:10 -0600
On Wed, Jun 30, 2010 at 8:35 AM, Robert LeBlanc <robert AT leblancnet DOT us> wrote:
On Wed, Jun 30, 2010 at 1:06 AM, Kern Sibbald <kern AT sibbald DOT com> wrote:

This seems to be a support issue. The dump that you posted shows no indication
of a crash, which means that your understanding of a crash and mine are
different.

This is possibly a deadlock, but I won't spend any more time on it until the
problem is a bit clearer.

Best regards,

Kern

By the way, if this is a production system, you should be running on Lenny,
which is known to be stable, and we support it.

I'm not really sure what you need for a good backtrace, since I'm not a programmer. I always thought that a segfault leads to a program crashing. I just don't know enough about gdb to know when there is enough information. All I know is that when it crashes while running as a daemon, the traceback I get in my e-mail is useless (it says no ptrace). When I run it under gdb and get the segfault, then type 'cont', it says that bacula-sd has exited, and when I run it again, it doesn't complain that a process is already running. In both cases, there is no process called bacula-sd running on the system.

I updated/upgraded about 10 clients yesterday to use TLS, and I did not get a crash from the SD. I will keep running it under the debugger in case it crashes again, although I'm not sure how useful it will be if I cannot operate gdb correctly to get you anything helpful. I have a feeling it's some perfect storm of configuration that may be causing the issue. I've been running Bacula for 6 years and have never had a problem like this. I'm just trying to help the project be as robust as possible because we like it and it has treated us so well in the past.

As a side note, I get a lot more connection timeouts and broken pipes when using TLS. Adding a heartbeat interval helps, but it is not a silver bullet. Most of the backups are succeeding, with only a few here and there having problems. Without TLS and without a heartbeat interval, the backups always succeed. I'll keep working through things and see if I can come up with anything.

Thank you for the time and the great project.


Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University

P.S. We are working on a support contract and will be talking with you in about 24 hours with many others from our group who are also interested in using Bacula.

I know you are probably getting tired of hearing from me, but I had another crash today. I'm attaching the backtrace that I got this time. I typed 'cont' after the backtrace and all it said was that all the threads exited (this is in the log this time). Here is what came before the backtrace:

[Thread 0x7fffebfff710 (LWP 25670) exited]
[New Thread 0x7fffebfff710 (LWP 25671)]
[Thread 0x7fffebfff710 (LWP 25671) exited]
[Thread 0x7ffff0e88710 (LWP 24428) exited]
[Thread 0x7ffff1e8a710 (LWP 25530) exited]
[Thread 0x7ffff2e8c710 (LWP 25663) exited]
[New Thread 0x7ffff2e8c710 (LWP 25785)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff2e8c710 (LWP 25785)]
0x00007ffff77c5b1c in ?? () from /usr/lib/libz.so.1
(gdb) set loggin file /home/rleblanc/bacula-sd-seg.log
(gdb) set logging on
Copying output to /home/rleblanc/bacula-sd-seg.log.
(gdb) thread apply all bt

Thread 219 (Thread 0x7ffff2e8c710 (LWP 25785)):
#0  0x00007ffff77c5b1c in ?? () from /usr/lib/libz.so.1
#1  0x00007ffff77c6ef7 in ?? () from /usr/lib/libz.so.1
#2  0x00007ffff77c40eb in ?? () from /usr/lib/libz.so.1
#3  0x00007ffff77c2251 in deflate () from /usr/lib/libz.so.1
#4  0x00007ffff5eea6f2 in ?? () from /usr/lib/libcrypto.so.0.9.8

The question I have is: am I missing debug symbols in other packages, such as OpenSSL, that would help? I'm not a programmer, so backtraces are pretty much a wall of text to me. I want to give helpful info so that others do not run into the same problem in the future.

If this is not helpful, I'm not sure what else to do, so I'll give up and just create a cron job that will restart bacula-sd if it crashes or modify btraceback to restart bacula-sd.
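For reference, here is a minimal sketch of what the cron approach could look like. It is only an illustration, not something I have run in production. The function names (is_alive, ensure_running) are made up for this example, and the PID file path and start command are passed in as arguments because they are assumptions that differ between installations. It treats the daemon as dead when the PID in the PID file no longer belongs to a live process, which is the state a segfault leaves behind.

```shell
#!/bin/sh
# Hypothetical watchdog sketch, intended to be run from root's crontab.
# Usage: bacula-sd-watchdog.sh PIDFILE START_COMMAND [ARGS...]
# PIDFILE and START_COMMAND are assumptions supplied by the caller.

# Succeeds only if PIDFILE is readable and names a process that still exists.
# kill -0 sends no signal; it only checks that the PID can be signalled,
# so this must run as root (or as the daemon's user) to be reliable.
is_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Runs the start command only when the process from PIDFILE is gone.
ensure_running() {
    pidfile=$1
    shift
    if is_alive "$pidfile"; then
        return 0
    fi
    echo "process from $pidfile is not running; starting it"
    "$@"
}

# Act only when called with a PID file and a start command.
if [ "$#" -ge 2 ]; then
    ensure_running "$@"
fi
```

A crontab entry for it might look like the line below. The script location, PID file name, and init script path are all assumptions to adjust for the local install. One caveat of this design is that a recycled PID could make a dead daemon look alive until the next check.

*/5 * * * * /usr/local/sbin/bacula-sd-watchdog.sh /var/run/bacula/bacula-sd.9103.pid /etc/init.d/bacula-sd start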

Thanks,

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University

Attachment: bacula-sd-seg.log
Description: Text Data

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users