[Bacula-users] Director crash- again with traceback

2011-01-11 12:31:13
Subject: [Bacula-users] Director crash- again with traceback
From: jerry lowry <jlowry AT edt DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 11 Jan 2011 09:12:46 -0800
I really hate it when I do that!!! Here is the traceback that was missing from my original message (quoted below).

[Thread debugging using libthread_db enabled]
[New Thread 0x7f8362bfd710 (LWP 9002)]
[New Thread 0x7f8363fff710 (LWP 3111)]
[New Thread 0x7f8368c49710 (LWP 3110)]
0x0000003377a0e91d in nanosleep () from /lib64/libpthread.so.0
$1 = '\000' <repeats 29 times>
$2 = 0x1fe2068 "bacula-dir"
$3 = 0x1fe20a8 "/usr/bacula/bin/bacula-dir"
$4 = 0x7f834c004328 "MySQL"
$5 = 0x7f836eadbd9e "5.0.1 (24 February 2010)"
$6 = 0x7f836eadbdb7 "x86_64-unknown-linux-gnu"
$7 = 0x7f836eadbdd0 "redhat"
$8 = 0x7f836eadba7c ""
$9 = "distress", '\000' <repeats 41 times>
#0  0x0000003377a0e91d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f836eaae6f7 in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2  0x000000000042e1d5 in wait_for_next_job (
    _one_shot_job_to_run_=<value optimized out>) at scheduler.c:131
#3  0x000000000040d93d in main (argc=<value optimized out>, 
    argv=<value optimized out>) at dird.c:338

Thread 4 (Thread 0x7f8368c49710 (LWP 3110)):
#0  0x00000033772d7393 in select () from /lib64/libc.so.6
#1  0x00007f836eab0ad4 in bnet_thread_server (addrs=<value optimized out>, 
    max_clients=<value optimized out>, client_wq=<value optimized out>, 
    handle_client_request=<value optimized out>) at bnet_server.c:161
#2  0x00000000004468fc in connect_thread (arg=0x1fe3ee8) at ua_server.c:82
#3  0x0000003377a06a3a in start_thread () from /lib64/libpthread.so.0
#4  0x00000033772de62d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f8363fff710 (LWP 3111)):
#0  0x0000003377a0b3b9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007f836ead402c in watchdog_thread (arg=<value optimized out>)
    at watchdog.c:308
#2  0x0000003377a06a3a in start_thread () from /lib64/libpthread.so.0
#3  0x00000033772de62d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f8362bfd710 (LWP 9002)):
#0  0x0000003377a0ec8d in waitpid () from /lib64/libpthread.so.0
#1  0x00007f836eacb7ad in signal_handler (sig=11) at signal.c:229
#2  <signal handler called>
#3  0x0000003377a0c280 in pthread_kill () from /lib64/libpthread.so.0
#4  0x0000000000420eba in cancel_storage_daemon_job (jcr=0x7f834c01c2f8)
    at job.c:515
#5  0x0000000000410b50 in wait_for_job_termination (jcr=0x7f834c01c2f8, 
    timeout=<value optimized out>) at backup.c:538
#6  0x00000000004116f0 in do_backup (jcr=0x7f834c01c2f8) at backup.c:456
#7  0x0000000000421fd4 in job_thread (arg=0x7f834c01c2f8) at job.c:314
#8  0x0000000000423624 in jobq_server (arg=0x673b40) at jobq.c:450
#9  0x0000003377a06a3a in start_thread () from /lib64/libpthread.so.0
#10 0x00000033772de62d in clone () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f836ea7b7e0 (LWP 3106)):
#0  0x0000003377a0e91d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f836eaae6f7 in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2  0x000000000042e1d5 in wait_for_next_job (
    _one_shot_job_to_run_=<value optimized out>) at scheduler.c:131
#3  0x000000000040d93d in main (argc=<value optimized out>, 
    argv=<value optimized out>) at dird.c:338
#0  0x0000003377a0e91d in nanosleep () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f836eaae6f7 in bmicrosleep (sec=60, usec=0) at bsys.c:61
61	   stat = nanosleep(&timeout, NULL);
timeout = {tv_sec = 60, tv_nsec = 0}
tv = {tv_sec = 90194313216, tv_usec = 140202474247679}
tz = {tz_minuteswest = 372, tz_dsttime = 0}
stat = <value optimized out>
#2  0x000000000042e1d5 in wait_for_next_job (
    _one_shot_job_to_run_=<value optimized out>) at scheduler.c:131
131	      bmicrosleep(next_check_secs, 0); /* recheck once per minute */
jcr = <value optimized out>
job = <value optimized out>
run = <value optimized out>
now = <value optimized out>
prev = <value optimized out>
first = false
next_job = <value optimized out>
#3  0x000000000040d93d in main (argc=<value optimized out>, 
    argv=<value optimized out>) at dird.c:338
338	   while ( (jcr = wait_for_next_job(runjob)) ) {
jcr = <value optimized out>
test_config = false
ch = <value optimized out>
no_signals = false
uid = 0x0
gid = 0x0
mode = <value optimized out>
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
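
A note for reading the traceback above: in a gdb "thread apply all bt" dump, the thread that took the signal is the one whose stack contains "<signal handler called>", and the frame directly below that marker is where the fault occurred. The following is only a minimal sketch of automating that lookup, assuming the dump uses the same "Thread N (Thread 0x... (LWP n)):" layout as above. The function name find_faulting_frames and the SAMPLE excerpt (Threads 3 and 2 copied from this post) are illustrative and not part of Bacula or gdb.

```python
import re

# Header and frame layouts as they appear in the gdb dump above.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread (0x[0-9a-f]+) \(LWP (\d+)\)\):")
FRAME_RE = re.compile(r"^#(\d+)\s+(.*)")

# Excerpt copied from the traceback in this post (Threads 3 and 2 only).
SAMPLE = """\
Thread 3 (Thread 0x7f8363fff710 (LWP 3111)):
#0  0x0000003377a0b3b9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007f836ead402c in watchdog_thread (arg=<value optimized out>)
    at watchdog.c:308
#2  0x0000003377a06a3a in start_thread () from /lib64/libpthread.so.0

Thread 2 (Thread 0x7f8362bfd710 (LWP 9002)):
#0  0x0000003377a0ec8d in waitpid () from /lib64/libpthread.so.0
#1  0x00007f836eacb7ad in signal_handler (sig=11) at signal.c:229
#2  <signal handler called>
#3  0x0000003377a0c280 in pthread_kill () from /lib64/libpthread.so.0
#4  0x0000000000420eba in cancel_storage_daemon_job (jcr=0x7f834c01c2f8)
    at job.c:515
#5  0x0000000000410b50 in wait_for_job_termination (jcr=0x7f834c01c2f8,
    timeout=<value optimized out>) at backup.c:538
"""


def find_faulting_frames(backtrace_text):
    """Return (thread_number, frames_below_signal_handler), or None if no
    thread in the dump shows '<signal handler called>'."""
    frames = {}      # thread number -> list of frame descriptions
    current = None   # thread block currently being read, if any
    for line in backtrace_text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            frames[current] = []
            continue
        if current is None:
            continue
        frame = FRAME_RE.match(line)
        if frame:
            # Frame numbers count up from 0; a reset means a new section.
            if int(frame.group(1)) != len(frames[current]):
                current = None
                continue
            frames[current].append(frame.group(2).strip())
        elif line.strip() and line[:1].isspace() and frames[current]:
            # Indented line: continuation of a wrapped frame.
            frames[current][-1] += " " + line.strip()
        else:
            # Blank or unrelated line ends the thread block.
            current = None
    for number, frame_list in frames.items():
        for index, text in enumerate(frame_list):
            if "<signal handler called>" in text:
                return number, frame_list[index + 1:]
    return None


result = find_faulting_frames(SAMPLE)
if result is not None:
    thread_number, below_handler = result
    print("Signal taken in thread", thread_number)
    for text in below_handler[:2]:
        print("   ", text)
```

Run against the excerpt, this picks out Thread 2, where the fault is inside pthread_kill() called from cancel_storage_daemon_job() at job.c:515. It only locates the frames; it says nothing about why the call faulted.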


-------- Original Message --------
Subject: Director crash
Date: Tue, 11 Jan 2011 09:11:17 -0800
From: jerry lowry <jlowry AT edt DOT com>
To: bacula-users AT lists.sourceforge DOT net


Hi list,

I came in this morning and found that my director had died last night after doing two of the backups.  The traceback follows at the end.
This is the scenario:

    I noticed yesterday that only two jobs were scheduled to run last night: a monthly backup and the catalog backup.  Since I did not have time to research why the other 5 backups were not scheduled, I started BAT and set each of them to run at the time it normally runs each night (or is supposed to, anyway).  When I then looked at the director status, I saw the two scheduled jobs and 5 jobs waiting for their selected start times.

The two jobs that were scheduled ran without any errors.  The director crashed while running the first of the jobs that I had queued from BAT.  In BAT I selected the JOBS tab and then the job I wanted to run.  I modified only the "when" (start time) field, highlighting the hour and minute and typing in the time I wanted the job to run.  I did this for each of the jobs that had not been scheduled.
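
For anyone who wants to try the same thing without BAT: bconsole's "run" command accepts a when= argument in "YYYY-MM-DD HH:MM:SS" form, which should correspond to the start-time field described above. This is a minimal sketch of building such a command line. The helper name bconsole_run_command and the job name "NightlyBackup" are hypothetical, and it has not been confirmed that queuing jobs this way triggers the same crash.

```python
from datetime import datetime


def bconsole_run_command(job_name, when):
    """Build a bconsole 'run' line that queues job_name to start at 'when'."""
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    # The trailing 'yes' skips bconsole's interactive confirmation prompt.
    return 'run job="{}" when="{}" yes'.format(job_name, stamp)


# "NightlyBackup" is a made-up job name; substitute one from your own
# bacula-dir.conf. The start time here is arbitrary as well.
cmd = bconsole_run_command("NightlyBackup", datetime(2011, 1, 10, 23, 5, 0))
print(cmd)
```

The resulting line can be typed at the bconsole prompt or piped into bconsole, once per job that needs a delayed start.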

Made sure they were all showing up in the DIRECTOR tab and went on home.

Restarted bacula this morning and all the jobs were scheduled as normal.

Any clues or ideas as to the problem would be great.

OS:  Fedora 12 (2.6.32.11-99.fc12)
MySQL: 5.1.45 (source distribution)
Bacula: 5.0.1

--

---------------------------------------------------------------------------
Jerold Lowry
IT Manager / Software Engineer
Engineering Design Team (EDT), Inc. a HEICO company
1400 NW Compton Drive, Suite 315
Beaverton, Oregon 97006 (U.S.A.)
Phone: 503-690-1234 / 800-435-4320
Fax: 503-690-1243
Web: www.edt.com

 

