Re: [Bacula-users] Bacula-dir crashes when resources high

2014-05-10 02:39:59
Subject: Re: [Bacula-users] Bacula-dir crashes when resources high
From: Kern Sibbald <kern AT sibbald DOT com>
To: Marco Nicolayevsky <marco AT specialtyvalvegroup DOT com>, "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Sat, 10 May 2014 08:36:51 +0200

Hello,

A priori this has nothing to do with high resources, unless there is some sort of unknown Bacula memory corruption problem (unlikely at this point).  This is clearly a segfault.  However, it does not make any sense.
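For anyone reading the traceback quoted below: in a gdb backtrace, the frame printed directly beneath `<signal handler called>` is the one that was executing when the signal arrived, so that is the place to start looking. A minimal Python sketch of how one might pick that frame out of a saved traceback follows. The function name `find_faulting_frames` and the `sample` text are illustrative assumptions and are not part of Bacula or gdb.

```python
import re

SIGNAL_MARKER = "<signal handler called>"


def find_faulting_frames(traceback_text):
    """Map each thread header to the frame that was running when a signal hit.

    Only threads whose stack contains gdb's signal-handler marker are returned.
    """
    # Drop blank lines so double-spaced mail archives parse the same way.
    lines = [ln.strip() for ln in traceback_text.splitlines() if ln.strip()]
    result = {}
    current_thread = None
    for i, line in enumerate(lines):
        if line.startswith("Thread "):
            current_thread = line
        elif SIGNAL_MARKER in line and current_thread is not None:
            # The next numbered frame is the code interrupted by the signal.
            for nxt in lines[i + 1:]:
                if nxt.startswith("Thread "):
                    break
                if re.match(r"#\d+\s", nxt):
                    result[current_thread] = nxt
                    break
    return result


# Excerpt copied from the traceback in this thread.
sample = """
Thread 2 (Thread 0x7f8cab5fe700 (LWP 20711)):
#0  0x0000003a3ee0f2ad in waitpid () from /lib64/libpthread.so.0
#1  0x000000385c638712 in signal_handler (sig=<value optimized out>) at signal.c:234
#2  <signal handler called>
#3  sm_free (file=0x385c651eb2 "sellist.c", line=138, fp=0x3800000000) at smartall.c:180
#4  0x000000385c637969 in sellist::set_string (this=0x7f8cab5fd190, string=0x7f8cb4015850 "2", scan=true) at sellist.c:138
"""

for thread, frame in find_faulting_frames(sample).items():
    print(thread)
    print("  interrupted frame:", frame)
```

Run on the excerpt, this picks out the `sm_free` frame in Thread 2, which was called from `sellist::set_string` at sellist.c:138 while the "2" typed at the cancel prompt was being processed.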

I think you should submit this as a bug report. With the current information, I have already examined everything I can and run hundreds of tests trying to reproduce it, so to get to the bottom of it I will need (for starters):

1. A bug report, with the traceback attached.
2. Can you reproduce it?
3. All the bconsole commands that you entered, up to and including the "cancel" and the "2".
4. The options you used to build Bacula.
5. Which point release of CentOS 6 are you using?

Best regards,
Kern

On 05/08/2014 04:30 PM, Marco Nicolayevsky wrote:

Good morning.

I've got Bacula 7.0.2 on CentOS 6, and during a long backup the system load can get really high (load average 10+). When I tried to cancel a job using bconsole, bacula-dir crashed with the trace below.

Can anyone give me guidance on whether there is anything I can do or change on my system to prevent these crashes?

Thanks,

Marco

[New LWP 20711]

[New LWP 29898]

[New LWP 29657]

[New LWP 29655]

[New LWP 29645]

[New LWP 29644]

[New LWP 2199]

[New LWP 2198]

[New LWP 2138]

[Thread debugging using libthread_db enabled]

0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

$1 = '\000' <repeats 29 times>

$2 = 0x95f068 "bacula-dir"

$3 = 0x95f0a8 "/opt/bacula/bin/bacula-dir"

$4 = 0x7f8cb4025388 "MySQL"

$5 = 0x385c65092c "7.0.2 (02 April 2014)"

$6 = 0x385c65094a "x86_64-redhat-linux-gnu"

$7 = 0x385c650962 "redhat"

$8 = 0x385c6505f5 ""

$9 = "bacula.specialtyvalvegroup.com", '\000' <repeats 19 times>

$10 = 0x385c650942 "redhat "

$11 = 0

Environment variable "TestName" not defined.

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

#5  0x000000385c63649f in rwl_writelock_p (rwl=0x385c40aaa0, file=<value optimized out>, line=<value optimized out>) at rwlock.c:228

#6  0x000000385c207ca0 in b_LockRes (file=0x46a070 "scheduler.c", line=300) at res.c:52

#7  0x0000000000434028 in find_runs (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:300

#8  wait_for_next_job (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:114

#9  0x000000000040eeb5 in main (argc=<value optimized out>, argv=<value optimized out>) at dird.c:336

 

Thread 10 (Thread 0x7f8ccd743700 (LWP 2138)):

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c6393bd in smalloc (fname=0x385c65501c "lockmgr.c", lineno=603, nbytes=65) at smartall.c:114

#5  0x000000385c6395f5 in sm_malloc (fname=<value optimized out>, lineno=<value optimized out>, nbytes=24) at smartall.c:236

#6  0x000000385c648793 in operator new () at ../lib/smartall.h:105

#7  lmgr_detect_deadlock_unlocked () at lockmgr.c:603

#8  0x000000385c6495dd in lmgr_detect_deadlock () at lockmgr.c:661

#9  0x000000385c649b82 in check_deadlock () at lockmgr.c:717

#10 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#11 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 9 (Thread 0x7f8cccd42700 (LWP 2198)):

#0  0x0000003a3eae15e3 in select () from /lib64/libc.so.6

#1  0x000000385c618e44 in bnet_thread_server (addr_list=0x7f8cccd42688, max_clients=20, client_wq=0x685c40, handle_client_request=0x452f20 <handle_UA_client_request(void*)>) at bnet_server.c:168

#2  0x0000000000452f1c in connect_thread (arg=0x9632d8) at ua_server.c:69

#3  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#4  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#5  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 8 (Thread 0x7f8cc7fff700 (LWP 2199)):

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

#5  0x000000385c6485d6 in bthread_cond_timedwait_p (cond=<value optimized out>, m=<value optimized out>, abstime=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at lockmgr.c:977

#6  0x000000385c64279d in watchdog_thread (arg=<value optimized out>) at watchdog.c:309

#7  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#8  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#9  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 7 (Thread 0x7f8cc6bfd700 (LWP 29644)):

#0  0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0

#1  0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value optimized out>, nbytes=<value optimized out>) at bnet.c:69

#2  0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at bsock.c:511

#3  0x0000000000420787 in bget_dirmsg (bs=0x7f8cb80115b8) at getmsg.c:124

#4  0x00000000004116f5 in wait_for_job_termination (jcr=0x9658d8, timeout=<value optimized out>) at backup.c:630

#5  0x00000000004138ee in do_backup (jcr=0x9658d8) at backup.c:581

#6  0x0000000000427019 in job_thread (arg=0x9658d8) at job.c:303

#7  0x0000000000428293 in jobq_server (arg=0x685940) at jobq.c:439

#8  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#9  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#10 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 6 (Thread 0x7f8cc5ff6700 (LWP 29645)):

#0  0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0

#1  0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value optimized out>, nbytes=<value optimized out>) at bnet.c:69

#2  0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at bsock.c:511

#3  0x0000000000420787 in bget_dirmsg (bs=0x7f8cac012da8) at getmsg.c:124

#4  0x00000000004116f5 in wait_for_job_termination (jcr=0x96eeb8, timeout=<value optimized out>) at backup.c:630

#5  0x00000000004138ee in do_backup (jcr=0x96eeb8) at backup.c:581

#6  0x0000000000427019 in job_thread (arg=0x96eeb8) at job.c:303

#7  0x0000000000428293 in jobq_server (arg=0x685940) at jobq.c:439

#8  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#9  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#10 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 5 (Thread 0x7f8ca97fb700 (LWP 29655)):

#0  0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0

#1  0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value optimized out>, nbytes=<value optimized out>) at bnet.c:69

#2  0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at bsock.c:511

#3  0x0000000000420787 in bget_dirmsg (bs=0x7f8cb800b128) at getmsg.c:124

#4  0x000000000042e1aa in msg_thread (arg=0x9658d8) at msgchan.c:427

#5  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#6  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#7  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 4 (Thread 0x7f8c8ffff700 (LWP 29657)):

#0  0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0

#1  0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value optimized out>, nbytes=<value optimized out>) at bnet.c:69

#2  0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at bsock.c:511

#3  0x0000000000420787 in bget_dirmsg (bs=0x7f8cac00b2d8) at getmsg.c:124

#4  0x000000000042e1aa in msg_thread (arg=0x96eeb8) at msgchan.c:427

#5  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#6  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#7  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 3 (Thread 0x7f8cc75fe700 (LWP 29898)):

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

#5  0x000000385c649024 in bthread_mutex_lock_p (m=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at lockmgr.c:932

#6  0x0000000000428870 in jobq_server (arg=0x685940) at jobq.c:589

#7  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#8  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#9  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 2 (Thread 0x7f8cab5fe700 (LWP 20711)):

#0  0x0000003a3ee0f2ad in waitpid () from /lib64/libpthread.so.0

#1  0x000000385c638712 in signal_handler (sig=<value optimized out>) at signal.c:234

#2  <signal handler called>

#3  sm_free (file=0x385c651eb2 "sellist.c", line=138, fp=0x3800000000) at smartall.c:180

#4  0x000000385c637969 in sellist::set_string (this=0x7f8cab5fd190, string=0x7f8cb4015850 "2", scan=true) at sellist.c:138

#5  0x000000000043d670 in get_selection_list (ua=0x7f8c9800b998, sl=..., prompt=<value optimized out>, subprompt=<value optimized out>) at ua_input.c:89

#6  0x0000000000452127 in do_alist_prompt (ua=0x7f8c9800b998, automsg=<value optimized out>, msg=0x7f8cab5fda20 "Choose Job list to cancel", selected=0x7f8c9800b3f8) at ua_select.c:957

#7  0x0000000000452af8 in select_running_jobs (ua=0x7f8c9800b998, jcrs=0x7f8c9800bcf8, reason=0x4659a6 "cancel") at ua_select.c:1341

#8  0x0000000000439068 in cancel_cmd (ua=0x7f8c9800b998, cmd=<value optimized out>) at ua_cmds.c:443

#9  0x0000000000438bc4 in do_a_command (ua=0x7f8c9800b998) at ua_cmds.c:227

#10 0x0000000000452fce in handle_UA_client_request (arg=0x7f8cc000b0c8) at ua_server.c:133

#11 0x000000385c642ca2 in workq_server (arg=0x685c40) at workq.c:323

#12 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#13 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#14 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 1 (Thread 0x7f8cd35d67e0 (LWP 2132)):

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

#5  0x000000385c63649f in rwl_writelock_p (rwl=0x385c40aaa0, file=<value optimized out>, line=<value optimized out>) at rwlock.c:228

#6  0x000000385c207ca0 in b_LockRes (file=0x46a070 "scheduler.c", line=300) at res.c:52

#7  0x0000000000434028 in find_runs (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:300

#8  wait_for_next_job (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:114

#9  0x000000000040eeb5 in main (argc=<value optimized out>, argv=<value optimized out>) at dird.c:336

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

No symbol table info available.

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

No symbol table info available.

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

No symbol table info available.

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

93           lockmgr.c: No such file or directory.

                in lockmgr.c

errstat = <value optimized out>

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

435         in lockmgr.c

max_prio = <value optimized out>

#5  0x000000385c63649f in rwl_writelock_p (rwl=0x385c40aaa0, file=<value optimized out>, line=<value optimized out>) at rwlock.c:228

228         rwlock.c: No such file or directory.

                in rwlock.c

stat = 0

#6  0x000000385c207ca0 in b_LockRes (file=0x46a070 "scheduler.c", line=300) at res.c:52

52           res.c: No such file or directory.

                in res.c

errstat = <value optimized out>

#7  0x0000000000434028 in find_runs (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:300

300            LockRes();

hour = 7

nh_woy = 19

now = 1399553525

month = 4

wom = 1

ldom = 30

nh_mday = 7

nh_wday = 4

next_hour = 1399557125

sched = <value optimized out>

tm = {tm_sec = 5, tm_min = 52, tm_hour = 8, tm_mday = 8, tm_mon = 4, tm_year = 114, tm_wday = 4, tm_yday = 127, tm_isdst = 1, tm_gmtoff = -18000, tm_zone = 0x999180 "CDT"}

wday = 4

mday = 7

woy = 19

nh_hour = 8

nh_month = 4

nh_wom = 1

runtime = <value optimized out>

run = <value optimized out>

job = <value optimized out>

nh_ldom = 30

 



_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
