[Bacula-users] Bacula-dir crashes when resources high

2014-05-08 10:48:21
Subject: [Bacula-users] Bacula-dir crashes when resources high
From: Marco Nicolayevsky <marco AT specialtyvalvegroup DOT com>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Thu, 8 May 2014 14:30:02 +0000

Good morning.

 

I’ve got Bacula 7.0.2 on CentOS 6. During long backups the system load gets very high (load average 10+). When I tried to cancel a job using bconsole, bacula-dir crashed with the following trace.

Can anyone advise whether there is anything I can do or change on my system to prevent these crashes? See the trace below.

 

Thanks,

Marco

[New LWP 20711]

[New LWP 29898]

[New LWP 29657]

[New LWP 29655]

[New LWP 29645]

[New LWP 29644]

[New LWP 2199]

[New LWP 2198]

[New LWP 2138]

[Thread debugging using libthread_db enabled]

0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

$1 = '\000' <repeats 29 times>

$2 = 0x95f068 "bacula-dir"

$3 = 0x95f0a8 "/opt/bacula/bin/bacula-dir"

$4 = 0x7f8cb4025388 "MySQL"

$5 = 0x385c65092c "7.0.2 (02 April 2014)"

$6 = 0x385c65094a "x86_64-redhat-linux-gnu"

$7 = 0x385c650962 "redhat"

$8 = 0x385c6505f5 ""

$9 = "bacula.specialtyvalvegroup.com", '\000' <repeats 19 times>

$10 = 0x385c650942 "redhat "

$11 = 0

Environment variable "TestName" not defined.

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

#5  0x000000385c63649f in rwl_writelock_p (rwl=0x385c40aaa0, file=<value optimized out>, line=<value optimized out>) at rwlock.c:228

#6  0x000000385c207ca0 in b_LockRes (file=0x46a070 "scheduler.c", line=300) at res.c:52

#7  0x0000000000434028 in find_runs (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:300

#8  wait_for_next_job (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:114

#9  0x000000000040eeb5 in main (argc=<value optimized out>, argv=<value optimized out>) at dird.c:336

 

Thread 10 (Thread 0x7f8ccd743700 (LWP 2138)):

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c6393bd in smalloc (fname=0x385c65501c "lockmgr.c", lineno=603, nbytes=65) at smartall.c:114

#5  0x000000385c6395f5 in sm_malloc (fname=<value optimized out>, lineno=<value optimized out>, nbytes=24) at smartall.c:236

#6  0x000000385c648793 in operator new () at ../lib/smartall.h:105

#7  lmgr_detect_deadlock_unlocked () at lockmgr.c:603

#8  0x000000385c6495dd in lmgr_detect_deadlock () at lockmgr.c:661

#9  0x000000385c649b82 in check_deadlock () at lockmgr.c:717

#10 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#11 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 9 (Thread 0x7f8cccd42700 (LWP 2198)):

#0  0x0000003a3eae15e3 in select () from /lib64/libc.so.6

#1  0x000000385c618e44 in bnet_thread_server (addr_list=0x7f8cccd42688, max_clients=20, client_wq=0x685c40, handle_client_request=0x452f20 <handle_UA_client_request(void*)>) at bnet_server.c:168

#2  0x0000000000452f1c in connect_thread (arg=0x9632d8) at ua_server.c:69

#3  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#4  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#5  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 8 (Thread 0x7f8cc7fff700 (LWP 2199)):

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

#5  0x000000385c6485d6 in bthread_cond_timedwait_p (cond=<value optimized out>, m=<value optimized out>, abstime=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at lockmgr.c:977

#6  0x000000385c64279d in watchdog_thread (arg=<value optimized out>) at watchdog.c:309

#7  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#8  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#9  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 7 (Thread 0x7f8cc6bfd700 (LWP 29644)):

#0  0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0

#1  0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value optimized out>, nbytes=<value optimized out>) at bnet.c:69

#2  0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at bsock.c:511

#3  0x0000000000420787 in bget_dirmsg (bs=0x7f8cb80115b8) at getmsg.c:124

#4  0x00000000004116f5 in wait_for_job_termination (jcr=0x9658d8, timeout=<value optimized out>) at backup.c:630

#5  0x00000000004138ee in do_backup (jcr=0x9658d8) at backup.c:581

#6  0x0000000000427019 in job_thread (arg=0x9658d8) at job.c:303

#7  0x0000000000428293 in jobq_server (arg=0x685940) at jobq.c:439

#8  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#9  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#10 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 6 (Thread 0x7f8cc5ff6700 (LWP 29645)):

#0  0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0

#1  0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value optimized out>, nbytes=<value optimized out>) at bnet.c:69

#2  0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at bsock.c:511

#3  0x0000000000420787 in bget_dirmsg (bs=0x7f8cac012da8) at getmsg.c:124

#4  0x00000000004116f5 in wait_for_job_termination (jcr=0x96eeb8, timeout=<value optimized out>) at backup.c:630

#5  0x00000000004138ee in do_backup (jcr=0x96eeb8) at backup.c:581

#6  0x0000000000427019 in job_thread (arg=0x96eeb8) at job.c:303

#7  0x0000000000428293 in jobq_server (arg=0x685940) at jobq.c:439

#8  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#9  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#10 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 5 (Thread 0x7f8ca97fb700 (LWP 29655)):

#0  0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0

#1  0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value optimized out>, nbytes=<value optimized out>) at bnet.c:69

#2  0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at bsock.c:511

#3  0x0000000000420787 in bget_dirmsg (bs=0x7f8cb800b128) at getmsg.c:124

#4  0x000000000042e1aa in msg_thread (arg=0x9658d8) at msgchan.c:427

#5  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#6  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#7  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 4 (Thread 0x7f8c8ffff700 (LWP 29657)):

#0  0x0000003a3ee0e75d in read () from /lib64/libpthread.so.0

#1  0x000000385c617f66 in read_nbytes (bsock=<value optimized out>, ptr=<value optimized out>, nbytes=<value optimized out>) at bnet.c:69

#2  0x000000385c61b5b0 in BSOCK::recv (this=<value optimized out>) at bsock.c:511

#3  0x0000000000420787 in bget_dirmsg (bs=0x7f8cac00b2d8) at getmsg.c:124

#4  0x000000000042e1aa in msg_thread (arg=0x96eeb8) at msgchan.c:427

#5  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#6  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#7  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 3 (Thread 0x7f8cc75fe700 (LWP 29898)):

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

#5  0x000000385c649024 in bthread_mutex_lock_p (m=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at lockmgr.c:932

#6  0x0000000000428870 in jobq_server (arg=0x685940) at jobq.c:589

#7  0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#8  0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#9  0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 2 (Thread 0x7f8cab5fe700 (LWP 20711)):

#0  0x0000003a3ee0f2ad in waitpid () from /lib64/libpthread.so.0

#1  0x000000385c638712 in signal_handler (sig=<value optimized out>) at signal.c:234

#2  <signal handler called>

#3  sm_free (file=0x385c651eb2 "sellist.c", line=138, fp=0x3800000000) at smartall.c:180

#4  0x000000385c637969 in sellist::set_string (this=0x7f8cab5fd190, string=0x7f8cb4015850 "2", scan=true) at sellist.c:138

#5  0x000000000043d670 in get_selection_list (ua=0x7f8c9800b998, sl=..., prompt=<value optimized out>, subprompt=<value optimized out>) at ua_input.c:89

#6  0x0000000000452127 in do_alist_prompt (ua=0x7f8c9800b998, automsg=<value optimized out>, msg=0x7f8cab5fda20 "Choose Job list to cancel", selected=0x7f8c9800b3f8) at ua_select.c:957

#7  0x0000000000452af8 in select_running_jobs (ua=0x7f8c9800b998, jcrs=0x7f8c9800bcf8, reason=0x4659a6 "cancel") at ua_select.c:1341

#8  0x0000000000439068 in cancel_cmd (ua=0x7f8c9800b998, cmd=<value optimized out>) at ua_cmds.c:443

#9  0x0000000000438bc4 in do_a_command (ua=0x7f8c9800b998) at ua_cmds.c:227

#10 0x0000000000452fce in handle_UA_client_request (arg=0x7f8cc000b0c8) at ua_server.c:133

#11 0x000000385c642ca2 in workq_server (arg=0x685c40) at workq.c:323

#12 0x000000385c649ac2 in lmgr_thread_launcher (x=<value optimized out>) at lockmgr.c:1091

#13 0x0000003a3ee079d1 in start_thread () from /lib64/libpthread.so.0

#14 0x0000003a3eae8b6d in clone () from /lib64/libc.so.6

 

Thread 1 (Thread 0x7f8cd35d67e0 (LWP 2132)):

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

#5  0x000000385c63649f in rwl_writelock_p (rwl=0x385c40aaa0, file=<value optimized out>, line=<value optimized out>) at rwlock.c:228

#6  0x000000385c207ca0 in b_LockRes (file=0x46a070 "scheduler.c", line=300) at res.c:52

#7  0x0000000000434028 in find_runs (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:300

#8  wait_for_next_job (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:114

#9  0x000000000040eeb5 in main (argc=<value optimized out>, argv=<value optimized out>) at dird.c:336

#0  0x0000003a3ee0e264 in __lll_lock_wait () from /lib64/libpthread.so.0

No symbol table info available.

#1  0x0000003a3ee09508 in _L_lock_854 () from /lib64/libpthread.so.0

No symbol table info available.

#2  0x0000003a3ee093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0

No symbol table info available.

#3  0x000000385c648f33 in lmgr_p (m=<value optimized out>) at lockmgr.c:93

93           lockmgr.c: No such file or directory.

                in lockmgr.c

errstat = <value optimized out>

#4  0x000000385c64ab6b in lmgr_thread_t::pre_P (this=<value optimized out>, m=<value optimized out>, priority=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at lockmgr.c:435

435         in lockmgr.c

max_prio = <value optimized out>

#5  0x000000385c63649f in rwl_writelock_p (rwl=0x385c40aaa0, file=<value optimized out>, line=<value optimized out>) at rwlock.c:228

228         rwlock.c: No such file or directory.

                in rwlock.c

stat = 0

#6  0x000000385c207ca0 in b_LockRes (file=0x46a070 "scheduler.c", line=300) at res.c:52

52           res.c: No such file or directory.

                in res.c

errstat = <value optimized out>

#7  0x0000000000434028 in find_runs (_one_shot_job_to_run_=<value optimized out>) at scheduler.c:300

300            LockRes();

hour = 7

nh_woy = 19

now = 1399553525

month = 4

wom = 1

ldom = 30

nh_mday = 7

nh_wday = 4

next_hour = 1399557125

sched = <value optimized out>

tm = {tm_sec = 5, tm_min = 52, tm_hour = 8, tm_mday = 8, tm_mon = 4, tm_year = 114, tm_wday = 4, tm_yday = 127, tm_isdst = 1, tm_gmtoff = -18000, tm_zone = 0x999180 "CDT"}

wday = 4

mday = 7

woy = 19

nh_hour = 8

nh_month = 4

nh_wom = 1

runtime = <value optimized out>

run = <value optimized out>

job = <value optimized out>

nh_ldom = 30

 
