Bacula-users

[Bacula-users] Bacula Director Keeps Crashing, Backtrace looks rather strange

2013-11-29 05:00:54
Subject: [Bacula-users] Bacula Director Keeps Crashing, Backtrace looks rather strange
From: Rudolph Bott <r AT bott DOT im>
To: <bacula-users AT lists.sourceforge DOT net>
Date: Fri, 29 Nov 2013 10:57:29 +0100
Hi List,

our bacula director (Debian Squeeze, squeeze-backports, Version 5.2.6) 
keeps crashing during the nightly backup runs (>400 jobs). I looked at 
the traceback from last nights crash and it looks rather strange:

Here is a small excerpt from the traceback:

Attempt to dump current JCRs. njcrs=226
threadid=(nil) JobId=0 JobStatus=R jcr=0xc41f38 
name=*JobMonitor*.2013-11-28_17.13.43_01
threadid=(nil) killable=0 JobId=0 JobStatus=R jcr=0xc41f38 
name=*JobMonitor*.2013-11-28_17.13.43_01
         use_count=1
         JobType=I JobLevel=
         sched_time=28-Nov-2013 17:13 start_time=28-Nov-2013 17:13
         end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00
         db=(nil) db_batch=(nil) batch_started=0
threadid=(nil) JobId=472723 JobStatus=R jcr=0xd0aca8 
name=Some-Job-XY.2013-11-29_00.10.00_03
threadid=(nil) killable=0 JobId=472723 JobStatus=R jcr=0xd0aca8 
name=Some-Job-XY.2013-11-29_00.10.00_03
         use_count=2
         JobType=B JobLevel=F
         sched_time=29-Nov-2013 00:10 start_time=29-Nov-2013 00:10
         end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00
         db=0xc46918 db_batch=(nil) batch_started=0
B_DB=0xc46918 db_name=bacula db_user=bacula connected=true
         cmd="UPDATE Job SET 
JobStatus='R',Level='I',StartTime='2013-11-29 
01:32:42',ClientId=52,JobTDate=1385685162,PoolId=7,FileSetId=54 WHERE 
JobId=472811" changes=0
         RWLOCK=0xc46920 w_active=1 w_wait=4
threadid=(nil) JobId=472724 JobStatus=R jcr=0xd0d188 
name=Some-Job-ZZ.2013-11-29_00.10.00_04
threadid=(nil) killable=0 JobId=472724 JobStatus=R jcr=0xd0d188 
name=Some-Job-ZZ.2013-11-29_00.10.00_04
         use_count=2
         JobType=B JobLevel=F
         sched_time=29-Nov-2013 00:10 start_time=29-Nov-2013 00:10
         end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00
         db=0xc46918 db_batch=(nil) batch_started=0
B_DB=0xc46918 db_name=bacula db_user=bacula connected=true
         cmd="UPDATE Job SET 
JobStatus='R',Level='I',StartTime='2013-11-29 
01:32:42',ClientId=52,JobTDate=1385685162,PoolId=7,FileSetId=54 WHERE 
JobId=472811" changes=0
         RWLOCK=0xc46920 w_active=1 w_wait=4


I am not exactly sure what happens here, but this pattern repeats for 
roughly 1000 lines. Strange thing is - it always mentions a different 
Job, but always shows the exact same "UPDATE Job..."-Statement. The job 
that is actually referred to in the WHERE clause is also among the jobs 
mentioned. Do you have any idea what causes this beheaviour? Or is it 
expected? We have been running bacula director in foreground mode with 
daemontools to capture the debug output (level 50, which is already 
quite verbose but does not use too much disk i/o while logging). 
However, the crashes also always seem to happen right after a client 
connect:

2013-11-29 01:32:42.810051500 vlad01.sipgate.net: next_vol.c:253-472933 
Vol=Volume-Name-0002 expired=0
2013-11-29 01:33:03.642079500 vlad01.sipgate.net: job.c:1365-472807 
wstore=FileStorageForPool where=
2013-11-29 01:33:03.659857500 vlad01.sipgate.net: cram-md5.c:79-472807 
send: auth cram-md5 <307015206.1385685183@bacula-host-name> ssl=2
2013-11-29 01:33:03.660066500 vlad01.sipgate.net: cram-md5.c:98-472807 
Authenticate OK SomeSecretHashHere
2013-11-29 01:33:03.702799500 vlad01.sipgate.net: bnet.c:347-472807 TLS 
client negotiation established.
2013-11-29 01:33:03.783616500 Bacula interrupted by signal 11: 
Segmentation violation


Another Example:
2013-11-28 01:48:10.456400500 vlad01.sipgate.net: next_vol.c:253-472391 
Vol=Different-Volume-Name-0001 expired=0
2013-11-28 01:48:15.471433500 vlad01.sipgate.net: job.c:1365-472445 
wstore=FileStorageForPool where=
2013-11-28 01:48:15.485142500 vlad01.sipgate.net: cram-md5.c:79-472445 
send: auth cram-md5 <1436181388.1385599695@bacula-host-name> ssl=2
2013-11-28 01:48:15.485182500 vlad01.sipgate.net: cram-md5.c:98-472445 
Authenticate OK SomeSecretHashHere
2013-11-28 01:48:15.527138500 vlad01.sipgate.net: bnet.c:347-472445 TLS 
client negotiation established.
2013-11-28 01:48:15.607948500 Bacula interrupted by signal 11: 
Segmentation violation

Any hints on what could cause this behaviour? We are really clueless at 
this point... :-/

-- 
Mit freundlichen Grüßen / with kind regards
   Rudolph Bott

------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349351&iu=/4140/ostg.clktrk
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
<Prev in Thread] Current Thread [Next in Thread>
  • [Bacula-users] Bacula Director Keeps Crashing, Backtrace looks rather strange, Rudolph Bott <=