Hi List,
our bacula director (Debian Squeeze, squeeze-backports, Version 5.2.6)
keeps crashing during the nightly backup runs (>400 jobs). I looked at
the traceback from last nights crash and it looks rather strange:
Here is a small excerpt from the traceback:
Attempt to dump current JCRs. njcrs=226
threadid=(nil) JobId=0 JobStatus=R jcr=0xc41f38
name=*JobMonitor*.2013-11-28_17.13.43_01
threadid=(nil) killable=0 JobId=0 JobStatus=R jcr=0xc41f38
name=*JobMonitor*.2013-11-28_17.13.43_01
use_count=1
JobType=I JobLevel=
sched_time=28-Nov-2013 17:13 start_time=28-Nov-2013 17:13
end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00
db=(nil) db_batch=(nil) batch_started=0
threadid=(nil) JobId=472723 JobStatus=R jcr=0xd0aca8
name=Some-Job-XY.2013-11-29_00.10.00_03
threadid=(nil) killable=0 JobId=472723 JobStatus=R jcr=0xd0aca8
name=Some-Job-XY.2013-11-29_00.10.00_03
use_count=2
JobType=B JobLevel=F
sched_time=29-Nov-2013 00:10 start_time=29-Nov-2013 00:10
end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00
db=0xc46918 db_batch=(nil) batch_started=0
B_DB=0xc46918 db_name=bacula db_user=bacula connected=true
cmd="UPDATE Job SET
JobStatus='R',Level='I',StartTime='2013-11-29
01:32:42',ClientId=52,JobTDate=1385685162,PoolId=7,FileSetId=54 WHERE
JobId=472811" changes=0
RWLOCK=0xc46920 w_active=1 w_wait=4
threadid=(nil) JobId=472724 JobStatus=R jcr=0xd0d188
name=Some-Job-ZZ.2013-11-29_00.10.00_04
threadid=(nil) killable=0 JobId=472724 JobStatus=R jcr=0xd0d188
name=Some-Job-ZZ.2013-11-29_00.10.00_04
use_count=2
JobType=B JobLevel=F
sched_time=29-Nov-2013 00:10 start_time=29-Nov-2013 00:10
end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00
db=0xc46918 db_batch=(nil) batch_started=0
B_DB=0xc46918 db_name=bacula db_user=bacula connected=true
cmd="UPDATE Job SET
JobStatus='R',Level='I',StartTime='2013-11-29
01:32:42',ClientId=52,JobTDate=1385685162,PoolId=7,FileSetId=54 WHERE
JobId=472811" changes=0
RWLOCK=0xc46920 w_active=1 w_wait=4
I am not exactly sure what happens here, but this pattern repeats for
roughly 1000 lines. Strange thing is - it always mentions a different
Job, but always shows the exact same "UPDATE Job..."-Statement. The job
that is actually referred to in the WHERE clause is also among the jobs
mentioned. Do you have any idea what causes this beheaviour? Or is it
expected? We have been running bacula director in foreground mode with
daemontools to capture the debug output (level 50, which is already
quite verbose but does not use too much disk i/o while logging).
However, the crashes also always seem to happen right after a client
connect:
2013-11-29 01:32:42.810051500 vlad01.sipgate.net: next_vol.c:253-472933
Vol=Volume-Name-0002 expired=0
2013-11-29 01:33:03.642079500 vlad01.sipgate.net: job.c:1365-472807
wstore=FileStorageForPool where=
2013-11-29 01:33:03.659857500 vlad01.sipgate.net: cram-md5.c:79-472807
send: auth cram-md5 <307015206.1385685183@bacula-host-name> ssl=2
2013-11-29 01:33:03.660066500 vlad01.sipgate.net: cram-md5.c:98-472807
Authenticate OK SomeSecretHashHere
2013-11-29 01:33:03.702799500 vlad01.sipgate.net: bnet.c:347-472807 TLS
client negotiation established.
2013-11-29 01:33:03.783616500 Bacula interrupted by signal 11:
Segmentation violation
Another Example:
2013-11-28 01:48:10.456400500 vlad01.sipgate.net: next_vol.c:253-472391
Vol=Different-Volume-Name-0001 expired=0
2013-11-28 01:48:15.471433500 vlad01.sipgate.net: job.c:1365-472445
wstore=FileStorageForPool where=
2013-11-28 01:48:15.485142500 vlad01.sipgate.net: cram-md5.c:79-472445
send: auth cram-md5 <1436181388.1385599695@bacula-host-name> ssl=2
2013-11-28 01:48:15.485182500 vlad01.sipgate.net: cram-md5.c:98-472445
Authenticate OK SomeSecretHashHere
2013-11-28 01:48:15.527138500 vlad01.sipgate.net: bnet.c:347-472445 TLS
client negotiation established.
2013-11-28 01:48:15.607948500 Bacula interrupted by signal 11:
Segmentation violation
Any hints on what could cause this behaviour? We are really clueless at
this point... :-/
--
Mit freundlichen Grüßen / with kind regards
Rudolph Bott
------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT
organizations don't have a clear picture of how application performance
affects their revenue. With AppDynamics, you get 100% visibility into your
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349351&iu=/4140/ostg.clktrk
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
|