Bacula-users

[Bacula-users] 5.0.1 infinite email loop bug??

2010-04-15 13:28:54
Subject: [Bacula-users] 5.0.1 infinite email loop bug??
From: Stephen Thompson <stephen AT seismo.berkeley DOT edu>
To: bacula-devel AT lists.sourceforge DOT net, bacula-users AT lists.sourceforge DOT net
Date: Thu, 15 Apr 2010 10:25:34 -0700
Hello,

I have just now experienced a possible new bug with bacula 5.0.1.

The symptoms are these:

bacula-sd crashes
bacula-dir continues to run
bacula-dir then spews out identical "Intervention needed" emails until 
manually restarted

The first time, this happened over a weekend, and on returning I found 
about 120,000 Bacula emails in my inbox, all identical and of this type:

"15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048 Network 
send error to SD. ERR=Broken pipe"

It happened again just now (second time since upgrading from 3.0.3 to 
5.0.1) and I managed to stop the director with only a few thousand 
emails going out.
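
Until the root cause is found, one way to blunt a flood like this is to put a duplicate-suppressing filter in front of whatever sends the mail. The following is only a minimal sketch of that idea, not anything Bacula provides: the class name `DuplicateSuppressor`, its `window` parameter, and the one-hour default are all hypothetical, and the state is kept in memory, so a wrapper that is started once per message would have to persist it to disk instead.

```python
import hashlib
import time


class DuplicateSuppressor:
    """Hypothetical helper: drop a message if an identical body was
    already forwarded within the last `window` seconds."""

    def __init__(self, window=3600.0, clock=time.monotonic):
        self.window = window          # suppression window in seconds (assumed value)
        self.clock = clock            # injectable clock, to make testing easy
        self._last_sent = {}          # body digest -> time it was last forwarded

    def should_send(self, body):
        """Return True if this body should be mailed, False if it is a repeat."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        now = self.clock()
        last = self._last_sent.get(digest)
        if last is not None and now - last < self.window:
            return False              # identical message seen recently: suppress
        self._last_sent[digest] = now
        return True


if __name__ == "__main__":
    suppressor = DuplicateSuppressor(window=3600.0)
    msg = ("15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048 "
           "Network send error to SD. ERR=Broken pipe")
    for _ in range(3):
        print(suppressor.should_send(msg))
```

Because the flooded messages were byte-for-byte identical, hashing the whole body is enough here; messages that differ only in their timestamp would need that prefix stripped before hashing.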

So there are really 2 issues here:

1)
Why does the director apparently get stuck in an infinite loop of 
sending the same email message?  Is this a known bug?

2)
Regarding the SD, I received one alert of this type; the rest were like 
the one above:

  "15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT: 
dev->blocked()"

A traceback like:
--
ptrace: Operation not permitted.
/var/bacula/work/29091: No such file or directory.
$1 = 0
/opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file:
No symbol "exename" in current context.
--

And a bactrace like:
--
Attempt to dump current JCRs
JCR=0x19a24888 JobId=100000 name=client_1.2010-04-14_18.02.33_41 JobStatus=l
         use_count=1
         JobType=B JobLevel=F
         sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
         end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
         db=(nil) db_batch=(nil) batch_started=0
JCR=0x1981b248 JobId=100001 name=client_10.2010-04-14_20.00.15_04 JobStatus=R
         use_count=1
         JobType=B JobLevel=I
         sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
         end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
         db=(nil) db_batch=(nil) batch_started=0
Attempt to dump plugins. Hook count=0
--
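
A side note on reading the dump above: the end_time and wait_time values of 31-Dec-1969 16:00 are what a zero Unix timestamp looks like when rendered in US Pacific Standard Time (UTC-8), which suggests those fields were simply never set rather than holding real times. A quick check, where the fixed UTC-8 offset is an assumption about the server's time zone and `render_epoch` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server renders times in US Pacific Standard Time (UTC-8).
PST = timezone(timedelta(hours=-8))


def render_epoch(seconds):
    """Format a Unix timestamp in the dd-Mon-YYYY HH:MM style used in the dump."""
    return datetime.fromtimestamp(seconds, PST).strftime("%d-%b-%Y %H:%M")


# A zero timestamp, i.e. an unset time field.
print(render_epoch(0))  # prints "31-Dec-1969 16:00"
```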

Both clients and server seem healthy, except for the SD crash.
Any ideas?


thanks!
Stephen


-------------------------------------------------------------------------------------
Further info:

My catalog...

     mysql-5.0.77 (64-bit) MyISAM
     210 GB in size
     1,412,297,215 records in File table
     note: database built with bacula 2.x scripts,
     upgraded with 3.x scripts, then again with 5.x scripts
     (i.e. nothing customized along the way)

My OS & hardware for bacula DIR+SD server...

     CentOS 5.4 (fully patched)
     8 GB RAM
     2 GB swap
     1 TB ext3 filesystem on external fiber RAID5 array
     (dedicated to database, incl. temp files)
     2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
     StorageTek SL500 Library with 2 LTO3 Drives




