Bacula-users

Re: [Bacula-users] [Bacula-devel] 5.0.1 infinite email loop bug??

2010-04-15 15:26:18
From: Kern Sibbald <kern AT sibbald DOT com>
To: bacula-devel AT lists.sourceforge DOT net
Date: Thu, 15 Apr 2010 21:24:18 +0200
On Thursday 15 April 2010 19:25:34 Stephen Thompson wrote:
> Hello,
>
> I have just now experienced a possible new bug with bacula 5.0.1.
>
> The symptoms are these:
>
> bacula-sd crashes
> bacula-dir continues to run
> bacula-dir then spews out identical "Intervention needed" emails until
> manually restarted
>
> The first time, this happened over a weekend, and upon returning I found
> my inbox had about 120,000 bacula emails, all the SAME and of this type:
>
> "15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048 Network
> send error to SD. ERR=Broken pipe"
>
> It happened again just now (second time since upgrading from 3.0.3 to
> 5.0.1) and I managed to stop the director with only a few thousand
> emails going out.
>
> So there are really 2 issues here:
>
> 1)
> Why does the director apparently get stuck in an infinite loop of
> sending the same email message? 

I have no idea.

> Is this a known bug? 

No, I have never heard or seen this kind of problem before.

>
> 2)
> Regarding the SD, I received one alert of this type, 

What is an "alert"?  Do you mean an email or a Bacula message?  If so, which 
one?


> the rest like the above:
>
>   "15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT:
> dev->blocked()"

It is not very clear what you are saying.  Do you mean that you are receiving 
the above message many times?

>
> A traceback like:
> --
> ptrace: Operation not permitted.
> /var/bacula/work/29091: No such file or directory.
> $1 = 0
> /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file:
> No symbol "exename" in current context.
> --
>
> And a bactrace like:
> --
> Attempt to dump current JCRs
> JCR=0x19a24888 JobId=100000 name=client_1.2010-04-14_18.02.33_41
> JobStatus=l use_count=1
>          JobType=B JobLevel=F
>          sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
>          end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
>          db=(nil) db_batch=(nil) batch_started=0
> JCR=0x1981b248 JobId=100001 name=client_10.2010-04-14_20.00.15_04
> JobStatus=R
>          use_count=1
>          JobType=B JobLevel=I
>          sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
>          end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
>          db=(nil) db_batch=(nil) batch_started=0
> Attempt to dump plugins. Hook count=0
> --
>
> Both clients and server seem healthy, except for the SD crash.
> Any ideas?

No.  To understand the problem we will need a traceback.  You will probably 
need to produce it manually, as described in the Kaboom chapter, or else fix 
the automatic traceback scripts so that they can do a ptrace.
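
For anyone following along, a manual traceback can be sketched roughly as
below. This is only a minimal sketch, not the manual's exact procedure: it
assumes gdb is installed, that the Storage daemon was built with debug
symbols, and that the two paths (hypothetical; only the /opt/bacula-5.0.1
prefix appears in this thread) match your install. The daemon options -s,
-f and -c are believed to be the ones the Kaboom chapter uses for running
under the debugger; check them against the manual for your version.

```python
import shlex

# Hypothetical paths: only the /opt/bacula-5.0.1 prefix appears in the thread.
DAEMON = "/opt/bacula-5.0.1/sbin/bacula-sd"
CONFIG = "/opt/bacula-5.0.1/etc/bacula-sd.conf"


def gdb_backtrace_command(daemon=DAEMON, config=CONFIG):
    """Build an argv list that runs the daemon under gdb and prints all
    thread backtraces once the process stops (for example on a crash)."""
    return [
        "gdb", "-batch",               # exit gdb after the commands have run
        "-ex", "run",                  # start the daemon under the debugger
        "-ex", "thread apply all bt",  # backtrace of every thread after it stops
        "--args", daemon,
        # Assumed daemon options: -s no signal handling, -f foreground,
        # -c configuration file. Verify against your Bacula manual.
        "-s", "-f", "-c", config,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it by hand.
    print(" ".join(shlex.quote(arg) for arg in gdb_backtrace_command()))
```

The script only prints the command rather than executing it, so the paths
can be corrected first. Running the daemon this way sidesteps the attach
step that failed with "ptrace: Operation not permitted" in the automatic
traceback.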

Kern

>
>
> thanks!
> Stephen
>
>
> ---------------------------------------------------------------------------
> Further info:
>
> My catalog...
>
>      mysql-5.0.77 (64bit) MyISAM
>      210 GB in size
>      1,412,297,215 records in File table
>      note: database built with bacula 2x scripts,
>      upgraded with 3x scripts, then again with 5x scripts
>      (i.e. nothing customized along the way)
>
> My OS & hardware for bacula DIR+SD server...
>
>      CentOS 5.4 (fully patched)
>      8 GB RAM
>      2 GB Swap
>      1 TB EXT3 filesystem on external fiber RAID5 array
>      (dedicated to database, incl. temp files)
>      2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
>      StorageTek SL500 Library with 2 LTO3 Drives
>
>
>
>
>


