Re: [Bacula-users] [Bacula-devel] 5.0.1 infinite email loop bug??

2010-04-15 15:42:55
From: Kern Sibbald <kern AT sibbald DOT com>
To: bacula-devel AT lists.sourceforge DOT net
Date: Thu, 15 Apr 2010 21:40:36 +0200
On Thursday 15 April 2010 19:36:51 Stephen Thompson wrote:
> Additionally, seems like the SD was possibly reading a new
> freshly-labeled tape when it crashed...  Last items in bacula log
> besides alerts already mentioned:

In Bacula, "alerts" refers to stored tape drive information about tape
problems, so I am assuming you mean messages.

>
>
> 15-Apr 09:31 server-sd JobId 100000: Writing spooled data to Volume.
> Despooling 35,000,185,219 bytes ...
> 15-Apr 09:51 server-sd JobId 100000: End of Volume "FB0568" at 888:1414
> on device "SL500-Drive-1" (/dev/nst0). Write of 262144 bytes got -1.
> 15-Apr 09:51 server-sd JobId 100000: Re-read of last block succeeded.
> 15-Apr 09:51 server-sd JobId 100000: End of medium on Volume "FB0568"
> Bytes=887,261,470,720 Blocks=3,384,635 at 15-Apr-2010 09:51.
> 15-Apr 09:51 server-sd JobId 100000: 3307 Issuing autochanger "unload
> slot 38, drive 1" command.
> 15-Apr 09:52 server-sd JobId 100000: 3301 Issuing autochanger "loaded?
> drive 1" command.
> 15-Apr 09:52 server-sd JobId 100000: 3302 Autochanger "loaded? drive 1",
> result: nothing loaded.
> 15-Apr 09:52 server-sd JobId 100000: 3304 Issuing autochanger "load slot
> 39, drive 1" command.
> 15-Apr 09:52 server-sd JobId 100000: 3305 Autochanger "load slot 39,
> drive 1", status is OK.
> 15-Apr 09:52 server-sd JobId 100000: Volume "FB0569" previously written,
> moving to end of data.
>
> Nothing but thousands of 'repetitive' alerts after that...

What exactly is repeated?

Bacula bug #1480, in message delivery, may be the same problem that you are
experiencing. It was triggered by a misconfigured SMTP server or by a
reference in Bacula to a non-existent SMTP server. The simple solution is to
make sure Bacula points to a valid, functional SMTP server. This problem was
not particular to version 5.0.1, but I think it was fixed after the release
of 5.0.1. Please see the bugs database for more details.
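
As a rough illustration of the check suggested above, here is a minimal sketch, assuming Python 3 and its standard smtplib module, that tests whether a given SMTP host accepts connections. The function name smtp_reachable is hypothetical and not part of Bacula; it only probes the server and does not read Bacula's configuration.

```python
import smtplib


def smtp_reachable(host, port=25, timeout=5.0):
    """Return True if an SMTP server at host:port answers a NOOP.

    Hypothetical helper for illustration; not part of Bacula.
    """
    try:
        # The constructor connects immediately and raises OSError
        # (or an SMTPException subclass) if the server is unreachable
        # or does not greet with a 220 reply.
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            code, _ = server.noop()
            # Any 2xx reply means the server is up and speaking SMTP.
            return 200 <= code < 300
    except (OSError, smtplib.SMTPException):
        return False
```

Calling smtp_reachable with the host that Bacula's mail command points at (for example "localhost", if that is what is configured) returns False when nothing is listening there, which is the kind of misconfiguration described above.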

Kern

>
> thanks again,
> Stephen
>
> On 04/15/2010 10:25 AM, Stephen Thompson wrote:
> > Hello,
> >
> > I have just now experienced a possible new bug with bacula 5.0.1.
> >
> > The symptoms are this:
> >
> > bacula-sd crashes
> > bacula-dir continues to run
> > bacula-dir then spews out identical "Intervention needed" emails until
> > manually restarted
> >
> > The first time this happened over a weekend and upon returning I found
> > my inbox has about 120,000 bacula emails, all the SAME and of this type:
> >
> > "15-Apr 10:02 client-fd JobId 100001: Fatal error: backup.c:1048 Network
> > send error to SD. ERR=Broken pipe"
> >
> > It happened again just now (second time since upgrading from 3.0.3 to
> > 5.0.1) and I managed to stop the director with only a few thousand
> > emails going out.
> >
> > So there are really 2 issues here:
> >
> > 1)
> > Why does the director apparently get stuck in an infinite loop of
> > sending the same email message?  Is this a known bug?
> >
> > 2)
> > Regarding the SD, I received one alert of this type, the rest like the
> > above:
> >
> >    "15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT:
> > dev->blocked()"
> >
> > A traceback like:
> > --
> > ptrace: Operation not permitted.
> > /var/bacula/work/29091: No such file or directory.
> > $1 = 0
> > /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command
> > file: No symbol "exename" in current context.
> > --
> >
> > And a bactrace like:
> > --
> > Attempt to dump current JCRs
> > JCR=0x19a24888 JobId=100000 name=client_1.2010-04-14_18.02.33_41
> > JobStatus=l use_count=1
> >           JobType=B JobLevel=F
> >           sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
> >           end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
> >           db=(nil) db_batch=(nil) batch_started=0
> > JCR=0x1981b248 JobId=100001 name=client_10.2010-04-14_20.00.15_04
> > JobStatus=R
> >           use_count=1
> >           JobType=B JobLevel=I
> >           sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
> >           end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
> >           db=(nil) db_batch=(nil) batch_started=0
> > Attempt to dump plugins. Hook count=0
> > --
> >
> > Both clients and server seem healthy, except for the SD crash.
> > Any ideas?
> >
> >
> > thanks!
> > Stephen
> >
> >
> > -------------------------------------------------------------------------
> > Further info:
> >
> > My catalog...
> >
> >       mysql-5.0.77 (64bit) MyISAM
> >       210Gb in size
> >       1,412,297,215 records in File table
> >       note: database built with bacula 2x scripts,
> >       upgraded with 3x scripts, then again with 5x scripts
> >       (i.e. nothing customized along the way)
> >
> > My OS & hardware for bacula DIR+SD server...
> >
> >       Centos 5.4 (fully patched)
> >       8Gb RAM
> >       2Gb Swap
> >       1Tb EXT3 filesystem on external fiber RAID5 array
> >       (dedicated to database, incl. temp files)
> >       2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
> >       StorageTek SL500 Library with 2 LTO3 Drives
> >


