2008-05-07 04:23:18
Subject: Re: [Bacula-users] bacula-sd crashing
From: Arno Lehmann <al AT its-lehmann DOT de>
To: Bacula Users <Bacula-users AT lists.sourceforge DOT net>
Date: Wed, 07 May 2008 10:22:42 +0200

Hi,

04.05.2008 14:18, Erik Persson wrote:
> On 4 May 2008, at 04.40, John Drescher wrote:
> 
>> On Sat, May 3, 2008 at 10:31 AM, Erik Persson <erik AT lysator.liu DOT se> wrote:
>>> Hello!
>>>
>>> This would be my first post to this list.  We have deployed three
>>> bacula installations at the company I work for and it has been
>>> working fairly well except for one thing:
>>>
>>> Whenever we are running a backup job that causes the tape library to
>>> run out of tapes we usually get sd crashes while labelling (or
>>> possibly even just inventorying) new tapes.  Sequence as follows:
>>>
>>> * Bacula wants to mount a tape for pool foo
>>> * unmount is performed
>>> * Magazine is ejected
>>> * New unlabeled tapes are loaded
>>> * Labeling is requested
>>> * bacula-sd dies

That would be a bug.

>>>
>>> This happens pretty much on every attempt.  I am unsure whether it
>>> also happens if just an update slots on appendable/purged media is
>>> performed.
>>>
>>> We have not seen any crashes while labeling or scanning if bacula is
>>> otherwise idle, so defining the jobs so that they are guaranteed not
>>> to run out of tapes does kind of take care of the problem.
>>>
>>> A sysadmin friend who has been dealing with mtx (with bacula and
>>> other backup software) told me that it may be a bit lacking when it
>>> comes to error handling

I kind of agree... mtx simply returns some sort of dump of the SCSI 
error data, which is definitely not easily human readable.

>>> and his theory was that the sd may get confused if mtx
>>> returns something bad.

It shouldn't... usually, if the mtx process (or mtx-changer) returns 
something unexpected, the SD treats this as a problem, writes the 
complete output to the configured message destinations (usually log 
file, console, and mail), and asks for operator intervention.

>>> One common denominator for these systems is that they are using
>>> Overland 20-slot libraries, and we have noticed that if an mtx command
>>> is issued while a library operation is already in progress we
>>> typically get a SCSI error in return.  Could this have something to
>>> do with it?

It could, though I guess it would be a bit deeper in the code than 
simply the SD stumbling over malformed output.
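
To rule out the "library still busy" case before issuing label or update 
slots commands, one option is to poll the changer until it answers 
cleanly. The following is only a rough sketch, not part of Bacula or 
mtx-changer: the function name, the retry count, and the delay are made 
up for illustration, and it assumes that mtx exits with a non-zero status 
while the library rejects commands.

```python
import subprocess
import time


def wait_for_changer(cmd, attempts=10, delay=30.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd repeatedly until it exits with status 0.

    Returns True as soon as one attempt succeeds, or False if every
    attempt failed. `run` and `sleep` are injectable so the loop can
    be exercised without a real tape library.
    """
    for attempt in range(1, attempts + 1):
        # Capture the output so a SCSI error dump does not flood the terminal.
        result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode == 0:
            return True
        # Assumption: a non-zero exit means "busy or failed"; wait and retry.
        if attempt < attempts:
            sleep(delay)
    return False
```

It could be called as wait_for_changer(["mtx", "-f", "/dev/sg0", "status"]), 
where /dev/sg0 is an assumed changer device name; substitute whatever 
your system uses.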

>>> Any hints would be greatly appreciated.
>>>
>>>
>> Do you wait 5 minutes after inserting the magazine before trying to
>> issue bacula commands to allow the archive inventory to finish? Did
>> you do an update slots before you tried to label the tapes?
>>
>> John
> 
> It's a bit of a walk back to the office, but I cannot rule out that
> the library was still inventorying itself.  I'll do an mtx status
> outside of bacula next time to make sure it's ready.
> 
> No, I don't usually do an update slots before labeling.  I'll try that
> too and see what happens.
> 
> But still:  Is it not a bit odd that the sd just dies if it runs into
> a transient error or inconsistency?

Definitely. I'd turn on debug trace output for the SD and then 
trigger the error condition.

Also, you should be able to get a backtrace of the SD, provided the SD 
binary is not stripped and gdb is available.

With a backtrace plus debug output, this would be worth a bug report.
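
One possible way to collect both at once is to run the SD in the 
foreground under gdb and let gdb print the backtraces when the process 
stops. The sketch below only builds and runs such a command line; the 
helper names, the binary and config paths, and the debug level of 200 
are assumptions for illustration, so adjust them to your installation.

```python
import subprocess


def sd_under_gdb_cmd(sd_binary, config, debug_level=200):
    """Return an argv list that runs the SD under gdb in batch mode."""
    return [
        "gdb", "-batch",
        "-ex", "run",                  # start the SD, wait until it stops or exits
        "-ex", "thread apply all bt",  # then print a backtrace of every thread
        "--args", sd_binary,
        "-s",                          # SD option: no signal handling (debugging)
        "-f",                          # SD option: stay in the foreground
        "-d", str(debug_level),        # SD option: debug level
        "-c", config,                  # SD option: configuration file
    ]


def run_and_capture(cmd, log_path):
    """Run cmd and write its combined stdout/stderr to log_path."""
    with open(log_path, "w") as log:
        return subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
```

For example, run_and_capture(sd_under_gdb_cmd("/usr/sbin/bacula-sd", 
"/etc/bacula/bacula-sd.conf"), "/tmp/sd-gdb.log") would leave the debug 
output and any backtrace in one file (all three paths are assumed). 
Stop the normally running SD first so the two instances do not compete 
for the drive.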

Arno


> Best regards,
> 
> /Erik
> 

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
