
Re: [Bacula-users] Need help debugging SD crash

2010-04-08 11:13:44
Subject: Re: [Bacula-users] Need help debugging SD crash
From: Robert LeBlanc <robert AT leblancnet DOT us>
To: Matija Nalis <mnalis+bacula AT carnet DOT hr>
Date: Thu, 8 Apr 2010 09:10:55 -0600
On Thu, Apr 8, 2010 at 9:02 AM, Matija Nalis <mnalis+bacula AT carnet DOT hr> wrote:
On Tue, Apr 06, 2010 at 08:40:20AM -0600, Robert LeBlanc wrote:
> I've tried in the past to do exactly this. Bacula will usually spit out an
> error that the tape could not be moved or, in rarer situations, say the drive
> is not there. I then shut down bacula-sd and try to run the mt eject command.
> I usually get back about ten lines that describe the error, but they don't
> really make sense. Sometimes the drive doesn't appear as a device on the
> system any more. As far as the tape library, the Overland Neo 8000 most of
> the time says soft removal error on the screen and will keep saying that if
> I try to have the library remove it. There is no easy way to get to the
> hardware eject button as the library is fully enclosed.

It looks like the drive gets confused if it gets commands too fast
(and/or while it is still processing previous commands). Anyway, it
looks like a problem outside Bacula (probably the kernel, the drive
firmware, or both are at fault).

> drives and our LTO-4 drive. The only thing that I can think of is that
> Bacula is trying to take some shortcuts (issuing a command to move the tape
> and expecting the tape library to correctly rewind the tape, eject it and then
> move it, and maybe Bacula is not quite letting go of the drive fast enough,
> so a deadlock develops between the drive controlled by Bacula and the
> library trying to control it), or there is a kernel/driver problem.

The only thing Bacula does is execute the mtx-changer script; it is the
script's responsibility to do everything needed for your drive/changer
combination. The default script is usually good, but you may need to
tailor it to your needs (for example, if the drive needs a manual rewind
before going offline).
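As an illustration of that kind of tailoring, here is a minimal POSIX sh sketch of an unload step that optionally rewinds, takes the drive offline, and pauses before asking the changer to move the tape. It is not the actual mtx-changer code: `unload_slot`, `rewind_first` and `MTX_CMD` are hypothetical names, while `offline` and `offline_sleep` mirror the mtx-changer.conf settings discussed in this thread.

```shell
#!/bin/sh
# Hypothetical unload step in the spirit of mtx-changer; NOT the shipped script.
# offline/offline_sleep mirror mtx-changer.conf; rewind_first, MTX_CMD and
# unload_slot are illustrative names only.

offline=${offline:-1}               # 1 = "mt offline" (rewind + eject) before unloading
offline_sleep=${offline_sleep:-0}   # seconds to wait after taking the drive offline
rewind_first=${rewind_first:-0}     # 1 = explicit rewind first, for drives that need it
MTX_CMD=${MTX_CMD:-mtx}             # changer command; overridable for testing

# unload_slot CHANGER DEVICE SLOT DRIVE
unload_slot() {
    changer=$1 device=$2 slot=$3 drive=$4
    if [ "$rewind_first" -eq 1 ]; then
        mt -f "$device" rewind
    fi
    if [ "$offline" -eq 1 ]; then
        mt -f "$device" offline
    fi
    if [ "$offline_sleep" -ne 0 ]; then
        sleep "$offline_sleep"      # give the drive time to finish ejecting
    fi
    # Only now ask the robot to move the cartridge back to its slot.
    "$MTX_CMD" -f "$changer" unload "$slot" "$drive"
}
```

With `rewind_first=1` set, a call such as `unload_slot /dev/sg0 /dev/nst0 3 0` (example device names) issues the rewind, the offline and the sleep in that order, and only then runs `mtx -f /dev/sg0 unload 3 0`.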

> I've set offline=1 in mtx-changer.conf and that seems to help a little.
> I've still encountered some drive unmounting issues, but nothing that Bacula
> hasn't been able to recover from on its own or with very little manual
> intervention.

I run mine (IBM3584) with:
offline=1
offline_sleep=2
load_sleep=20

I do recall having sporadic issues with a load_sleep of just 2-3
seconds, so I've set it to 20 to allow the drive to settle fully
before wait_for_drive() issues a series of "mt status" commands to it.
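A rough POSIX sh sketch of that settle-then-poll pattern follows. It assumes Linux `mt`, whose status output contains `ONLINE` once a tape is loaded and ready; other platforms print a different string. `wait_for_ready`, `drive_ready` and `max_wait` are hypothetical names, and this is not the real wait_for_drive() from mtx-changer.

```shell
#!/bin/sh
# Hypothetical settle-then-poll step modelled on the description above.
load_sleep=${load_sleep:-20}   # seconds to let the drive settle after a load
max_wait=${max_wait:-300}      # hypothetical polling limit, in seconds

# drive_ready DEVICE: succeed if "mt status" reports the drive ONLINE.
# Assumes Linux mt output.
drive_ready() {
    mt -f "$1" status 2>&1 | grep ONLINE >/dev/null 2>&1
}

# wait_for_ready DEVICE [CHECK]: sleep load_sleep seconds, then run CHECK
# once per second until it succeeds (return 0) or max_wait tries fail (return 1).
wait_for_ready() {
    device=$1
    check=${2:-drive_ready}
    if [ "$load_sleep" -ne 0 ]; then
        sleep "$load_sleep"
    fi
    i=0
    while [ "$i" -lt "$max_wait" ]; do
        if "$check" "$device"; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}
```

Used as `wait_for_ready /dev/nst0 || echo "drive never became ready" >&2`, the long initial sleep keeps status queries away from the drive while it is still threading the tape, which is the point of raising load_sleep.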

> I was pretty sure the messages were informational; I'm glad that someone can
> confirm that. I'll keep working on the problem to see what I can come up
> with. If there is a better way to tell Bacula to be stupid slow with unmount
> and mount requests, that may help me find where in the process things are
> getting hung up.

Well, in 5.0.1 at least, you can set offline_sleep and load_sleep to
30 seconds or more, for example; that might help if the drive is getting
confused by receiving commands too fast.
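For a deliberately slow debugging run, the relevant mtx-changer.conf lines might look like this. It assumes the 5.0.x layout, where mtx-changer.conf is a plain sh file of variable assignments read by the mtx-changer script; 30 is simply the example value from this message, not a tuned recommendation.

```shell
# mtx-changer.conf excerpt: slow changer operations down while debugging
offline=1          # take the drive offline before the changer unloads it
offline_sleep=30   # seconds to wait after taking the drive offline
load_sleep=30      # seconds to wait after a load before polling the drive
```

Once the failing step is found, the sleeps can be lowered again toward values like the 2 and 20 seconds quoted above.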

On older versions (3.0.x or 2.4?) you can edit the mtx-changer shell
script itself; IIRC it already had commented-out "sleep" statements in
the right places...

Thanks, this is helpful, I'll give these a try.

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users