Bacula-users

Re: [Bacula-users] question reguarding autochanger use

2008-05-07 10:36:42
Subject: Re: [Bacula-users] question reguarding autochanger use
From: Blake Dunlap <blake AT ISDN DOT NET>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Wed, 7 May 2008 09:31:28 -0500
(Disclaimer: I am not on my normal computer so this top post is not in bad 
faith, blame Outlook)

Arno,

Thank you for the help, as usual. I understand what the problem is (or at
least the symptom), and I agree that there is a risk of deadlock if anything
else is going on. In theory this should never occur, but I am seeing it
weekly. All technicians who interact with the changer say they are following
procedure: asking Bacula to update slots when finished and confirming that
data is returned. I am not sure whether the script is sometimes missing tape
locations, my autochanger is being defiant, something is accessing the
autochanger directly behind Bacula's back, or, more simply, my technicians
are not following procedure after all. At this point I am trying to
counteract this specific error programmatically, because when it occurs I
can be fairly confident that the tape changer is stuck in any case.

The script method does seem viable, though I would rather the fix be inline
(because of the way alerts work, etc.). So, my next thought: do you see any
problems with modifying mtx-changer to return a custom error code on this
event, and then modifying the SD code to re-poll the autochanger slots on
receipt of that code and afterwards retry the original unmount using
current slot data?
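
To make the first half of that idea concrete, here is a rough sketch of how
the changer script could single out this one failure. It is written in
Python for illustration only (mtx-changer itself is a shell script). The
exit code 42 and the function names are hypothetical, and a stock SD does
not interpret such a code, so this only helps together with the SD change
described above. The matched message is the one from the log quoted in this
thread.

```python
import re
import subprocess

# Hypothetical exit code chosen for illustration; a stock Bacula SD does
# not give it any special meaning.
EXIT_SLOT_FULL = 42

# Message text as it appears in the log excerpt quoted in this thread.
SLOT_FULL_RE = re.compile(r"Storage Element (\d+) is Already Full")


def classify_unload(returncode, output):
    """Map an mtx unload result to the exit code the wrapper should return."""
    if returncode == 0:
        return 0
    if SLOT_FULL_RE.search(output):
        return EXIT_SLOT_FULL
    # Any other failure is passed through unchanged.
    return returncode


def unload(changer, slot, drive):
    """Run 'mtx unload' and return the classified exit code."""
    proc = subprocess.run(
        ["mtx", "-f", changer, "unload", str(slot), str(drive)],
        capture_output=True,
        text=True,
    )
    return classify_unload(proc.returncode, proc.stdout + proc.stderr)
```

Keeping the classification separate from the mtx call means the matching
can be checked against saved log text without touching the changer.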

-Blake

-----Original Message-----
From: bacula-users-bounces AT lists.sourceforge DOT net 
[mailto:bacula-users-bounces AT lists.sourceforge DOT net] On Behalf Of Arno 
Lehmann
Sent: Wednesday, May 07, 2008 3:09 AM
To: bacula-users AT lists.sourceforge DOT net
Subject: Re: [Bacula-users] question reguarding autochanger use

Hi,

07.05.2008 06:28, Blake Dunlap wrote:
>
>
> I was thinking of modifying mtx-changer to automatically do an update
> slots and retry whatever action was occurring if the following error
> occurred,

(which is a "Slot full" condition on unload)

> but then I realized that wouldn't work due to not being able
> to do that during an action to begin with.

Right... at least for this case that sounds like a sure way to a deadlock.

> Anyone have any suggestions
> on how to automatically handle the following situation without simply
> failing to the "Intervention needed" blackhole until someone can go
> babysit the box and fix the simple error?

You could retry the unload to any other slot - mtx unload without a
target slot might work.
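
If you would rather pick the "other slot" explicitly, one option is to
parse the output of 'mtx status' for empty slots. The sketch below assumes
the usual "Storage Element N:Empty" line format; the sample text is made up
for illustration and your changer's output may differ. Keep in mind that a
tape moved to a different slot leaves Bacula's slot data out of date until
an 'update slots' is run.

```python
import re

# Assumes 'mtx status' prints empty slots as "Storage Element N:Empty".
EMPTY_SLOT_RE = re.compile(r"^\s*Storage Element (\d+):Empty", re.MULTILINE)


def empty_slots(status_text):
    """Return the slot numbers that 'mtx status' reports as empty."""
    return [int(n) for n in EMPTY_SLOT_RE.findall(status_text)]


# Illustrative sample only, not captured from a real changer.
SAMPLE_STATUS = """\
  Storage Changer /dev/sg0:1 Drives, 4 Slots ( 0 Import/Export )
Data Transfer Element 0:Full (Storage Element 2 Loaded):VolumeTag = LTO004
      Storage Element 1:Full :VolumeTag=LTO001
      Storage Element 2:Empty
      Storage Element 3:Full :VolumeTag=LTO003
      Storage Element 4:Empty
"""

candidates = empty_slots(SAMPLE_STATUS)
```

The first entry of the returned list could then be passed as the target
slot to 'mtx unload'.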

The key issue here is that, in my experience, these problems more or
less require manual intervention because they are most likely caused
by someone or some other process interfering in autochanger
operations, and that is not something Bacula is designed to handle.

As a minimal option to fix things automatically, if mtx-changer fails
with any error, you could try to start a background script that does
the following:

- wait a few minutes, to give Bacula time to handle the error internally.
- check if jobs are running (using bconsole and 'sta sd' and parsing
the output).
- if jobs are running on the affected storage device, assume
everything is fine again and do nothing.
- if jobs are not running and the device is blocked, issue an 'mtx
inventory' and, for Bacula, an 'update slots' and 'mount sd'.
- check again if jobs are running now.
- if the device is still blocked, complain very loudly to the admin.
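
The steps above could be sketched roughly as follows. This is an untested
outline, not a finished tool. The storage name is taken from the log in
this thread; the changer device, the five-minute wait, and the strings
matched in the status output ("No Jobs running." and "BLOCKED") are
assumptions you should check against your own 'status storage' output.

```python
import subprocess
import sys
import time

STORAGE = "DriveA"    # storage name from the log excerpt in this thread
CHANGER = "/dev/sg0"  # hypothetical changer device; adjust for your system


def run_bconsole(commands):
    """Feed commands to bconsole on stdin and return what it prints."""
    script = "\n".join(commands) + "\nquit\n"
    proc = subprocess.run(["bconsole"], input=script,
                          capture_output=True, text=True)
    return proc.stdout


def decide(status_text):
    """Classify 'status storage' output (string matches are assumptions)."""
    if not status_text.strip():
        return "unknown"   # no output at all, so do not guess
    if "No Jobs running." not in status_text:
        return "ok"        # jobs are running: assume all is fine again
    if "BLOCKED" in status_text:
        return "recover"   # no jobs and the device is blocked
    return "idle"


def recover(wait_seconds=300):
    """Wait, then try to unblock the device; return False if still stuck."""
    time.sleep(wait_seconds)  # give Bacula time to handle the error itself
    if decide(run_bconsole([f"status storage={STORAGE}"])) != "recover":
        return True
    subprocess.run(["mtx", "-f", CHANGER, "inventory"], check=False)
    run_bconsole([f"update slots storage={STORAGE}",
                  f"mount storage={STORAGE}"])
    if "BLOCKED" in run_bconsole([f"status storage={STORAGE}"]):
        print(f"{STORAGE} is still blocked, please check the changer",
              file=sys.stderr)
        return False
    return True
```

Calling recover() in the background from the changer script's failure path
would match the idea above; how the admin is alerted is left open here.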

Again, though, my experience is that in most cases of autochanger
operations going wrong you really want a human to intervene, so I'm
perfectly happy with Bacula's request for intervention.

Arno

>
> 06-May 23:00 nrepbak01-sd JobId 11541: 3307 Issuing autochanger "unload slot 13, drive 0" command.
> 06-May 23:03 nrepbak01-sd JobId 11541: 3995 Bad autochanger "unload slot 13, drive 0": ERR=Child exited with code 1
> Results=/dev/nst0: Input/output error
> Storage Element 13 is Already Full
>
> 06-May 23:03 nrepbak01-sd JobId 11541: Please mount Volume "LTO004" or label a new one for:
>     Job:          nrepbak01.2008-05-06_23.00.48
>     Storage:      "DriveA" (/dev/nst0)
>     Pool:         OnsiteIncremental
>     Media type:   LTO2
>
> ------------------------------------------------------------------------
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> Don't miss this year's exciting event. There's still time to save $100.
> Use priority code J8TL2D2.
> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users

--
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

