Bacula-users

Re: [Bacula-users] question reguarding autochanger use

2008-05-07 04:09:02
From: Arno Lehmann <al AT its-lehmann DOT de>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Wed, 07 May 2008 10:08:30 +0200

Hi,

07.05.2008 06:28, Blake Dunlap wrote:
> I was thinking of modifying mtx-changer to automatically do an update 
> slots and retry whatever action was occurring if the following error 
> occurred,

(which is a "Slot full" condition on unload)

> but then I realized that wouldn’t work due to not being able 
> to do that during an action to begin with.

Right... at least in this case, that sounds like a sure way to a deadlock.

> Anyone have any suggestions 
> on how to automatically handle the following situation without simply 
> failing to the “Intervention needed” blackhole until someone can go 
> babysit the box and fix the simple error?

You could retry the unload to any other slot - mtx unload without a 
target slot might work.
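
A rough sketch of picking another slot, in Python. It assumes that
'mtx status' prints one "Storage Element N:Empty" line per free slot;
the function names and the device path in the usage note are made up
for illustration, so check them against your own changer first.

```python
import re

# Assumed line format of `mtx status`: "Storage Element 2:Empty".
# Import/export elements ("Storage Element 24 IMPORT/EXPORT:Empty") and
# the drive line ("Data Transfer Element 0:Full (Storage Element 13
# Loaded)") deliberately do not match this pattern.
EMPTY_SLOT = re.compile(r"^\s*Storage Element (\d+):Empty", re.MULTILINE)


def empty_slots(status_text):
    """Return the slot numbers reported as empty, in listing order."""
    return [int(number) for number in EMPTY_SLOT.findall(status_text)]


def unload_command(changer, drive, status_text):
    """Build an mtx command that unloads `drive` into the first empty slot.

    Returns None when no empty slot is listed, in which case a human
    has to sort it out anyway.
    """
    slots = empty_slots(status_text)
    if not slots:
        return None
    return ["mtx", "-f", changer, "unload", str(slots[0]), str(drive)]
```

You would feed it the output of 'mtx -f /dev/sg0 status' (device name
hypothetical) and pass the returned list to subprocess.run(). The tape
then sits in a slot Bacula does not expect, so an 'update slots' is
still needed afterwards.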

The key issue here is that, in my experience, these problems more or 
less require manual intervention because they are most likely caused 
by someone or some other process interfering in autochanger 
operations, and that is not something Bacula is designed to handle.

As a minimal option to fix things automatically, if mtx-changer fails 
with any error, you could try to start a background script that does 
the following:

- wait a few minutes, to give Bacula time to handle the error internally.
- check if jobs are running (using bconsole and 'sta sd', and parsing 
the output).
- if jobs are running on the affected storage device, assume 
everything is fine again and do nothing.
- if jobs are not running and the device is blocked, issue an 'mtx 
inventory' and, for Bacula, an 'update slots' and 'mount sd'.
- check again if jobs are running now.
- if the device is still blocked, complain very loudly to the admin.
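
The steps above could be sketched roughly like this in Python. Treat it
as an illustration, not a tested tool. The storage name, the changer
device, the five-minute delay and all function names are assumptions.
So is the status parsing: it assumes 'status storage' prints
"No Jobs running." when idle and the word "BLOCKED" for a blocked
device, and it does not count a job that is stuck on a blocked device
as running.

```python
import subprocess
import time

STORAGE = "DriveA"    # hypothetical: storage name as the Director knows it
CHANGER = "/dev/sg0"  # hypothetical: control device of the autochanger


def bconsole(command):
    """Pipe one command into bconsole and return whatever it prints."""
    done = subprocess.run(["bconsole"], input=command + "\n",
                          capture_output=True, text=True)
    return done.stdout


def parse_status(text):
    """Return (jobs_running, blocked) from 'status storage' output.

    Assumed markers, verify them against your own output: "BLOCKED" on
    a blocked device, "No Jobs running." on an idle daemon.
    """
    blocked = "BLOCKED" in text
    jobs_running = not blocked and "No Jobs running." not in text
    return jobs_running, blocked


def default_actions():
    """The repair steps: mtx inventory, then update slots and mount."""
    return [
        lambda: subprocess.run(["mtx", "-f", CHANGER, "inventory"]),
        lambda: bconsole("update slots storage=" + STORAGE),
        lambda: bconsole("mount storage=" + STORAGE),
    ]


def recover(get_status, actions, notify, wait=time.sleep, delay=300):
    """Run the recovery steps once and report what happened."""
    wait(delay)                      # let Bacula handle the error itself
    jobs_running, blocked = get_status()
    if jobs_running:
        return "recovered"           # everything is fine again
    if not blocked:
        return "idle"                # nothing running, nothing to fix
    for action in actions:
        action()
    wait(delay)
    jobs_running, blocked = get_status()
    if blocked:
        notify("autochanger still blocked, manual intervention needed")
        return "escalated"
    return "recovered"
```

mtx-changer would start this in the background on any error, for
example as recover(lambda: parse_status(bconsole("status storage=" +
STORAGE)), default_actions(), print), with print replaced by whatever
reaches the admin. Depending on the setup, 'update slots' and 'mount'
may ask for a drive or slot, so the command strings will probably need
adjusting.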

Again, though, my experience is that in most cases of autochanger 
operations going wrong you really want a human to intervene, so I'm 
perfectly happy with Bacula's request for intervention.

Arno

> 06-May 23:00 nrepbak01-sd JobId 11541: 3307 Issuing autochanger "unload slot 13, drive 0" command.
> 06-May 23:03 nrepbak01-sd JobId 11541: 3995 Bad autochanger "unload slot 13, drive 0": ERR=Child exited with code 1
> Results=/dev/nst0: Input/output error
> Storage Element 13 is Already Full
>
> 06-May 23:03 nrepbak01-sd JobId 11541: Please mount Volume "LTO004" or label a new one for:
>     Job:          nrepbak01.2008-05-06_23.00.48
>     Storage:      "DriveA" (/dev/nst0)
>     Pool:         OnsiteIncremental
>     Media type:   LTO2

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users