Bacula-users

Re: [Bacula-users] [Bacula-devel] Minor SD Feature Request

2008-05-11 11:30:09
Subject: Re: [Bacula-users] [Bacula-devel] Minor SD Feature Request
From: Kern Sibbald <kern AT sibbald DOT com>
To: bacula-devel AT lists.sourceforge DOT net
Date: Sun, 11 May 2008 17:30:28 +0200
On Sunday 11 May 2008 12:45:28 Arno Lehmann wrote:
> Hi,
>
> 11.05.2008 11:37, Kern Sibbald wrote:
> > Hello,
> >
> > OK, thanks. You have confirmed what I suspected. In effect, this is
> > really a support problem - I suspect you do not fully understand how
> > Bacula works and its limitations (explained below).
>
> Well, this is something that we discussed on the -users list, and as
> far as I can tell, Blake pretty well understands the way Bacula works
> and has implemented procedures to do things right, 

Well, I am happy to hear that.

> but the autochanger 
> itself is causing the trouble. (By loading imported volumes to
> whatever slots are available automatically).

Well, if one follows the full procedure I outlined, the problems mentioned 
would not happen, unless I am missing some other problem.

>
> > First, without some additional design and coding, it is not possible for
> > Bacula to snoop around on the autochanger for an available slot in which
> > to unload a volume.
>
> Well, with the discussed slots status query by the SD a good part of
> the design would already exist.

The SD can query slots, but it cannot know what slot to use. That information 
comes only from the Director, and the Director gets it from what the user has 
put into the catalog (via bconsole commands). Without the appropriate entries 
in the catalog (or perhaps some explicit user command), the SD by itself will 
not directly access any slots. This permits sharing of autochangers without 
having the autochanger physically partitioned, which is most often not 
possible. It is a fundamental design concept of the current Bacula code, and 
as I mentioned below, it can be changed, but that would require a design, a 
feature request, and scheduling of the implementation. It probably involves 
new directives and new communications between the DIR and the SD.
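A small model may help picture the constraint described here. The Python below is not Bacula code; the class and method names are hypothetical. It assumes the Director hands the SD a slot number taken from the catalog, that the drive remembers that slot, and that an unload may only put the volume back there. Under those assumptions it reproduces the "slot full" failure discussed in this thread.

```python
class SlotFullError(Exception):
    """Raised when the slot a volume must return to is already occupied."""


class ToyAutochanger:
    """Toy model of the fixed-home-slot rule (hypothetical, not Bacula code)."""

    def __init__(self, slots):
        # slots: dict mapping slot number -> volume name, or None if empty
        self.slots = dict(slots)
        self.drives = {}  # drive number -> (volume, home slot)

    def load(self, drive, slot):
        # The slot number is supplied from outside (the Director/catalog);
        # the model never chooses a slot on its own.
        volume = self.slots[slot]
        if volume is None:
            raise ValueError(f"slot {slot} is empty")
        self.slots[slot] = None
        self.drives[drive] = (volume, slot)
        return volume

    def unload(self, drive):
        volume, home = self.drives[drive]
        # A volume may only go back to the slot it came from.
        if self.slots[home] is not None:
            raise SlotFullError(
                f"cannot unload {volume}: slot {home} holds {self.slots[home]}"
            )
        self.slots[home] = volume
        del self.drives[drive]
        return home
```

For example, loading slot 2 into drive 0 and then putting a new tape into slot 2 by hand makes `unload(0)` raise `SlotFullError`, while the drive still records the original volume, much like the situation Blake describes further down.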

>
> >  The autochanger may have hundreds of slots, with only a few
> > available for Bacula, and currently there is no way to tell Bacula that
> > it "owns" slots n-m  (this could be a future enhancement).  As a
> > consequence, with the current design, Bacula must always unload a volume
> > into the slot from which it came.
>
> I disagree... note that I'm not talking about shared autochangers
> (which would best be shared by partitioning a big library, i.e.
> relying on the library hardware to keep track of which slots belong to
> which logical autochanger). 

> So I assume Bacula has the autochanger it sees all for itself.

That is not an assumption that Bacula makes, and if we changed it to make that 
assumption, without the project mentioned above, my guess is that it would 
totally break things in some large shops.

Someone used to big shops like David Boyes might be able to give a bit more 
insight here.

>
> In this case it would be possible to list the slots, look for unused
> ones, and unload the current tape to one of these, updating that
> catalog accordingly (the mtx-changer script also supports this, by the
> way).

That is possible, and I mentioned it, but it falls into a new design needing 
new code, and hence is not something that can be simply "patched" in -- i.e. 
it is not a bug fix.
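A rough sketch of the free-slot approach Arno describes, combined with the "owns slots n-m" idea so that a shared changer is not disturbed. The input format (one `slot:volume` line per occupied slot) is an assumption loosely modeled on what an mtx-changer style `list` prints; check what your script actually outputs. The function name and its parameters are hypothetical.

```python
def pick_unload_slot(listing, owned_first, owned_last, reserved=()):
    """Return the lowest empty slot in [owned_first, owned_last], or None.

    listing  -- text with one "slot:volume" line per occupied slot
                (assumed format)
    reserved -- slots that are empty now but must stay free, e.g. the home
                slots of volumes currently sitting in other drives
    """
    occupied = set()
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        slot, _, _volume = line.partition(":")
        occupied.add(int(slot))
    # Only consider slots this Bacula instance is assumed to own.
    for slot in range(owned_first, owned_last + 1):
        if slot not in occupied and slot not in reserved:
            return slot
    return None  # no free slot that Bacula is allowed to use
```

Choosing a slot is the easy part; the catalog would still have to learn where the volume went (for instance by an update slots afterwards), which is where the new Director/SD communication mentioned above comes in.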

>
> > Second, from the above you should have gathered that if you manually load
> > a volume into a slot where Bacula has loaded a volume from that slot into
> > a drive, at some point everything is going to fail as you are seeing.
>
> Yup, though the term "manually" is misleading in this scenario...

I meant that it was something done by a human intervention rather than by 
Bacula.  In some autochangers like mine, new volumes are introduced manually 
directly into the slot.  With bigger autochangers, there are mail slots and 
such through which the operator can manually enter new volumes that are then 
loaded into appropriate slots by the autochanger. There are probably even 
other schemes ...

>
> > When you change something in the autochanger the preferred way of doing
> > so is:
> >
> > -- first unmount all drives that Bacula has mounted
> > -- change the autochanger volumes
> > -- do an update slots
> > -- finally remount the drives with Bacula.
>
> I think Blake knows that.

I believe that if he were doing at least the first step, his major complaint 
in the Feature Request would be completely resolved. It would not resolve the 
additional problems he is seeing when the update slots step of the procedure 
fails.
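The quoted four-step procedure lends itself to scripting. Below is a minimal Python sketch that only builds the bconsole command lines for the steps Bacula performs; the function name, the storage name `Autochanger` and the drive numbers are hypothetical, and the command syntax (`unmount storage=... drive=...`, `update slots storage=... drive=...`, `mount storage=... drive=...`) is assumed and should be checked against the manual for your Bacula version. The physical tape change stays a manual step between the two lists.

```python
def maintenance_commands(storage, drives):
    """Build bconsole commands around a manual autochanger change.

    Hypothetical helper; the bconsole syntax is assumed, not verified for
    every Bacula version. Returns (before, after): run `before`, swap the
    tapes by hand, then run `after`. Each drive is named explicitly because
    an autochanger may have several drives.
    """
    # Step 1: unmount every drive so Bacula holds no volume during the change.
    before = [f"unmount storage={storage} drive={d}" for d in drives]
    # Step 3: rescan the slots (update slots is given a single drive here).
    after = [f"update slots storage={storage} drive={drives[0]}"]
    # Step 4: remount the drives.
    after += [f"mount storage={storage} drive={d}" for d in drives]
    return before, after
```

Each list can be printed one command per line and piped into `bconsole`, which reads commands from standard input.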

>
> > It is possible to rearrange the volumes in the autochanger without
> > unloading all the drives, provided that Bacula doesn't want to
> > load/unload any volume while you are changing things in the
> > autochanger. I strongly recommend against doing this, but it is possible
> > in a situation where Bacula is running a job. Doing so is not without
> > risks, though.
> >
> > If you don't follow these simple rules, Bacula will sooner or later fail,
> > and probably the worst case is if you load a volume into a slot where
> > there is a volume in one of the drives.
> >
> > I do believe that we could improve how Bacula handles Volumes found in
> > Slots where they are not expected, and I will look at that, but for the
> > moment, having Bacula unload a volume into a different slot than the one
> > it came from is a much bigger project that, if well designed and accepted,
> > would be a feature after the next major release (3.0.0).
>
> Well, I won't argue here, but I believe the design work needed is not
> that complex.

I agree with you that it is not that complex. To clarify: "complex" is not a 
word I used or meant to imply.

Best regards,

Kern

>
> > Summary:
> > - I cannot accept your Feature Request as formulated without additional
> > design work so that it won't break shared autochangers.
> >
> > - You can resolve your problems by implementing improved sysadmin
> > procedures.
>
> Perhaps... ok, attached is a starting point. This is a script I use to
> help managing autoloaders, especially unloading full volumes and
> loading new ones.
>
> I recommend that you very carefully test it - it's more a hack that
> grew into a rather large program (at least for my coding skills...)
> and I'm quite sure it can be improved a lot.
>
> If I had the time I know that I could rework much of it to become more
> generally usable and better structured.
>
> This script is known to work in production environments, but still - no
> warranties, you are all on your own, and so on.
>
> Arno
>
> > Regards,
> >
> > Kern
> >
> > PS: When unmounting, you do specify an Autochanger, but since
> > autochangers may have multiple drives, you must specify which drive of
> > the autochanger.  If you have only one drive, entering a return at the
> > question is all that is necessary to do the right thing.
> >
> > On Saturday 10 May 2008 19:22:47 Blake Dunlap wrote:
> >>> Hello Blake,
> >>>
> >>> One part of Bacula that I would like to improve just a bit (not too
> >>> much coding for the moment) for the next release is the information
> >>> returned for Autochangers. Currently, it seems to me that the sysadmin
> >>> has very little information about the actual state of the autochanger
> >>> via the console interface. Although your suggestion seems to be a bit
> >>> more than simple reporting of the status, I am interested in it. The
> >>> problem is that I don't understand what you are asking for well enough
> >>> to possibly implement something.
> >>>
> >>> Could you be much more explicit about what you want, perhaps giving an
> >>> explicit example of what happens now and what you would like to see
> >>> happen. Don't forget that at the current time, Bacula has no concept of
> >>> changing the slot -- for example, when a Volume is loaded by Bacula
> >>> from Slot 2 into the drive, it *must* be returned to the same Slot.
> >>> Changing this behavior is a project that would require significant
> >>> design and thought and is probably not something we would want to
> >>> implement in the near future.
> >>>
> >>> On the other hand, I think there is a lot of need and possibility for
> >>> making Bacula much smarter at automatically recognizing that a Volume
> >>> is in a different Slot from what is written in the database. Currently
> >>> such volumes are marked in error (if I remember right), but we could
> >>> consider simply correcting the info in the database.
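One way to picture "correcting the info in the database" is a reconciliation pass that compares the slot each volume has in the catalog with what a fresh slot scan reports. The sketch below is hypothetical: it works on plain dictionaries rather than the Bacula catalog, and volumes that appear in the scan but not in the catalog are simply ignored (in practice they would still need labeling).

```python
def reconcile(catalog, scanned):
    """Compare catalog slot assignments with an actual slot scan.

    catalog -- dict volume -> slot number recorded in the database
    scanned -- dict slot number -> volume name found in that slot
    Returns (moves, missing): moves maps volume -> new slot for volumes
    found in a different slot; missing lists catalog volumes not seen.
    """
    found = {vol: slot for slot, vol in scanned.items()}
    moves = {}
    missing = []
    for vol, slot in catalog.items():
        if vol not in found:
            missing.append(vol)  # e.g. taken offsite or sitting in a drive
        elif found[vol] != slot:
            moves[vol] = found[vol]  # correct the record instead of erroring
    return moves, sorted(missing)
```

A real implementation would also have to skip slots that Bacula does not own in a shared changer.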
> >>>
> >>> Best regards,
> >>>
> >>> Kern
> >>
> >> It is the last paragraph that I am mostly looking at dealing with. Let
> >> me give our situation in depth and I think that will explain what I am
> >> looking for.
> >>
> >> We have a 2-drive auto-changer and run 4 pools of backups (Incremental,
> >> OnSiteFull, OffsiteFull, and OnsiteMonthly). We run two sets of backups
> >> for clients: an offsite backup that runs every Friday night (due to the
> >> lack of copy pools, etc.), and the OnSite backups, which are incremental
> >> every night except Saturday night, when a full is run (the pool is
> >> overridden to Monthly on the first Saturday of a month). Anyway, we
> >> rotate the Offsite tapes every Tuesday, and supposedly an update slots
> >> is run with all drives released at the conclusion of the procedure,
> >> which should update the database with the current state of the
> >> auto-changer.
> >>
> >> Now that the back story is established, what has been extremely
> >> frustrating is that a decent percentage of the time, something occurs
> >> which places the tapes out of sync, and come Saturday night (the first
> >> night a drive would have to swap), the auto-changer fails to load the
> >> tape it is looking for in the OnsiteFull pool, because the tape that
> >> was in the drive fails to unload with a "slot full" condition. Bacula now
> >> requests user intervention loading the tape, and the drive is marked
> >> unloaded (because the error didn't occur during an unload event, but a
> >> load event, which makes it a pain to determine what tape is actually
> >> loaded in the drive currently). To fix this, one must run an update
> >> slots, then look back in the logs to figure out what tape failed to
> >> unload, then "load" that tape into the drive, and Bacula will then
> >> realize the drive is usable again, and then proceed as normal. Of course
> >> due to the times we run backups, this has to occur in the middle of the
> >> night, or potentially the next day, which impacts backups and the
> >> general network.
> >>
> >> I believe this is an error condition that could reasonably be dealt with
> >> programmatically instead of requiring user intervention. One solution
> >> would be an automatic slot refresh before unloading or loading tapes,
> >> with an assumed validity of, say, 10 minutes to limit how often the
> >> slots are rescanned.
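Blake's suggestion amounts to caching the slot inventory and rescanning it before a load or unload once it is older than some lifetime (10 minutes in his example). A minimal sketch of that idea, assuming a `scan_slots` callable that actually queries the changer; both that callable and the class name are hypothetical.

```python
import time


class SlotInventory:
    """Cache of slot contents, refreshed when older than max_age seconds."""

    def __init__(self, scan_slots, max_age=600.0, clock=time.monotonic):
        self.scan_slots = scan_slots  # hypothetical: returns {slot: volume}
        self.max_age = max_age        # 600 s = the 10 minutes suggested
        self.clock = clock            # injectable for testing
        self._slots = None
        self._stamp = None

    def current(self):
        """Return slot contents, rescanning first if the cache is stale."""
        now = self.clock()
        if self._slots is None or now - self._stamp > self.max_age:
            self._slots = dict(self.scan_slots())
            self._stamp = now
        return self._slots

    def invalidate(self):
        """Force a rescan on the next call, e.g. after a changer error."""
        self._slots = None
```

The load/unload logic would call `current()` before choosing or checking a slot, and `invalidate()` after any failed changer operation, so a stale view never survives an error.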
> >>
> >> Let me know if I need to add anything further; I tried to be as
> >> detailed as possible in this response, compared to the quick summary
> >> of the actual feature request. From a user perspective, I do agree that
> >> auto-changer support feels more tacked on than anything (for example,
> >> the requirement to specify a drive instead of an auto-changer when doing
> >> an update slots command) and would love to see improvements in that
> >> regard.
> >>
> >> -Blake
> >>
> >> ------------------------------------------------------------------------
> >> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> >> Don't miss this year's exciting event. There's still time to save $100.
> >> Use priority code J8TL2D2.
> >> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> >> _______________________________________________
> >> Bacula-devel mailing list
> >> Bacula-devel AT lists.sourceforge DOT net
> >> https://lists.sourceforge.net/lists/listinfo/bacula-devel
> >



_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users