Bacula-users

2008-06-20 09:27:56
Subject: Re: [Bacula-users] FYI: Scheduling policy and promoting backups (fwd)
From: Kern Sibbald <kern AT sibbald DOT com>
To: Alan Brown <ajb2 AT mssl.ucl.ac DOT uk>
Date: Fri, 20 Jun 2008 15:28:01 +0200
On Friday 20 June 2008 14:14:06 Alan Brown wrote:
> On Fri, 20 Jun 2008, Kern Sibbald wrote:
> > > There have been several complaints about backup promotion happening at
> > > queue time and 2 more people have complained about it in this thread.
> > >
> > > Would you mind reconsidering your position?
> >
> > I am always willing to reconsider something if some new information
> > arrives. In this case, I am sorry, but I really have no idea what you are
> > referring to.  I have a general idea what backup promotion is, but
> > absolutely no idea what you mean about queue and dequeue time.  Even for
> > backup promotion, it would be helpful to mention the specific directives
> > involved.
> >
> > Regards,
> >
> > Kern
> >
> > PS: the current trunk code has much better code for handling multiple job
> > scheduling, failures, and job level promotion, so whatever is discussed
> > needs to consider the new code.
>
> The problem revolves around the "Rerun Failed Levels" directive.
>
> Currently jobs are looking for the status of previous jobs when they are
> placed in the run queue and deciding to upgrade themselves or not.
>
> This means that if a full backup is running, but not completed when the
> next incremental is started, it upgrades itself to Full even if concurrent
> job directives prevent it from actually starting until the already-running
> Full backup is completed.
>
> When the previous job completes, the queued job exits the run queue and
> starts executing at level Full. When the next scheduled incremental job
> is queued, it sees a non-complete Full backup and upgrades itself to Full.
>
>
> This cycle then repeats endlessly, wasting a _lot_ of tape and time.
>
>
> Jobs should not check for previous failed/incomplete jobs until the moment
> they actually start running. This allows a higher-level job that was still
> in progress when an incremental or differential job was queued to finish
> and exit BEFORE the waiting job (held back by the concurrent job limit)
> runs its "Rerun Failed Levels" test.
>
> This is especially true given that a Full Backup may take longer than 24
> hours to complete on a large filesystem.
>
> Every enterprise-level admin I've spoken to about this problem has
> stated the current algorithm is broken.
>
> There have been around 10 complaints in bacula-users since January about
> how Rerun Failed Levels is implemented, and every one has been that an
> Incremental job was raised to Full while a Full backup was running.
>
> Because of the way the Rerun code has been implemented, we have had to
> disable it. We currently rely on a SQL report of how long it has been
> since the last Full/Differential backups were made, and we start Full
> backups manually when necessary.
>
> I hope that clarifies the issue.
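
The difference between checking at queue time and checking at start time can be sketched with a small simulation. This is a hypothetical Python model, not Bacula's scheduler code. It assumes one job slot (a concurrency limit of 1), one job scheduled every 24 hours, a Full that takes 26 hours, and a simplified promotion rule: promote to Full if an earlier Full has not yet finished. The names `Job`, `full_unfinished` and `simulate` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    level: str   # "Full" or "Incremental"
    start: int   # hour the job actually starts running
    end: int     # hour the job finishes


def full_unfinished(history: list[Job], t: int) -> bool:
    """True if some earlier Full (queued or running) has not finished by hour t."""
    return any(j.level == "Full" and j.end > t for j in history)


def simulate(check_at_start: bool, days: int = 4,
             full_hours: int = 26, incr_hours: int = 2) -> list[str]:
    """Schedule one job per day through a single job slot.

    Day 0 is scheduled as Full; every later day is scheduled as
    Incremental and is promoted to Full if an earlier Full is still
    unfinished at the moment the promotion check is made.
    """
    history: list[Job] = []
    slot_free_at = 0
    for day in range(days):
        queued = day * 24
        start = max(queued, slot_free_at)          # wait for the single slot
        check_time = start if check_at_start else queued
        if day == 0 or full_unfinished(history, check_time):
            level = "Full"
        else:
            level = "Incremental"
        end = start + (full_hours if level == "Full" else incr_hours)
        history.append(Job(level, start, end))
        slot_free_at = end
    return [j.level for j in history]


print("check at queue time:", simulate(check_at_start=False))
# -> check at queue time: ['Full', 'Full', 'Full', 'Full']
print("check at start time:", simulate(check_at_start=True))
# -> check at start time: ['Full', 'Incremental', 'Incremental', 'Incremental']
```

In this model, checking at queue time turns every job into a Full, reproducing the cycle described above, while making the same check when the job starts leaves the later jobs at Incremental.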

Yes, thanks for the clarification.  We are well aware of the problems with 
RerunFailedLevels.  The main problem is insufficient control. 
RerunFailedLevels does what it was designed to do; what was needed (and is 
now hopefully implemented) is a good deal more.  There are now four new 
directives that give working control over duplicate jobs.

I believe (testing will prove it or not) that all the problems you state above 
will be resolved with the new directives that are now in the development 
version.  They allow a lot more control of what happens when multiple jobs of 
the same name are started.  There are also new directives for specifying a 
maximum time between Full or Differential jobs.  Unfortunately, the new 
directives are not yet documented, though they are *very* briefly mentioned at 
the top of the technotes-2.5 document.  In addition to not yet having 
documentation, we have no regression tests for them yet.  We have been 
totally consumed for quite some months trying to duplicate and fix a number of 
complicated bugs in 2.2.8 and now in 2.4.
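
The idea of a maximum time between Full or Differential jobs can be sketched as a level-selection rule, much like the manual SQL report described in the quoted message. This is a hypothetical Python sketch, not the development version's code: `choose_level`, `max_full_interval` and `max_diff_interval` are invented names, and the assumption is that `now` is the time the job actually starts and that `last_full`/`last_diff` count only successfully completed jobs.

```python
from datetime import datetime, timedelta
from typing import Optional


def choose_level(scheduled: str,
                 now: datetime,
                 last_full: Optional[datetime],
                 last_diff: Optional[datetime],
                 max_full_interval: timedelta,
                 max_diff_interval: timedelta) -> str:
    """Upgrade the scheduled level when the last good backup is too old.

    Hypothetical rule: last_full / last_diff are the end times of the most
    recent successfully completed Full / Differential jobs (None if none),
    and `now` is the moment the job starts, not when it was queued.
    """
    # No Full at all, or the last Full is too old: promote to Full.
    if last_full is None or now - last_full > max_full_interval:
        return "Full"
    if scheduled == "Full":
        return "Full"
    # A Full is also a valid baseline for the Differential interval check.
    newest_base = last_full if last_diff is None else max(last_full, last_diff)
    if now - newest_base > max_diff_interval:
        return "Differential"
    return scheduled
```

Because the check only looks at completed jobs and is evaluated when the job starts, a Full that is still running when an Incremental is queued does not by itself trigger a promotion.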

Once the documentation is written, I will be very interested to get your 
comments on the new directives.

Best regards,

Kern

