Re: [Bacula-users] Multiple full backups in same month

2015-06-25 15:08:22
Subject: Re: [Bacula-users] Multiple full backups in same month
From: Silver Salonen <silver.salonen AT gmail DOT com>
To: thomasl AT mtl.mit DOT edu
Date: Thu, 25 Jun 2015 22:07:06 +0300
On Thu, Jun 25, 2015 at 8:55 PM, Thomas Lohman <thomasl AT mtl.mit DOT edu> wrote:
> Ok, so the option "Allow Duplicate Jobs = no" can at least prevent multiple
> full backups of the same server in a row as stated before?

As others mentioned, I think it may help in your case but it may not
completely solve the problem that you saw.  It looks like you had 5
instances of the same job queued up at the same time.  Disallowing
duplicate jobs would mean the last 4 would be canceled once queued (but
after being upgraded to Full).  Now, if we assume your original Full job
actually ran and completed successfully, I suspect your next instance of
this job will still get upgraded to Full, since it's going to see the
canceled jobs as "newer" than that successful Full.  The problem, I
think, is what I described in bug 1882:

"The original 5.2.13 behavior when determining if a failed job needs to
be rerun was to look at the start time of the most recent successful
backup. From there it would then see if any job had started since then
and failed. As pointed out, this creates an issue when you have FULL
jobs that tend to run longer than the time period between normal backups
for those jobs. i.e. the job laps itself so to speak. Any new jobs would
be upgraded to FULLs and then canceled since the original FULL was still
running (this assumes that duplicate jobs are not allowed). But once the
original FULL finished, Bacula was grabbing its start time and then
seeing those canceled FULL jobs that happened since the successful FULL
was started. To me, it seems like looking at the end time of that
successful job makes more sense."

The change I made was to have Bacula look at the real end time of the
last successful job and then see if any jobs have failed since that
time.  This fixed these types of issues for us.  Sorry that this probably
doesn't help you fix it right now if you're running 7.0.x, but I think it
does explain the behavior that you're seeing, and it suggests that the
issue is still there in 7.0.x.
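
To make the difference between the two cutoffs concrete, here is a minimal Python sketch. It is not Bacula's code: JobRecord and needs_upgrade are hypothetical names, and the real check in the Director works against the catalog and may consider more than this. It only models the idea described above: find the last successful Full, then look for unsuccessful jobs that started after a cutoff, where the cutoff is either that Full's start time (the original behavior as I described it) or its end time (the changed behavior).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class JobRecord:
    """Hypothetical, simplified stand-in for a catalog job record."""
    level: str        # "Full" or "Incremental"
    start: datetime
    end: datetime
    ok: bool          # False for failed or canceled jobs


def needs_upgrade(history: List[JobRecord], use_end_time: bool) -> bool:
    """Return True if an unsuccessful job is seen as newer than the
    last successful Full, under the chosen cutoff (assumed model)."""
    fulls = [j for j in history if j.level == "Full" and j.ok]
    if not fulls:
        return True  # no successful Full on record at all
    last_full = max(fulls, key=lambda j: j.start)
    cutoff = last_full.end if use_end_time else last_full.start
    return any((not j.ok) and j.start > cutoff for j in history)


# A Full that "laps itself": it runs 30 hours while jobs are scheduled daily.
t0 = datetime(2015, 6, 1, 0, 0)
history = [
    # The original Full, which eventually completes successfully.
    JobRecord("Full", t0, t0 + timedelta(hours=30), ok=True),
    # The next day's job: upgraded to Full, then canceled as a duplicate.
    JobRecord("Full", t0 + timedelta(hours=24), t0 + timedelta(hours=24), ok=False),
]

print(needs_upgrade(history, use_end_time=False))  # True: start-time cutoff
print(needs_upgrade(history, use_end_time=True))   # False: end-time cutoff
```

With the start-time cutoff, the canceled duplicate counts as newer than the successful Full, so the next job is upgraded again. With the end-time cutoff, the duplicate was canceled before the Full finished, so in this model it is ignored.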

And just for completeness, these are the related settings that we run with:

Allow Duplicate Jobs = no
Cancel Lower Level Duplicates = yes
Cancel Queued Duplicates = yes
Cancel Running Duplicates = no
Rerun Failed Levels = yes

hope this helps,


--tom

Wouldn't this changed behavior run into the problem that cancelled duplicates are still seen as failed jobs, and therefore jobs would still be upgraded?

E.g.:
  1. Full starts.
  2. Incr is queued, upgraded to Full and cancelled.
  3. Full ends.
  4. Incr is queued, sees that Full job no. 1 finished OK, but then sees that Incr->Full job no. 2 failed, so it is still upgraded to Full and started.
--
Silver
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users