[Bacula-users] Trouble getting jobs to run simultaneously

2012-12-07 20:14:03
Subject: [Bacula-users] Trouble getting jobs to run simultaneously
From: ccspro <bacula-forum AT backupcentral DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 07 Dec 2012 12:32:40 -0800
This is a very common issue that must be looked at carefully. This will be a 
large post, and I apologize for it :P

Priority levels are very hard to manage/maintain because they can cause jobs 
to block! If a job has network (or other) issues and the client drops but the 
Bacula server's connection doesn't, you can end up in a situation I have faced 
numerous times over the last 3 years: over a weekend, only jobs of equal or 
higher priority get backed up while the others become stuck.

I still use priorities (e.g. catalog backup is 99, others are 1 to 10), but 
I've got other safeguards in place, such as SD/FD timeouts, to enforce job 
rescheduling. (This requires that you actually set up the rescheduling 
parameters and understand them; be warned that canceling a job may also 
reschedule it.)

There's another gotcha that shows up more with this sort of configuration 
(using priorities): Bacula tries to contact a client that doesn't behave well 
(port not open, firewalled, whatever). You may need to use a port checking 
program to check whether the remote host/port is open and, if not, error the 
job via the RunScript option (this is run from the Bacula server, not the 
client, obviously). I've had *this* problem for at least a year or so, and 
this was the only way to solve that specific issue.
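
A minimal sketch of such a port check, assuming Python 3 is available on the 
Bacula server. The script name check_port.py, the function names and the 
5-second timeout are illustrative assumptions, not taken from the config in 
this post.

```python
#!/usr/bin/env python3
"""Exit 0 if a TCP port on a host accepts connections, non-zero otherwise."""
import socket
import sys


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def main(argv):
    """Return 0 if the port is open, 1 if it is not, 2 on bad arguments."""
    if len(argv) != 3:
        print("usage: check_port.py HOST PORT", file=sys.stderr)
        return 2
    try:
        port = int(argv[2])
    except ValueError:
        print("PORT must be an integer", file=sys.stderr)
        return 2
    return 0 if port_is_open(argv[1], port) else 1


# Only act as a command when arguments are given, so the file can also be
# imported without side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Something like "check_port.py HOST 9102" (9102 being the default File Daemon 
port) could then be called from a RunScript that runs before the job on the 
server side and fails the job on a non-zero exit status.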

After doing all of the above, priorities seem to work now...

Here are the options you may want to look at for Job or JobDefs. This is 
straight from my config:

Allow Mixed Priority = yes
# Cancel, Error, Long running, Duplicate Job and Rerun controls go here
Allow Duplicate Jobs = no
Cancel Running Duplicates = yes

# avoid weekend queue build ups...
Cancel Queued Duplicates = yes

Rerun Failed Levels = yes
Reschedule On Error = yes
# 15n = 15 minutes (most jobs reschedule after an hour, not this one)
Reschedule Interval = 15n

# Manually canceling a job may require canceling it as many times as
# "Reschedule Times" is set to: each cancel "reschedules" the job, so you
# have to cancel it over and over until it is really considered cancelled.
Reschedule Times = 3 # 3 * Interval

# if over 6h the job should be considered stuck, use this with caution!
Max Run Time = 6h

# The most important option to look at: what if we miss a full due to some
# error? This needs to be set per client by looking at the schedule and
# deciding on the best number to set:
Max Full Interval=7 days



