On Tue, 2008-09-02 at 16:18 -0400, Dan Langille wrote:
> Botha, Jacques (FNB) wrote:
> > On Tue, 2008-09-02 at 16:01 -0400, Dan Langille wrote:
> >> Botha, Jacques (FNB) wrote:
> >>> On Tue, 2008-09-02 at 15:46 -0400, Dan Langille wrote:
> >>>> Botha, Jacques (FNB) wrote:
> >>>>> Hi
> >>>>>
> >>>>> All my backups are scheduled for the same time, then queue with the same
> >>>>> priority, and run one at a time as the previous job finishes.
> >>>>>
> >>>>> Today I've got a machine that is unavailable due to a hardware fault.
> >>>>> Naturally the backup for this machine failed, but so did every backup
> >>>>> that was queued after it for all the other machines !
> >>>>>
> >>>>> Please help !
> >>>>>
> >>>>> I'm running bacula 2.4.2 on CentOS 5.
> >>>> Perhaps if you supplied the failure messages...
> >>>>
> >>>
> >>> Sure
> >>>
> >>>
> >>> 2008-09-02 20:15:19 Bacula_Director JobId 175: Fatal error: Max wait time
> >>> exceeded. Job canceled.
> >>> 2008-09-02 20:15:19 Bacula_Director JobId 176: Fatal error: Max wait time
> >>> exceeded. Job canceled.
> >>> 2008-09-02 20:15:19 Bacula_Director JobId 177: Fatal error: Max wait time
> >>> exceeded. Job canceled.
> >>>
> >>> And so forth until the last job.
> >>>
> >>>
> >>>
> >>> Some more config information which might be useful:
> >>>
> >>> Maximum Concurrent Jobs = 1
> >>>
> >>> each job has Max Wait Time = 10 minutes defined.
> >>>
> >>>
> >>> So my understanding is that the unavailable machine would have blocked
> >>> all other backups for 10 minutes until it timed out, but then they
> >>> should have continued, not been cancelled as well.
> >>>
> >>> Where am I going wrong ?
> >> Max-wait time is perhaps not what you want. Remove it or reconsider its
> >> use.
> >>
> >
> > According to the Bacula Manual:
> >
> > Max Wait Time = <time> The time specifies the maximum allowed
> > time that a job may block waiting for a resource (such as waiting
> > for a tape to be mounted, or waiting for the storage or file daemons
> > to perform their duties), counted from when the job starts, (not
> > necessarily the same as when the job was scheduled).
> >
> > So the unavailable machine could block other jobs for 10 minutes. Why
> > did the other jobs time out as well ? They were not started yet, only
> > scheduled ?
> >
> > If Max Wait Time is not what I am after, could you please point me in
> > the right direction ??
>
> I don't know the answers. I was short in my reply. Sorry. I mean:
> stop using max wait time in the short term, to get your jobs running.
> Hopefully someone else can help.
>
> But off hand, I think max wait time is doing the wrong thing here. Post
> your entire job definition and we'll see.
>
Here you go (sanitised a bit):
Director {
Name = Bacula_Director
DIRport = 9101
QueryFile = "/usr/lib64/bacula/query.sql"
WorkingDirectory = "/var/lib/bacula"
PidDirectory = "/var/run"
Maximum Concurrent Jobs = 1
Password = "XXX"
Messages = Daemon
}
Job
{
Name = Restore
Type = Restore
Client = Dummy_Client
FileSet = Dummy_FileSet
Storage = Bacula_Storage_Daemon
Pool = Default
Messages = Standard
Where = /tmp/restore
Max Wait Time = 2 minutes
}
Client
{
Name = Dummy_Client
Address = Dummy_host
Catalog = MyCatalog
Password = Dummy_password
}
FileSet
{
Name = Dummy_FileSet
Include
{
File = /Dummy
}
}
Storage {
Name = Bacula_Storage_Daemon
Address = backup-sd-01
SDPort = 9103
Password = "XXX"
Device = Dummy_Device
Media Type = Dummy-File
}
Schedule {
Name = "MonthlyCycle"
Run = Full 1st sun at 20:05
Run = Differential 2nd-5th sun at 20:05
Run = Incremental mon-sat at 20:05
}
Schedule {
Name = "WeeklyCatalogBackup"
Run = Full sun-sat at 23:10
}
Job {
Name = "BackupCatalog"
Type = Backup
Client = noc-01
Level = Full
FileSet="Catalog"
Schedule = "WeeklyCatalogBackup"
RunBeforeJob = "/usr/lib64/bacula/make_catalog_backup bacula bacula"
RunAfterJob = "/usr/lib64/bacula/delete_catalog_backup"
Write Bootstrap = "/var/lib/bacula/BackupCatalog.bsr"
Storage = noc-01-sd
Messages = Standard
Pool = Default
Full Backup Pool = noc-01-catalog-full_pool
Priority = 11 # run after main backup
}
Storage {
Name = noc-01-sd
Address = backup-sd-01
SDPort = 9103
Password = "XXX"
Device = noc-01-catalog
Media Type = noc-01-catalog-File
}
Pool
{
Name = noc-01-catalog-full_pool
Pool Type = Backup
Recycle = yes
RecycleOldestVolume = yes
Maximum Volume Jobs = 1
AutoPrune = yes
Volume Retention = 2 months
Maximum Volume Bytes = 100g
Maximum Volumes = 70
Label Format = noc-01-catalog-full-
}
FileSet {
Name = "Catalog"
Include {
Options {
signature = MD5
}
File = /var/lib/bacula/bacula.sql
}
}
Catalog {
Name = MyCatalog
dbname = bacula; user = bacula; password = "XXX"
}
Messages {
Name = Standard
mailcommand = "/opt/scripts/custom_bacula/custom_bacula 192.168.45.8 %c"
operatorcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula: Intervention needed for %j\" %r"
mail = noc-01 = all, !skipped
operator = root@localhost = mount
console = all, !skipped, !saved
catalog = all
append = "/var/lib/bacula/log" = all, !skipped
}
Messages {
Name = Daemon
mailcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula daemon message\" %r"
mail = root@localhost = all, !skipped
console = all, !skipped, !saved
append = "/var/lib/bacula/log" = all, !skipped
catalog = all
}
Pool {
Name = Default
Pool Type = Backup
Recycle = yes # Bacula can automatically recycle Volumes
AutoPrune = yes # Prune expired volumes
Volume Retention = 365 days # one year
}
@/etc/bacula/Machines/host1-fd.conf
@/etc/bacula/Machines/host2-fd.conf
@/etc/bacula/Machines/host3-fd.conf
.
.
.
/etc/bacula/Machines/host1-fd.conf ....
Job
{
Name = host1-backup
Type = Backup
Client = host1
FileSet = host1-fs
Pool = Default
Full Backup Pool = host1-full_pool
Differential Backup Pool = host1-diff_pool
Incremental Backup Pool = host1-incr_pool
Schedule = MonthlyCycle
Storage = host1-sd
Messages = Standard
Write Bootstrap = /var/lib/bacula/host1.bsr
Priority = 10
Max Wait Time = 10 minutes
}
Client
{
Name = host1
Address = host1
FDPort = 9102
Password = XXX
Catalog = MyCatalog
Fileretention = 5 years
Jobretention = 5 years
Autoprune = yes
}
FileSet
{
Name = host1-fs
Include
{
Options
{
compression = gzip8
signature = SHA1
onefs=no
}
File = /etc
File = /opt
}
}
Pool
{
Name = host1-full_pool
Pool Type = Backup
Recycle = yes
RecycleOldestVolume = yes
Maximum Volume Jobs = 1
AutoPrune = yes
Volume Retention = 1 month
Maximum Volume Bytes = 999g
Maximum Volumes = 100
Label Format = host1-full-
}
Pool
{
Name = host1-diff_pool
Pool Type = Backup
Recycle = yes
RecycleOldestVolume = yes
Maximum Volume Jobs = 1
AutoPrune = yes
Volume Retention = 2 weeks
Maximum Volume Bytes = 999g
Maximum Volumes = 100
Label Format = host1-diff-
}
Pool
{
Name = host1-incr_pool
Pool Type = Backup
Recycle = yes
RecycleOldestVolume = yes
Maximum Volume Jobs = 1
AutoPrune = yes
Volume Retention = 2 weeks
Maximum Volume Bytes = 999g
Maximum Volumes = 100
Label Format = host1-incr-
}
# Definition of file storage device
Storage {
Name = host1-sd
# Do not use "localhost" here
Address = backup-sd-01
SDPort = 9103
Password = "XXX"
Device = host1
Media Type = host1-File
}
and so forth for host2 and host3 etc.
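
A toy model may help frame the question. The sketch below is not Bacula code and makes no claim about how the Director is implemented; it only compares two readings of where the Max Wait Time clock starts for jobs that share one schedule time and run one at a time. The function name simulate, its parameters, the 5 minute run time and the host names are all hypothetical.

```python
from datetime import datetime, timedelta


def simulate(jobs, queued_at, max_wait, unreachable, run_time,
             clock_starts_at_queue):
    """Toy model of a one-at-a-time job queue (not Bacula's real logic).

    jobs                  -- job names in queue order
    queued_at             -- time at which every job was scheduled/queued
    max_wait              -- the per-job wait limit being modelled
    unreachable           -- set of jobs whose client never answers
    run_time              -- assumed duration of a healthy job
    clock_starts_at_queue -- True: wait clock starts when the job is queued
                             False: wait clock starts when the job gets its turn
    Returns a list of (job, status, time) tuples.
    """
    now = queued_at
    results = []
    for job in jobs:
        clock_start = queued_at if clock_starts_at_queue else now
        deadline = clock_start + max_wait
        if now >= deadline:
            # The limit expired while the job was still waiting its turn.
            results.append((job, "canceled", now))
            continue
        if job in unreachable:
            # The job blocks on its client until its own deadline passes.
            now = deadline
            results.append((job, "canceled", now))
        else:
            now += run_time
            results.append((job, "ok", now))
    return results


if __name__ == "__main__":
    start = datetime(2008, 9, 2, 20, 5)
    for at_queue in (True, False):
        print("clock starts at queue time:", at_queue)
        outcome = simulate(["host1", "host2", "host3"], start,
                           timedelta(minutes=10), {"host1"},
                           timedelta(minutes=5), at_queue)
        for job, status, when in outcome:
            print("  ", job, status, when.strftime("%H:%M"))
```

Under the first reading every job is canceled at the same instant, 10 minutes after the shared schedule time, which resembles the log quoted earlier in the thread. Under the second reading only the unreachable client fails and the rest run. Whether 2.4.2 really behaves like the first reading is an assumption that would need checking with the developers or against the Director source.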
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users