2008-09-02 16:44:54
Subject: Re: [Bacula-users] Backup of all jobs fail if host unavailable
From: "Botha, Jacques (FNB)" <JacquesB AT fnb.co DOT za>
To: "Dan Langille" <dan AT langille DOT org>
Date: Tue, 2 Sep 2008 22:44:21 +0200
On Tue, 2008-09-02 at 16:18 -0400, Dan Langille wrote:
> Botha, Jacques (FNB) wrote:
> > On Tue, 2008-09-02 at 16:01 -0400, Dan Langille wrote:
> >> Botha, Jacques (FNB) wrote:
> >>> On Tue, 2008-09-02 at 15:46 -0400, Dan Langille wrote:
> >>>> Botha, Jacques (FNB) wrote:
> >>>>> Hi 
> >>>>>
> >>>>> All my backups are scheduled for the same time, then queue with the same
> >>>>> priority, and run one at a time as the previous job finishes.
> >>>>>
> >>>>> Today I've got a machine that is unavailable due to a hardware fault.
> >>>>> Naturally the backup for this machine failed, but so did every backup
> >>>>> queued after it for all the other machines!
> >>>>>
> >>>>> Please help!
> >>>>>
> >>>>> I'm running bacula 2.4.2 on CentOS 5.
> >>>> Perhaps if you supplied the failure messages...
> >>>>
> >>>
> >>> Sure
> >>>
> >>>
> >>> 2008-09-02 20:15:19 Bacula_Director JobId 175: Fatal error: Max wait time exceeded. Job canceled.
> >>> 2008-09-02 20:15:19 Bacula_Director JobId 176: Fatal error: Max wait time exceeded. Job canceled.
> >>> 2008-09-02 20:15:19 Bacula_Director JobId 177: Fatal error: Max wait time exceeded. Job canceled.
> >>>
> >>> And so forth until the last job.
> >>>
> >>>
> >>>
> >>> Some more config information which might be useful:
> >>>
> >>> Maximum Concurrent Jobs = 1
> >>>
> >>> Each job has Max Wait Time = 10 minutes defined.
> >>>
> >>>
> >>> So my understanding is that the unavailable machine would have blocked
> >>> all other backups for 10 minutes until it timed out, but then they
> >>> should have continued, not been cancelled as well.
> >>>
> >>> Where am I going wrong?
> >> Max-wait time is perhaps not what you want.  Remove it or reconsider its 
> >> use.
> >>
> > 
> > According to the Bacula Manual: 
> > 
> > Max Wait Time = <time> The time specifies the maximum allowed
> > time that a job may block waiting for a resource (such as waiting
> > for a tape to be mounted, or waiting for the storage or file daemons
> > to perform their duties), counted from when the job starts (not
> > necessarily the same as when the job was scheduled).
> > 
> > So the unavailable machine could block other jobs for 10 minutes. Why
> > did the other jobs time out as well? They had not started yet; they
> > were only scheduled.
> > 
> > If Max Wait Time is not what I am after, could you please point me in
> > the right direction?
> 
> I don't know the answers.  I was short in my reply.  Sorry.  I mean: 
> stop using max wait time in the short term, to get your jobs running. 
> Hopefully someone else can help.
> 
> But offhand, I think max wait time is doing the wrong thing here. Post
> your entire job definition and we'll see.
> 


Here you go (sanitised a bit): 



Director {                           
  Name = Bacula_Director
  DIRport = 9101
  QueryFile = "/usr/lib64/bacula/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/var/run"
  Maximum Concurrent Jobs = 1
  Password = "XXX"
  Messages = Daemon
}
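With Maximum Concurrent Jobs = 1 in the Director resource above, however long the Director spends trying to reach a dead client, that client holds the only job slot. The Bacula manual documents an FD Connect Timeout directive for the Director resource: the time the Director keeps trying to contact a File daemon before it cancels the job, 30 minutes by default. Lowering it might make a dead client give up the slot sooner without putting a deadline on the jobs queued behind it. This is a sketch only. It assumes the directive behaves as documented in 2.4.2, and the 5-minute value is an arbitrary example that was not suggested in this thread.

```conf
Director {
  Name = Bacula_Director
  DIRport = 9101
  QueryFile = "/usr/lib64/bacula/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/var/run"
  Maximum Concurrent Jobs = 1
  # Assumed/example value: stop trying to reach an unreachable File daemon
  # after 5 minutes so the single job slot is released for the next job.
  FD Connect Timeout = 5 minutes
  Password = "XXX"
  Messages = Daemon
}
```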

Job
{
        Name = Restore
        Type = Restore
        Client = Dummy_Client
        FileSet = Dummy_FileSet
        Storage = Bacula_Storage_Daemon
        Pool = Default
        Messages = Standard
        Where = /tmp/restore
        Max Wait Time = 2 minutes
}

Client 
{
        Name = Dummy_Client
        Address = Dummy_host
        Catalog = MyCatalog
        Password = Dummy_password
}


FileSet
{
        Name = Dummy_FileSet
        Include
        {
                File = /Dummy
        }
}


Storage {
  Name = Bacula_Storage_Daemon
  Address = backup-sd-01
  SDPort = 9103
  Password = "XXX"
  Device = Dummy_Device
  Media Type = Dummy-File
}


Schedule {
  Name = "MonthlyCycle"
  Run = Full 1st sun at 20:05
  Run = Differential 2nd-5th sun at 20:05
  Run = Incremental mon-sat at 20:05
}

Schedule {
  Name = "WeeklyCatalogBackup"
  Run = Full sun-sat at 23:10
}

Job {
  Name = "BackupCatalog"
  Type = Backup 
  Client = noc-01
  Level = Full
  FileSet="Catalog"
  Schedule = "WeeklyCatalogBackup"
  RunBeforeJob = "/usr/lib64/bacula/make_catalog_backup bacula bacula"
  RunAfterJob  = "/usr/lib64/bacula/delete_catalog_backup"
  Write Bootstrap = "/var/lib/bacula/BackupCatalog.bsr"
  Storage = noc-01-sd
  Messages = Standard
  Pool = Default
  Full Backup Pool = noc-01-catalog-full_pool
  Priority = 11                   # run after main backup
}


Storage {
  Name = noc-01-sd
  Address = backup-sd-01
  SDPort = 9103
  Password = "XXX"
  Device = noc-01-catalog
  Media Type = noc-01-catalog-File
}

Pool
{
        Name = noc-01-catalog-full_pool
        Pool Type = Backup
        Recycle = yes
        RecycleOldestVolume = yes
        Maximum Volume Jobs = 1
        AutoPrune = yes
        Volume Retention = 2 months
        Maximum Volume Bytes = 100g
        Maximum Volumes = 70
        Label Format = noc-01-catalog-full-
}

FileSet {
  Name = "Catalog"
  Include {
    Options {
      signature = MD5
    }
    File = /var/lib/bacula/bacula.sql
  }
}

Catalog {
  Name = MyCatalog
  dbname = bacula; user = bacula; password = "XXX"
}


Messages {
  Name = Standard
  mailcommand = "/opt/scripts/custom_bacula/custom_bacula 192.168.45.8 %c"
  operatorcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula: Intervention needed for %j\" %r"
  mail = noc-01 = all, !skipped            
  operator = root@localhost = mount
  console = all, !skipped, !saved
  catalog = all
  append = "/var/lib/bacula/log" = all, !skipped
}

Messages {
  Name = Daemon
  mailcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula daemon message\" %r"
  mail = root@localhost = all, !skipped            
  console = all, !skipped, !saved
  append = "/var/lib/bacula/log" = all, !skipped
  catalog = all
}

Pool {
  Name = Default
  Pool Type = Backup
  Recycle = yes                       # Bacula can automatically recycle Volumes
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 365 days         # one year
}


@/etc/bacula/Machines/host1-fd.conf
@/etc/bacula/Machines/host2-fd.conf
@/etc/bacula/Machines/host3-fd.conf
.
.
.



/etc/bacula/Machines/host1-fd.conf ....

Job
{
        Name = host1-backup
        Type = Backup
        Client = host1
        FileSet = host1-fs
        Pool = Default
        Full Backup Pool = host1-full_pool
        Differential Backup Pool = host1-diff_pool
        Incremental Backup Pool = host1-incr_pool
        Schedule = MonthlyCycle
        Storage = host1-sd
        Messages = Standard
        Write Bootstrap = /var/lib/bacula/host1.bsr
        Priority  = 10
        Max Wait Time = 10 minutes
}
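Below is a minimal sketch of Dan's short-term advice applied to this job: the same resource with Max Wait Time removed. The log earlier in the thread fits that advice. JobIds 175, 176, 177 and the rest were all canceled at 20:15:19, about ten minutes after MonthlyCycle fired at 20:05. That timing suggests the ten-minute limit was counted from the common 20:05 start rather than from each job's own turn in the single-slot queue. This is an inference from the timestamps, not a confirmed description of how 2.4.2 measures the limit.

```conf
# host1-backup as posted, minus Max Wait Time.
Job
{
        Name = host1-backup
        Type = Backup
        Client = host1
        FileSet = host1-fs
        Pool = Default
        Full Backup Pool = host1-full_pool
        Differential Backup Pool = host1-diff_pool
        Incremental Backup Pool = host1-incr_pool
        Schedule = MonthlyCycle
        Storage = host1-sd
        Messages = Standard
        Write Bootstrap = /var/lib/bacula/host1.bsr
        Priority  = 10
        # Max Wait Time = 10 minutes   (removed; untested assumption that
        # queued jobs then wait behind a dead client instead of being canceled)
}
```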

Client
{
        Name = host1
        Address = host1
        FDPort = 9102
        Password = XXX
        Catalog = MyCatalog
        Fileretention = 5 years
        Jobretention = 5 years
        Autoprune = yes
}

FileSet
{
        Name = host1-fs
        Include
        {
                Options
                {
                        compression = gzip8
                        signature = SHA1
                        onefs=no
                }
                File = /etc
                File = /opt
        }
}

Pool
{
        Name = host1-full_pool
        Pool Type = Backup
        Recycle = yes
        RecycleOldestVolume = yes
        Maximum Volume Jobs = 1
        AutoPrune = yes
        Volume Retention = 1 month
        Maximum Volume Bytes = 999g
        Maximum Volumes = 100
        Label Format = host1-full-
}

Pool
{
        Name =  host1-diff_pool
        Pool Type = Backup
        Recycle = yes
        RecycleOldestVolume = yes
        Maximum Volume Jobs = 1
        AutoPrune = yes
        Volume Retention = 2 weeks
        Maximum Volume Bytes = 999g
        Maximum Volumes = 100
        Label Format = host1-diff-
}

Pool
{
        Name = host1-incr_pool
        Pool Type = Backup
        Recycle = yes
        RecycleOldestVolume = yes
        Maximum Volume Jobs = 1
        AutoPrune = yes
        Volume Retention = 2 weeks
        Maximum Volume Bytes = 999g
        Maximum Volumes = 100
        Label Format = host1-incr-
}


# Definition of file storage device
Storage {
  Name = host1-sd
# Do not use "localhost" here    
  Address = backup-sd-01
  SDPort = 9103
  Password = "XXX"
  Device = host1
  Media Type = host1-File
}

And so forth for host2, host3, etc.
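Each hostN-fd.conf repeats the same Job directives. A JobDefs resource in bacula-dir.conf could hold the shared ones, so that a change like dropping Max Wait Time is made once rather than per host. This is a sketch. The name PerHostDefaults is hypothetical, and anything that differs per host (Client, FileSet, pools, Storage, bootstrap file) stays in each Job.

```conf
# Hypothetical shared defaults for every per-host backup job.
JobDefs
{
        Name = PerHostDefaults
        Type = Backup
        Pool = Default
        Schedule = MonthlyCycle
        Messages = Standard
        Priority = 10
        # No Max Wait Time here, so no per-host job inherits one.
}

# /etc/bacula/Machines/host1-fd.conf then only carries per-host values.
Job
{
        Name = host1-backup
        JobDefs = PerHostDefaults
        Client = host1
        FileSet = host1-fs
        Full Backup Pool = host1-full_pool
        Differential Backup Pool = host1-diff_pool
        Incremental Backup Pool = host1-incr_pool
        Storage = host1-sd
        Write Bootstrap = /var/lib/bacula/host1.bsr
}
```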





_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users