[Bacula-users] Broken backups with some clients - help!

2008-06-15 15:35:19
Subject: [Bacula-users] Broken backups with some clients - help!
From: Stefan Nicolin <bacula AT nicolinux DOT org>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Sun, 15 Jun 2008 21:35:02 +0200

Hi,

I have a medium-sized Bacula setup with 30 Unix clients and over 300
job definitions. Many clients inherit from one particular JobDefs
resource. On three clients, one job is always broken. I can reproduce
it, but I don't understand why it happens. The jobs finish OK, but
no files are saved to the storage daemon. It gets even crazier: when
I run an "estimate listing" on one of these jobs, I see that every
directory whose name contains the string "bin" is _not_ included in
the backup. Things like "/bin", "/usr/sbin" and so on are all
excluded... and this happens on only three clients out of 30.
Sadly, this system is in production... it kind of gives me bad dreams :(

===
This is a line from "list jobs" for one of these broken backups:
+-------+--------------------+---------------------+------+-------+-----------+----------------+-----------+
| JobId | Name               | StartTime           | Type | Level | JobFiles  | JobBytes       | JobStatus |
+-------+--------------------+---------------------+------+-------+-----------+----------------+-----------+
|   438 | system xxx vserver | 2008-05-25 02:52:55 | B    | F     |    18,680 |              0 | T         |
+-------+--------------------+---------------------+------+-------+-----------+----------------+-----------+

Note that Bacula assigns the status "T" (terminated normally). The job
reports 18,680 files but 0 job bytes, and no file data is saved on the
storage daemon.
I've tried the obvious:
- restarting the daemons (client, server and so on)
- reinstalling the client (I also tried the newest Bacula release, 2.4.0)
- triple-checking the config; since dozens of other clients use the
same jobs and settings and this job succeeds there, I think I can
rule out config errors
- looking for similarities: one client runs on an amd64 architecture,
the other two where it also fails are i686 (all three are Gentoo Linux
installations).

===
Here is the log entry for one broken job:
25-Mai 02:43 db-backup-smedia-dir JobId 438: No prior Full backup Job record found.
25-Mai 02:43 db-backup-smedia-dir JobId 438: No prior or suitable Full backup found in catalog. Doing FULL backup.
25-Mai 02:52 db-backup-smedia-dir JobId 438: Start Backup JobId 438, Job=system_xxx_vserver.2008-05-25_02.43.23
25-Mai 02:52 db-backup-smedia-dir JobId 438: There are no more Jobs associated with Volume "sys-full0237". Marking it purged.
25-Mai 02:52 db-backup-smedia-dir JobId 438: All records pruned from Volume "sys-full0237"; marking it "Purged"
25-Mai 02:52 db-backup-smedia-dir JobId 438: Recycled volume "sys-full0237"
25-Mai 02:52 db-backup-smedia-dir JobId 438: Using Device "FileStorage"
25-Mai 02:52 db-backup-smedia-sd JobId 438: Recycled volume "sys-full0237" on device "FileStorage" (/mnt/backup/store), all previous data lost.
25-Mai 02:52 db-backup-smedia-dir JobId 438: Max Volume jobs exceeded. Marking Volume "sys-full0237" as Used.
25-Mai 02:52 db-backup-smedia-sd JobId 438: Spooling data ...
25-Mai 02:53 db-backup-smedia-sd JobId 438: Job write elapsed time = 00:00:48, Transfer rate = 58.95 K bytes/second
25-Mai 02:53 db-backup-smedia-sd JobId 438: Committing spooled data to Volume "sys-full0237". Despooling 3,056,725 bytes ...
25-Mai 02:53 db-backup-smedia-sd JobId 438: Despooling elapsed time = 00:00:01, Transfer rate = 3.056 M bytes/second
25-Mai 02:53 db-backup-smedia-sd JobId 438: Sending spooled attrs to the Director. Despooling 4,660,551 bytes ...
25-Mai 02:54 db-backup-smedia-dir JobId 438: Bacula db-backup-sm-dir 2.2.8 (26Jan08): 25-Mai-2008 02:54:25
  Build OS:               i686-pc-linux-gnu gentoo 1.6.14
  JobId:                  438
  Job:                    system_xxx_vserver.2008-05-25_02.43.23
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "xxx-fd" 2.2.8 (26Jan08) i686-pc-linux-gnu,gentoo,1.12.6
  FileSet:                "xendom vserver-linux system" 2008-05-25 02:43:00
  Pool:                   "sys-full" (From Run FullPool override)
  Storage:                "File" (From Pool resource)
  Scheduled time:         25-Mai-2008 02:43:00
  Start time:             25-Mai-2008 02:52:55
  End time:               25-Mai-2008 02:54:25
  Elapsed time:           1 min 30 secs
  Priority:               10
  FD Files Written:       18,680
  SD Files Written:       18,680
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       2,829,813 (2.829 MB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         sys-full0237
  Volume Session Id:      458
  Volume Session Time:    1211243056
  Last Volume Bytes:      5,333,750 (5.333 MB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK

25-Mai 02:54 db-backup-sm-dir JobId 438: Begin pruning Jobs.
25-Mai 02:54 db-backup-sm-dir JobId 438: No Jobs found to prune.
25-Mai 02:54 db-backup-sm-dir JobId 438: Begin pruning Files.
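
With over 300 job definitions, it may help to scan job reports for the same symptom automatically. The following is only a rough sketch, not something Bacula provides: it assumes the summary text has been saved or mailed in the format shown above, and the helper names (parse_summary, looks_empty) are made up for illustration. It flags a report where the FD counted files but wrote zero bytes.

```python
import re


def parse_int(text):
    """Return the leading integer of a value such as '2,829,813 (2.829 MB)'."""
    match = re.match(r"[\d,]+", text.strip())
    if match is None:
        raise ValueError(f"no leading number in {text!r}")
    return int(match.group(0).replace(",", ""))


def parse_summary(report):
    """Turn 'Key:   value' lines of a job summary into a dict."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def looks_empty(fields):
    """True when files were counted but the FD reported zero bytes written."""
    files = parse_int(fields["FD Files Written"])
    fd_bytes = parse_int(fields["FD Bytes Written"])
    return files > 0 and fd_bytes == 0


# Excerpt copied from the summary of JobId 438 above.
SAMPLE = """\
  FD Files Written:       18,680
  SD Files Written:       18,680
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       2,829,813 (2.829 MB)
  Termination:            Backup OK
"""

if __name__ == "__main__":
    print(looks_empty(parse_summary(SAMPLE)))
```

For the excerpt from JobId 438 this reports the job as suspicious even though the termination status is "Backup OK".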


===
Bacula and client versions:
Clients are mostly Gentoo Linux (amd64 and i686) and FreeBSD 5.x.
Bacula dir, storage, fd version: 2.2.8 (from Gentoo portage)

===
Here are the relevant Bacula config bits:

1. job definition
Job {
  Name = "system xxx"
  Client = xxx-fd
  JobDefs = "xendomain-linux system"
  Write Bootstrap = "/var/lib/bacula/xxx.bsr"
}

2. jobdef def...
JobDefs {
  Name = "xendomain-linux system"
  Type = Backup
  Level = Full
  FileSet = "xendomain-linux system"
  Schedule = "Sys2MonthsCycle"
  Storage = File
  Messages = Standard
  Pool = sys-full
  Priority = 10
}

3. fileset
FileSet {
  Name = "xendomain-linux system"
  Ignore FileSet Changes = no
  Include {
    Options {
      wild = "/usr/src/*"
      wild = "/var/cache/*"
      wild = "/tmp/*"
      wild = "/opt/*"
      wild = "/var/log/*"
      wild = "/var/www/*"
      wild = "/usr/packages/*"
      wild = "/vservers/*"
      wild = "/proc/*"
      wild = "/sys/*"
      wild = "/mnt/*"
      wild = "/usr/portage/*"
      exclude = yes
    }
    Options {
      signature = MD5
      onefs = no
      compression = GZIP1
      checkfilechanges = no
    }
    File = /
  }
}
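
As a rough cross-check of the exclude list above, the sketch below tests whether any of the wild patterns could textually match the "bin" directories that go missing. It uses Python's fnmatch module, which only approximates Bacula's own wildcard matching, so treat the result as a hint rather than proof. The path /usr/src/linux is a hypothetical example added to show a path that should match.

```python
from fnmatch import fnmatchcase

# Patterns copied from the exclude Options block of the FileSet above.
EXCLUDE_WILD = [
    "/usr/src/*",
    "/var/cache/*",
    "/tmp/*",
    "/opt/*",
    "/var/log/*",
    "/var/www/*",
    "/usr/packages/*",
    "/vservers/*",
    "/proc/*",
    "/sys/*",
    "/mnt/*",
    "/usr/portage/*",
]


def matching_patterns(path):
    """Return every exclude pattern that matches the given path."""
    return [pattern for pattern in EXCLUDE_WILD if fnmatchcase(path, pattern)]


if __name__ == "__main__":
    # "/usr/src/linux" is a hypothetical path; the others are from the report.
    for path in ("/bin", "/usr/sbin", "/usr/src/linux"):
        print(path, matching_patterns(path))
```

Under this approximation none of the patterns matches "/bin" or "/usr/sbin", which fits the observation that the same FileSet works on the other clients.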

4. pool
Pool {
  Storage = File
  Name = sys-full
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 4 months 5 days
  Recycle Oldest Volume = yes
  Maximum Volume Jobs = 1
  Label Format = "sys-full"
}

5. storage def
Storage {
  Name = File
  Address = db-backup-sm                # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = "xx"
  Device = FileStorage
  Media Type = File
  Maximum Concurrent Jobs = 20
}

6. schedule
Schedule {
  Name = "Sys2MonthsCycle"
  Run = SpoolData = yes FullPool=sys-full IncrementalPool=sys-inc Full 1st sun on may at 00:17
}


I hope someone can help.
Thanks a lot, folks.

Stefan
