Hi,
a few days ago I wrote to this list and asked for help with files
skipped from backups. Meanwhile I dug a little deeper. I get verbose
output from the Bacula client, but I still don't know why it skips,
say, all files from /sbin/...
Starting one of the failing clients with "-d100" presented me with
(among other seemingly unimportant messages) this:
failing-client-fd: job.c:249-0 Executing JobId= command.
failing-client-fd: job.c:233-0 <dird: fileset vss=1
failing-client-fd: job.c:249-0 Executing fileset command.
failing-client-fd: job.c:688-0 I
failing-client-fd: job.c:688-0 O e
failing-client-fd: job.c:688-0 W /vservers/*/home/*
failing-client-fd: job.c:688-0 W /vservers/*/proc/*
failing-client-fd: job.c:688-0 W /vservers/*/tmp/*
failing-client-fd: job.c:688-0 W /vservers/*/sys/*
failing-client-fd: job.c:688-0 W /vservers/*/opt/*
failing-client-fd: job.c:688-0 W /vservers/*/mnt/*
failing-client-fd: job.c:688-0 W /vservers/*/var/cache/*
failing-client-fd: job.c:688-0 W /vservers/*/var/log/*
failing-client-fd: job.c:688-0 W /vservers/*/var/www/*
failing-client-fd: job.c:688-0 W /vservers/*/var/lib/*
failing-client-fd: job.c:688-0 W /vservers/*/usr/portage/*
failing-client-fd: job.c:688-0 W /vservers/*/usr/src/*
failing-client-fd: job.c:688-0 W /vservers/*/var/backup/*
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:688-0 O e
failing-client-fd: job.c:688-0 RF .*.tgz*
failing-client-fd: job.c:716-0 Set state=error
failing-client-fd: job.c:688-0 RF .*.tar*
failing-client-fd: job.c:716-0 Set state=error
failing-client-fd: job.c:688-0 RF .*.tbz*
failing-client-fd: job.c:716-0 Set state=error
failing-client-fd: job.c:688-0 RF .*.gz*
failing-client-fd: job.c:716-0 Set state=error
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:688-0 O MfZ10
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:688-0 F /vservers
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:688-0 E
failing-client-fd: job.c:688-0 F /vservers/lost+found
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:233-0 <dird: level = full mtime_only=0
failing-client-fd: job.c:249-0 Executing level = command.
failing-client-fd: job.c:233-0 <dird: storage address=xxx port=9103 ssl=0
failing-client-fd: job.c:249-0 Executing storage command.
failing-client-fd: job.c:1291-0 StorageCmd: storage address=xxx port=9103 ssl=0
failing-client-fd: bsock.c:195-0 Current host[ipv4:xxx:9103] All host[ipv4:xxx:9103]
failing-client-fd: bsock.c:149-0 who=Storage daemon host=xxx port=9103
failing-client-fd: cram-md5.c:133-0 cram-get received: auth cram-md5 <xxx> ssl=0
failing-client-fd: cram-md5.c:152-0 sending resp to challenge: xxx
failing-client-fd: cram-md5.c:80-0 send: auth cram-md5 <xxx> ssl=0
failing-client-fd: cram-md5.c:99-0 Authenticate OK xxx
failing-client-fd: job.c:233-0 <dird: backup
failing-client-fd: job.c:249-0 Executing backup command.
failing-client-fd: jcr.c:603-0 OnEntry JobStatus=C set=B
failing-client-fd: jcr.c:623-0 OnExit JobStatus=B set=B
failing-client-fd: job.c:1350-0 begin backup ff=80be920
failing-client-fd: jcr.c:603-0 OnEntry JobStatus=B set=R
failing-client-fd: jcr.c:623-0 OnExit JobStatus=R set=R
failing-client-fd: find.c:93-0 Enter set_find_options()
failing-client-fd: find.c:96-0 Leave set_find_options()
failing-client-fd: find.c:198-0 F /vservers
failing-client-fd: find.c:350-0 Reject wild2: /vservers/lost+found
failing-client-fd: find.c:397-0 Skip file /vservers/lost+found
failing-client-fd: crypto.c:600-0 crypto_digest_new jcr=80be3a0
failing-client-fd: find.c:397-0 Skip file /vservers/www-master-xxx-i686-070228-2024.tar.bz2
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx-070305.tgz
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx-070307.tgz
failing-client-fd: find.c:279-0 Exclude wild: /vservers/*/proc/* file=/vservers/www-blog-xxx/proc/.keep
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx/proc/.keep
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx/sbin/debugfs
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx/sbin/findfs
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx/sbin/.keep
[...]
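One way to sanity-check the patterns in the trace above is to replay them outside Bacula. The sketch below is only that: it assumes the "W" lines are wild (glob) patterns and the "RF" lines are regexfile patterns, and it uses Python's fnmatch and re modules as stand-ins for whatever matching the FD really does. The helper names wild_hits and regex_hits are made up for this illustration. The trace does not show whether the regexes are applied to the bare file name or to the full path, so both are tried.

```python
import fnmatch
import re

# Patterns copied from the "W" and "RF" lines of the debug output above.
WILD_PATTERNS = [
    "/vservers/*/home/*",
    "/vservers/*/proc/*",
    "/vservers/*/tmp/*",
    "/vservers/*/sys/*",
    "/vservers/*/opt/*",
    "/vservers/*/mnt/*",
    "/vservers/*/var/cache/*",
    "/vservers/*/var/log/*",
    "/vservers/*/var/www/*",
    "/vservers/*/var/lib/*",
    "/vservers/*/usr/portage/*",
    "/vservers/*/usr/src/*",
    "/vservers/*/var/backup/*",
]
REGEX_PATTERNS = [".*.tgz*", ".*.tar*", ".*.tbz*", ".*.gz*"]


def wild_hits(path):
    """Return the wild patterns that match the full path (Python fnmatch rules)."""
    return [p for p in WILD_PATTERNS if fnmatch.fnmatchcase(path, p)]


def regex_hits(text):
    """Return the regex patterns that match anywhere in text (unanchored search)."""
    return [p for p in REGEX_PATTERNS if re.search(p, text)]


if __name__ == "__main__":
    # Paths taken from the "Skip file" lines of the trace.
    for path in [
        "/vservers/www-blog-xxx/proc/.keep",
        "/vservers/www-blog-xxx/sbin/debugfs",
        "/vservers/www-blog-xxx/sbin/findfs",
        "/vservers/www-blog-xxx-070305.tgz",
    ]:
        name = path.rsplit("/", 1)[-1]
        print(path)
        print("  wild on full path :", wild_hits(path))
        print("  regex on file name:", regex_hits(name))
        print("  regex on full path:", regex_hits(path))
```

Under these assumptions none of the wild patterns touches anything below sbin. The regexes behave differently from shell globs: the dots are unescaped and the trailing * applies only to the last letter, so ".*.gz*" matches any text that has a "g" after at least one other character. That covers "debugfs" as a bare name and, if the full path is used, everything below "www-blog-xxx".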
Please excuse the secrecy. This is a production system.
Any pointers?
Thanks much
Stefan
On 15.06.2008, at 21:35, Stefan Nicolin wrote:
> Hi,
>
> I have a medium-sized Bacula setup with 30 Unix clients and over 300
> job definitions. Many clients inherit one particular
> jobdef. On three clients one job is always broken. I can reproduce
> it but I don't understand why it happens. The jobs finish OK, but
> no files are saved to the storage daemon. It gets even
> crazier: doing an "estimate listing" on one job, I see that every
> directory that contains the string "bin" in its name does _not_ get
> included in the backup. Things like "/bin", "/usr/sbin" and so on are
> all excluded... and this happens on only three clients out of 30.
> Sadly this system is in "production"... it kinda gives me bad
> dreams :(
>
> ===
> This is a line from "list jobs" with one of such broken backups:
> +-------+--------------------+---------------------+------+-------+----------+----------+-----------+
> | JobId | Name               | StartTime           | Type | Level | JobFiles | JobBytes | JobStatus |
> +-------+--------------------+---------------------+------+-------+----------+----------+-----------+
> | 438   | system xxx vserver | 2008-05-25 02:52:55 | B    | F     | 18,680   | 0        | T         |
> +-------+--------------------+---------------------+------+-------+----------+----------+-----------+
>
> Note that Bacula assigns the status of "T". It looks like there are
> some job bytes but no files are saved on the storage daemon.
> I've tried the obvious:
> - restart daemons (client, server and so on)
> - reinstall client (also tried the newest Bacula release, 2.4.0)
> - triple-check the config - but since there are dozens of other clients
> with the same jobs and settings where this job succeeds, I think I can
> exclude config errors
> - try to spot similarities - one client runs on an amd64 architecture,
> the other two where it also fails are i686 (all three are Gentoo Linux
> installations).
>
> ===
> Here is the log entry for one broken job:
> 25-Mai 02:43 db-backup-smedia-dir JobId 438: No prior Full backup Job
> record found.
> 25-Mai 02:43 db-backup-smedia-dir JobId 438: No prior or suitable Full
> backup found in catalog. Doing FULL backup.
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: Start Backup JobId 438,
> Job=system_xxx_vserver.2008-05-25_02.43.23
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: There are no more Jobs
> associated with Volume "sys-full0237". Marking it purged.
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: All records pruned from
> Volume "sys-full0237"; marking it "Purged"
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: Recycled volume "sys-
> full0237"
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: Using Device
> "FileStorage"
> 25-Mai 02:52 db-backup-smedia-sd JobId 438: Recycled volume "sys-
> full0237" on device "FileStorage" (/mnt/backup/store), all previous
> data lost.
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: Max Volume jobs exceeded.
> Marking Volume "sys-full0237" as Used.
> 25-Mai 02:52 db-backup-smedia-sd JobId 438: Spooling data ...
> 25-Mai 02:53 db-backup-smedia-sd JobId 438: Job write elapsed time =
> 00:00:48, Transfer rate = 58.95 K bytes/second
> 25-Mai 02:53 db-backup-smedia-sd JobId 438: Committing spooled data to
> Volume "sys-full0237". Despooling 3,056,725 bytes ...
> 25-Mai 02:53 db-backup-smedia-sd JobId 438: Despooling elapsed time =
> 00:00:01, Transfer rate = 3.056 M bytes/second
> 25-Mai 02:53 db-backup-smedia-sd JobId 438: Sending spooled attrs to
> the Director. Despooling 4,660,551 bytes ...
> 25-Mai 02:54 db-backup-smedia-dir JobId 438: Bacula db-backup-sm-dir
> 2.2.8 (26Jan08): 25-Mai-2008 02:54:25
> Build OS: i686-pc-linux-gnu gentoo 1.6.14
> JobId: 438
> Job: system_xxx_vserver.2008-05-25_02.43.23
> Backup Level: Full (upgraded from Incremental)
> Client: "xxx-fd" 2.2.8 (26Jan08) i686-pc-linux-
> gnu,gentoo,1.12.6
> FileSet: "xendom vserver-linux system" 2008-05-25
> 02:43:00
> Pool: "sys-full" (From Run FullPool override)
> Storage: "File" (From Pool resource)
> Scheduled time: 25-Mai-2008 02:43:00
> Start time: 25-Mai-2008 02:52:55
> End time: 25-Mai-2008 02:54:25
> Elapsed time: 1 min 30 secs
> Priority: 10
> FD Files Written: 18,680
> SD Files Written: 18,680
> FD Bytes Written: 0 (0 B)
> SD Bytes Written: 2,829,813 (2.829 MB)
> Rate: 0.0 KB/s
> Software Compression: None
> VSS: no
> Storage Encryption: no
> Volume name(s): sys-full0237
> Volume Session Id: 458
> Volume Session Time: 1211243056
> Last Volume Bytes: 5,333,750 (5.333 MB)
> Non-fatal FD errors: 0
> SD Errors: 0
> FD termination status: OK
> SD termination status: OK
> Termination: Backup OK
>
> 25-Mai 02:54 db-backup-sm-dir JobId 438: Begin pruning Jobs.
> 25-Mai 02:54 db-backup-sm-dir JobId 438: No Jobs found to prune.
> 25-Mai 02:54 db-backup-sm-dir JobId 438: Begin pruning Files.
>
>
> ===
> Bacula and client versions:
> Clients are mostly Gentoo Linux (amd64 and i686) and FreeBSD 5.x.
> Bacula dir, storage, fd version: 2.2.8 (from Gentoo portage)
>
> ===
> Here are the relevant Bacula config bits:
>
> 1. job definition
> Job {
>   Name = "system xxx"
>   Client = xxx-fd
>   JobDefs = "xendomain-linux system"
>   Write Bootstrap = "/var/lib/bacula/xxx.bsr"
> }
>
> 2. jobdef def...
> JobDefs {
>   Name = "xendomain-linux system"
>   Type = Backup
>   Level = Full
>   FileSet = "xendomain-linux system"
>   Schedule = "Sys2MonthsCycle"
>   Storage = File
>   Messages = Standard
>   Pool = sys-full
>   Priority = 10
> }
>
> 3. fileset
> FileSet {
>   Name = "xendomain-linux system"
>   Ignore FileSet Changes = no
>   Include {
>     Options {
>       wild = "/usr/src/*"
>       wild = "/var/cache/*"
>       wild = "/tmp/*"
>       wild = "/opt/*"
>       wild = "/var/log/*"
>       wild = "/var/www/*"
>       wild = "/usr/packages/*"
>       wild = "/vservers/*"
>       wild = "/proc/*"
>       wild = "/sys/*"
>       wild = "/mnt/*"
>       wild = "/usr/portage/*"
>       exclude = yes
>     }
>     Options {
>       signature = MD5
>       onefs = no
>       compression = GZIP1
>       checkfilechanges = no
>     }
>     File = /
>   }
> }
>
> 4. pool
> Pool {
>   Storage = File
>   Name = sys-full
>   Pool Type = Backup
>   Recycle = yes
>   AutoPrune = yes
>   Volume Retention = 4 months 5 days
>   Recycle Oldest Volume = yes
>   Maximum Volume Jobs = 1
>   Label Format = "sys-full"
> }
>
> 5. storage def
> Storage {
>   Name = File
>   Address = db-backup-sm # N.B. Use a fully qualified name here
>   SDPort = 9103
>   Password = "xx"
>   Device = FileStorage
>   Media Type = File
>   Maximum Concurrent Jobs = 20
> }
>
> 6. schedule
> Schedule {
>   Name = "Sys2MonthsCycle"
>   Run = SpoolData = yes FullPool=sys-full IncrementalPool=sys-inc Full 1st sun on may at 00:17
> }
>
>
> Hope someone can help.
> Thanks much folks.
>
> Stefan
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users