Bacula-users

Re: [Bacula-users] Broken backups with some clients - still no help!

2008-06-22 02:23:38
Subject: Re: [Bacula-users] Broken backups with some clients - still no help!
From: Stefan Nicolin <bacula AT nicolinux DOT org>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Fri, 20 Jun 2008 19:51:16 +0200
Hi,

a few days ago I wrote to this list asking for help with files being
skipped from backups. Meanwhile I dug a little deeper. I now get verbose
output from the Bacula client, but I still don't know why it skips,
say, all files from /sbin/...
Starting one of the failing clients with "-d100" gave me (among other
seemingly unimportant messages) this:

failing-client-fd: job.c:249-0 Executing JobId= command.
failing-client-fd: job.c:233-0 <dird: fileset vss=1
failing-client-fd: job.c:249-0 Executing fileset command.
failing-client-fd: job.c:688-0 I
failing-client-fd: job.c:688-0 O e
failing-client-fd: job.c:688-0 W /vservers/*/home/*
failing-client-fd: job.c:688-0 W /vservers/*/proc/*
failing-client-fd: job.c:688-0 W /vservers/*/tmp/*
failing-client-fd: job.c:688-0 W /vservers/*/sys/*
failing-client-fd: job.c:688-0 W /vservers/*/opt/*
failing-client-fd: job.c:688-0 W /vservers/*/mnt/*
failing-client-fd: job.c:688-0 W /vservers/*/var/cache/*
failing-client-fd: job.c:688-0 W /vservers/*/var/log/*
failing-client-fd: job.c:688-0 W /vservers/*/var/www/*
failing-client-fd: job.c:688-0 W /vservers/*/var/lib/*
failing-client-fd: job.c:688-0 W /vservers/*/usr/portage/*
failing-client-fd: job.c:688-0 W /vservers/*/usr/src/*
failing-client-fd: job.c:688-0 W /vservers/*/var/backup/*
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:688-0 O e
failing-client-fd: job.c:688-0 RF .*.tgz*
failing-client-fd: job.c:716-0 Set state=error
failing-client-fd: job.c:688-0 RF .*.tar*
failing-client-fd: job.c:716-0 Set state=error
failing-client-fd: job.c:688-0 RF .*.tbz*
failing-client-fd: job.c:716-0 Set state=error
failing-client-fd: job.c:688-0 RF .*.gz*
failing-client-fd: job.c:716-0 Set state=error
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:688-0 O MfZ10
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:688-0 F /vservers
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:688-0 E
failing-client-fd: job.c:688-0 F /vservers/lost+found
failing-client-fd: job.c:688-0 N
failing-client-fd: job.c:233-0 <dird: level = full  mtime_only=0
failing-client-fd: job.c:249-0 Executing level =  command.
failing-client-fd: job.c:233-0 <dird: storage address=xxx port=9103 ssl=0
failing-client-fd: job.c:249-0 Executing storage  command.
failing-client-fd: job.c:1291-0 StorageCmd: storage address=xxx port=9103 ssl=0
failing-client-fd: bsock.c:195-0 Current host[ipv4:xxx:9103] All host[ipv4:xxx:9103]
failing-client-fd: bsock.c:149-0 who=Storage daemon host=xxx port=9103
failing-client-fd: cram-md5.c:133-0 cram-get received: auth cram-md5 <xxx> ssl=0
failing-client-fd: cram-md5.c:152-0 sending resp to challenge: xxx
failing-client-fd: cram-md5.c:80-0 send: auth cram-md5 <xxx> ssl=0
failing-client-fd: cram-md5.c:99-0 Authenticate OK xxx
failing-client-fd: job.c:233-0 <dird: backup
failing-client-fd: job.c:249-0 Executing backup command.
failing-client-fd: jcr.c:603-0 OnEntry JobStatus=C set=B
failing-client-fd: jcr.c:623-0 OnExit JobStatus=B set=B
failing-client-fd: job.c:1350-0 begin backup ff=80be920
failing-client-fd: jcr.c:603-0 OnEntry JobStatus=B set=R
failing-client-fd: jcr.c:623-0 OnExit JobStatus=R set=R
failing-client-fd: find.c:93-0 Enter set_find_options()
failing-client-fd: find.c:96-0 Leave set_find_options()
failing-client-fd: find.c:198-0 F /vservers
failing-client-fd: find.c:350-0 Reject wild2: /vservers/lost+found
failing-client-fd: find.c:397-0 Skip file /vservers/lost+found
failing-client-fd: crypto.c:600-0 crypto_digest_new jcr=80be3a0
failing-client-fd: find.c:397-0 Skip file /vservers/www-master-xxx-i686-070228-2024.tar.bz2
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx-070305.tgz
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx-070307.tgz
failing-client-fd: find.c:279-0 Exclude wild: /vservers/*/proc/* file=/vservers/www-blog-xxx/proc/.keep
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx/proc/.keep
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx/sbin/debugfs
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx/sbin/findfs
failing-client-fd: find.c:397-0 Skip file /vservers/www-blog-xxx/sbin/.keep
[...]
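
One thing the trace may be worth checking against: the four "RF" lines
(presumably RegexFile options) look like shell globs that are being handed
to a regular-expression matcher. The sketch below is only an illustration,
not Bacula's code. It assumes the RF patterns are applied as unanchored
regular expressions against the full path, and it uses Python's re module
as a stand-in for whatever regex library the client uses. The helper name
matching_patterns and the escaped pattern at the end are hypothetical.

```python
import re

# Patterns copied from the "RF" lines of the -d100 output above.
RF_PATTERNS = [".*.tgz*", ".*.tar*", ".*.tbz*", ".*.gz*"]

# Paths copied from the "Skip file" lines above.
SKIPPED = [
    "/vservers/www-blog-xxx-070305.tgz",
    "/vservers/www-blog-xxx/sbin/debugfs",
    "/vservers/www-blog-xxx/sbin/findfs",
]


def matching_patterns(path, patterns=RF_PATTERNS):
    """Return the patterns that match somewhere in path (unanchored search)."""
    return [p for p in patterns if re.search(p, path)]


if __name__ == "__main__":
    for path in SKIPPED:
        print(path, "->", matching_patterns(path))
    # Hypothetical stricter form: a literal ".gz" at the very end of the name.
    print(bool(re.search(r"\.gz$", "/vservers/www-blog-xxx/sbin/debugfs")))
```

In a regular expression an unescaped "." matches any character and "z*"
matches zero or more "z", so ".*.gz*" matches any path that has a "g"
anywhere after its first character. Under the assumptions above that
would cover everything below /vservers/www-blog-xxx/. It does not explain
the "Set state=error" lines.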

Please excuse the secrecy. This is a production system.
Any pointers?

Thanks much

Stefan

On 15.06.2008, at 21:35, Stefan Nicolin wrote:

> Hi,
>
> I have a medium-sized Bacula setup with 30 Unix clients and over 300
> job definitions. Many clients inherit from one particular
> JobDefs. On three clients one job is always broken. I can reproduce
> it but I don't understand why it happens. The jobs finish OK, but
> no files are saved to the storage daemon. It gets even
> crazier! Doing an "estimate listing" on one job, I see that every
> directory that contains the string "bin" in its name does _not_ get
> included in the backup. Things like "/bin", "/usr/sbin" and so on are
> all excluded... and this happens on only three clients out of 30.
> Sadly this system is in "production"... it kinda gives me bad
> dreams :(
>
> ===
> This is a line from "list jobs" for one such broken backup:
> +-------+--------------------+---------------------+------+-------+----------+----------+-----------+
> | JobId | Name               | StartTime           | Type | Level | JobFiles | JobBytes | JobStatus |
> +-------+--------------------+---------------------+------+-------+----------+----------+-----------+
> |   438 | system xxx vserver | 2008-05-25 02:52:55 | B    | F     |   18,680 |        0 | T         |
>
> Note that Bacula assigns the status of "T". It looks like there are
> some job bytes but no files are saved on the storage daemon.
> I've tried the obvious:
> - restart daemons (client, server and so on)
> - reinstall client (also tried the newest Bacula release 2.4.0)
> - triple-check the config - but since there are dozens of other clients
> with the same jobs and settings where this job succeeds, I think I can
> exclude config errors
> - try to spot similarities - one client runs on an amd64 architecture,
> the other two where it also fails are i686 (all three are Gentoo Linux
> installations).
>
> ===
> Here is the log entry for one broken job:
> 25-Mai 02:43 db-backup-smedia-dir JobId 438: No prior Full backup Job record found.
> 25-Mai 02:43 db-backup-smedia-dir JobId 438: No prior or suitable Full backup found in catalog. Doing FULL backup.
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: Start Backup JobId 438, Job=system_xxx_vserver.2008-05-25_02.43.23
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: There are no more Jobs associated with Volume "sys-full0237". Marking it purged.
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: All records pruned from Volume "sys-full0237"; marking it "Purged"
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: Recycled volume "sys-full0237"
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: Using Device "FileStorage"
> 25-Mai 02:52 db-backup-smedia-sd JobId 438: Recycled volume "sys-full0237" on device "FileStorage" (/mnt/backup/store), all previous data lost.
> 25-Mai 02:52 db-backup-smedia-dir JobId 438: Max Volume jobs exceeded. Marking Volume "sys-full0237" as Used.
> 25-Mai 02:52 db-backup-smedia-sd JobId 438: Spooling data ...
> 25-Mai 02:53 db-backup-smedia-sd JobId 438: Job write elapsed time = 00:00:48, Transfer rate = 58.95 K bytes/second
> 25-Mai 02:53 db-backup-smedia-sd JobId 438: Committing spooled data to Volume "sys-full0237". Despooling 3,056,725 bytes ...
> 25-Mai 02:53 db-backup-smedia-sd JobId 438: Despooling elapsed time = 00:00:01, Transfer rate = 3.056 M bytes/second
> 25-Mai 02:53 db-backup-smedia-sd JobId 438: Sending spooled attrs to the Director. Despooling 4,660,551 bytes ...
> 25-Mai 02:54 db-backup-smedia-dir JobId 438: Bacula db-backup-sm-dir 2.2.8 (26Jan08): 25-Mai-2008 02:54:25
> Build OS:               i686-pc-linux-gnu gentoo 1.6.14
> JobId:                  438
> Job:                    system_xxx_vserver.2008-05-25_02.43.23
> Backup Level:           Full (upgraded from Incremental)
> Client:                 "xxx-fd" 2.2.8 (26Jan08) i686-pc-linux-gnu,gentoo,1.12.6
> FileSet:                "xendom vserver-linux system" 2008-05-25 02:43:00
> Pool:                   "sys-full" (From Run FullPool override)
> Storage:                "File" (From Pool resource)
> Scheduled time:         25-Mai-2008 02:43:00
> Start time:             25-Mai-2008 02:52:55
> End time:               25-Mai-2008 02:54:25
> Elapsed time:           1 min 30 secs
> Priority:               10
> FD Files Written:       18,680
> SD Files Written:       18,680
> FD Bytes Written:       0 (0 B)
> SD Bytes Written:       2,829,813 (2.829 MB)
> Rate:                   0.0 KB/s
> Software Compression:   None
> VSS:                    no
> Storage Encryption:     no
> Volume name(s):         sys-full0237
> Volume Session Id:      458
> Volume Session Time:    1211243056
> Last Volume Bytes:      5,333,750 (5.333 MB)
> Non-fatal FD errors:    0
> SD Errors:              0
> FD termination status:  OK
> SD termination status:  OK
> Termination:            Backup OK
>
> 25-Mai 02:54 db-backup-sm-dir JobId 438: Begin pruning Jobs.
> 25-Mai 02:54 db-backup-sm-dir JobId 438: No Jobs found to prune.
> 25-Mai 02:54 db-backup-sm-dir JobId 438: Begin pruning Files.
>
>
> ===
> Bacula and client versions:
> Clients are mostly Gentoo Linux (amd64 and i686) and FreeBSD 5.x.
> Bacula dir, storage, fd version: 2.2.8 (from Gentoo portage)
>
> ===
> Here are the relevant Bacula config bits:
>
> 1. job definition
> Job {
> Name = "system xxx"
> Client = xxx-fd
> JobDefs = "xendomain-linux system"
> Write Bootstrap = "/var/lib/bacula/xxx.bsr"
> }
>
> 2. jobdef def...
> JobDefs {
> Name = "xendomain-linux system"
> Type = Backup
> Level = Full
> FileSet = "xendomain-linux system"
> Schedule = "Sys2MonthsCycle"
> Storage = File
> Messages = Standard
> Pool = sys-full
> Priority = 10
> }
>
> 3. fileset
> FileSet {
> Name = "xendomain-linux system"
> Ignore FileSet Changes = no
> Include {
>   Options {
>     wild = "/usr/src/*"
>     wild = "/var/cache/*"
>     wild = "/tmp/*"
>     wild = "/opt/*"
>     wild = "/var/log/*"
>     wild = "/var/www/*"
>     wild = "/usr/packages/*"
>     wild = "/vservers/*"
>     wild = "/proc/*"
>     wild = "/sys/*"
>     wild = "/mnt/*"
>     wild = "/usr/portage/*"
>     exclude = yes
>   }
>   Options {
>     signature = MD5
>     onefs = no
>     compression = GZIP1
>     checkfilechanges = no
>   }
>   File = /
> }
> }
>
> 4. pool
> Pool {
> Storage = File
> Name = sys-full
> Pool Type = Backup
> Recycle = yes
> AutoPrune = yes
> Volume Retention = 4 months 5 days
> Recycle Oldest Volume = yes
> Maximum Volume Jobs = 1
> Label Format = "sys-full"
> }
>
> 5. storage def
> Storage {
> Name = File
> Address = db-backup-sm                # N.B. Use a fully qualified name here
> SDPort = 9103
> Password = "xx"
> Device = FileStorage
> Media Type = File
> Maximum Concurrent Jobs = 20
> }
>
> 6. schedule
> Schedule {
> Name = "Sys2MonthsCycle"
> Run = SpoolData = yes FullPool=sys-full IncrementalPool=sys-inc Full 1st sun on may at 00:17
> }
>
>
> Hope someone can help.
> Thanks much folks.
>
> Stefan
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users


