Subject: Re: [BackupPC-users] backup the backuppc pool with bacula
From: Les Mikesell <les AT futuresource DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 11 Jun 2009 10:15:58 -0500
Jeffrey J. Kosowsky wrote:
>  
> Now that doesn't mean it *couldn't* happen and it doesn't mean we
> shouldn't always be paranoid and test, test, test... but I just don't
> have any good reason to think it would fail algorithmically. Now that
> doesn't mean it couldn't slow down dramatically or run out of memory
> as some have claimed, it just seems unlikely (to me) that it would
> complete without error yet still have some hidden error.

Even if everything is done right it would depend on the source directory 
not changing link targets during the (likely long) transfer process. 
Consider what would happen if a collision chain fixup happens and 
renames pool files after rsync reads the directory list and makes the 
inode mapping table but before the transfers start.
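To make that window concrete, here's a minimal Python sketch of the ordering. The pool layout and names are made up for illustration; rsync's real implementation differs, but the snapshot-then-copy pattern is the same:

    import os
    import tempfile

    # Minimal sketch of the window described above: a hard-link-aware copier
    # snapshots the directory listing (and inode numbers) up front, then a
    # collision-chain fixup renames a pool file before the copy phase runs.
    # All names here are invented; nothing is BackupPC-specific.
    pool = tempfile.mkdtemp()
    old = os.path.join(pool, "c0ffee")       # original pool file name
    new = os.path.join(pool, "c0ffee_0")     # name after a chain renumber
    with open(old, "w") as f:
        f.write("data")

    # Phase 1: build the file list / inode map (rsync does this first).
    snapshot = {name: os.stat(os.path.join(pool, name)).st_ino
                for name in os.listdir(pool)}

    # Phase 2: a chain fixup renames the file while the copier is still working.
    os.rename(old, new)

    # Phase 3: the copier tries to open the paths it recorded earlier.
    for name in snapshot:
        try:
            with open(os.path.join(pool, name)) as f:
                f.read()
        except FileNotFoundError:
            print("recorded in the file list but gone at transfer time:", name)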

>  > 
>  > > And the renumbering will change the timestamps which should alert
>  > > rsync to all the changes even without the --checksum flag.
>  > 
>  > This part I'm not sure on. Is it actually *guaranteed* that a rename(2)
>  > must be implemented in terms of unlink(2) and link(2) (but atomically),
>  > i.e. that it must modify the inode change time? The inode is not really
>  > changed, except for the side effect of (atomically) decrementing and
>  > re-incrementing the link count. By virtue of the operation being atomic,
>  > the link count is *guaranteed* not to change, so I, were I to implement
>  > a file system, would feel free to optimize the inode change away (or
>  > simply not implement it in terms of unlink() and link()), unless it is
>  > documented somewhere that updating the inode change time is mandatory
>  > (though it really is *not* an inode change, so I don't see why it
>  > should be).
>  > 
> 
> Good catch!!! I hadn't realized that this was implementation-dependent.
> It seems that most Unix implementations (including BSD) have
> historically changed the ctime; however, Linux (at least ext2/ext3)
> does not, at least as of kernel 2.6.26.6.

I sort of recall some arguments about this in the early reiserfs days. 
I guess the "cheat and short-circuit" side won, even though it makes it 
impossible to do a correct incremental backup with any ordinary tool 
(rsync still can, but it needs a previous copy and a full block-checksum 
comparison).
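If you want to know what a particular filesystem actually does here, a quick probe is enough (plain Python, nothing BackupPC-specific):

    import os
    import tempfile
    import time

    # Quick probe of what the local filesystem does to st_ctime on rename(2).
    # Per POSIX the behavior is implementation-defined, so this only tells
    # you about the filesystem you run it on.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    before = os.stat(path).st_ctime_ns
    time.sleep(1.1)                          # make any ctime change observable
    renamed = path + ".renamed"
    os.rename(path, renamed)
    after = os.stat(renamed).st_ctime_ns
    print("ctime changed on rename:", after != before)
    os.unlink(renamed)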

> In fact, the POSIX/SUS specification specifically states:
>    Some implementations mark for update the st_ctime field of renamed
>    files and some do not. Applications which make use of the st_ctime
>    field may behave differently with respect to renamed files unless they
>    are designed to allow for either behavior.
> 
> However, it wouldn't be hard to add a "touch" to the chain renumbering
> routine if you want to be able to identify newly renumbered files. One
> would need to make sure that this doesn't have other unintended side
> effects but I don't think that BackupPC otherwise uses the file mtime.

Or, just do the explicit link/unlink operations to force the filesystem 
to do the right thing with ctime().
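Roughly like the sketch below; the function name is mine, and unlike rename(2) the two-step version isn't atomic, so a crash in between leaves the file reachable under both names:

    import os

    # Explicit link/unlink in place of rename: the link count on the inode
    # changes twice, so POSIX requires the ctime to be updated both times.
    def rename_forcing_ctime(old_path, new_path):
        os.link(old_path, new_path)   # link count +1, ctime updated
        os.unlink(old_path)           # link count -1, ctime updated again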

>  > Does rsync even act on the inode change time? 
> No, it doesn't. In fact, I have read that most Linux systems don't allow
> you to set the ctime to anything other than the current system time.

You shouldn't be able to.  But backup-type operations should be able to 
use it to identify moved files in incrementals.
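Something along these lines, as an illustration only (the helper name is mine, not anything rsync or BackupPC actually does):

    import os

    # Flag anything whose ctime is newer than the start of the previous
    # backup: the sort of check an incremental run could use to catch
    # renamed or re-linked files that an mtime-only comparison misses.
    def ctime_changed_since(root, last_backup_epoch):
        changed = []
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.lstat(path).st_ctime > last_backup_epoch:
                    changed.append(path)
        return changed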

>  > > Or are you saying it would be difficult to do this manually with a
>  > > special purpose algorithm that tries to just track changes to the pool
>  > > and pc files?
>  > 
>  > I haven't given that topic much thought. The advantage in a special
>  > purpose algorithm is that we can make assumptions about the data we are
>  > dealing with. We shouldn't do this unnecessarily, but if it has notable
>  > advantages, then why not? "Difficult" isn't really a point. The question
>  > is whether it can be done efficiently.
> 
> I meant more "difficult" in terms of being sure to track all special
> cases and that one would have to be careful, not that one shouldn't do
> it.
> 
> Personally, I don't like the idea of chain collisions and would have
> preferred using full-file md5sums, which, as I have mentioned earlier,
> would not be very costly, at least for the rsync/rsyncd transfer
> methods under protocol 30.
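For reference, the full-file digest idea amounts to something like the sketch below. The helper name and chunk size are mine, and this is not BackupPC's actual pool-hash scheme:

    import hashlib

    # Hash the entire (uncompressed) content so every file gets a unique
    # pool key and there are no collision chains to renumber.
    def full_file_pool_key(path, chunk_size=1 << 20):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                digest.update(block)
        return digest.hexdigest()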

And I'd like a quick/cheap way to just ignore the pool during a copy and 
rebuild it the same way it was built in the first place, without thinking 
twice.  And maybe do things like backing up other instances of BackupPC 
archives while ignoring their pools, and merging them so you could restore 
individual files directly.
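The rebuild part would be roughly this kind of loop. It's a very rough sketch under my own assumptions: the flat key-per-file layout is invented and ignores BackupPC's real pool directory scheme and compression:

    import hashlib
    import os

    # Walk the copied pc/ trees, hash each file, and hard-link identical
    # content back into a fresh pool instead of copying the old pool over.
    def rebuild_pool(pc_root, pool_root):
        os.makedirs(pool_root, exist_ok=True)
        for dirpath, _dirs, names in os.walk(pc_root):
            for name in names:
                src = os.path.join(dirpath, name)
                with open(src, "rb") as f:
                    key = hashlib.md5(f.read()).hexdigest()
                dst = os.path.join(pool_root, key)
                if os.path.exists(dst):
                    os.unlink(src)
                    os.link(dst, src)   # point the pc file at the existing pool copy
                else:
                    os.link(src, dst)   # first occurrence becomes the pool entry
        return pool_root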

-- 
   Les Mikesell
    lesmikesell AT gmail DOT com

