Subject: Re: [BackupPC-users] backup the backuppc pool with bacula
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>, Les Mikesell <lesmikesell AT gmail DOT com>
Date: Thu, 11 Jun 2009 10:02:03 -0400
Holger Parplies wrote at about 14:31:02 +0200 on Thursday, June 11, 2009:
 > Hi,
 > 
 > Jeffrey J. Kosowsky wrote on 2009-06-11 00:25:37 -0400 [Re: [BackupPC-users] 
 > backup the backuppc pool with bacula]:
 > > Holger Parplies wrote at about 04:22:03 +0200 on Thursday, June 11, 2009:
 > >  > Les Mikesell wrote on 2009-06-10 15:45:22 -0500 [Re: [BackupPC-users]
 > >  > backup the backuppc pool with bacula]:
 > >  > [...]
 > >  > the file list [...] can and has been [optimized] in 3.0 (probably meaning
 > >  > protocol version 30, i.e. rsync 3.x on both sides).
 > > 
 > > Holger, I may be wrong here, but I think that you get the more
 > > efficient memory usage as long as both client & server are version >=3.0 
 > > even if protocol version is set to < 30 (which is true for BackupPC
 > > where it defaults back to version 28). 
 > 
 > firstly, it's *not* true. BackupPC (as client side rsync) is not
 > version >= 3.0. It's not even really rsync at all, and I doubt File::RsyncP
 > is more memory efficient than rsync, even if the core code is in C and copied
 > from rsync.
 > 
I had (perhaps mistakenly) assumed that BackupPC still used rsync,
since at least in the Fedora installation the RPM requires rsync.

Still, I believe you do get at least some of the advantages of rsync >= 3.0
when you have it on the client side, at least for the rsyncd method. In
fact, this might explain the following observation:
        rsync 2.x and rsync method: Backups hang on certain files
        rsync 3.x and rsync method: Backups hang on certain files
        rsync 3.x and rsyncd method: Backups always work

Perhaps the combination of rsyncd and rsync 3.x on the client is what
allows taking advantage of some of the benefits of version 3.x.

 > Secondly, I'm *guessing* that for an incremental file list you'd need a
 > protocol modification. I understand it that instead of one big file list
 > comparison done before transfer, 3.0 does partial file list comparisons during
 > transfer (otherwise it would need to traverse the file tree at least twice,
 > which is something you'd normally avoid). That would clearly require a
 > protocol change, wouldn't it?

Maybe not, if using rsyncd makes the server the "master" so that it
controls the file listing. Stepping back, I think it all depends on
what you define as the "protocol". If the protocol is mostly about
recognized commands and encoding, then the ordering of the file
listing may not be part of the protocol but rather part of the control
structure, which could be protocol-independent if control is ceded to
the "master" side -- i.e., at least some changes to the control
structure could be made without having to coordinate them between
"master" and "slave". I'm just speculating, because there isn't much
documentation that I have been able to find.

 > 
 > Actually, I would think that rsync < 3.0 *does* need to traverse the file tree
 > twice, so the change might even have been made because of the wish to speed up
 > the transfer rather than to decrease the file list size (it does both, of
 > course, as well as better utilize network bandwidth by starting the transfer
 > earlier and allowing more parallelism between network I/O and disk I/O -
 > presuming my assumptions are correct).
 > 
 > > But I'm not an expert and my understanding is that the protocols themselves
 > > are not well documented other than looking through the source code.
 > 
 > Neither am I. I admit that I haven't even looked for documentation (or at the
 > source code). It just seems logical to implement it that way.
 > 
 > I can't rule out that the optimization could be possible with the older
 > protocol versions, but then, why wouldn't rsync have always operated that way?

You could ask the same question about why the protocol wasn't always
that way ;)

 > 
 > >  > > > and how the rest of the community deals with getting pools of
 > >  > > > 100+GB offsite in less than a week of transfer time.
 > >  > > 
 > >  > > 100 Gigs might be feasible - it depends more on the file sizes and how
 > >  > > many directory entries you have, though.  And you might have to make the
 > >  > > first copy on-site so subsequently you only have to transfer the changes.
 > >  > 
 > >  > Does anyone actually have experience with rsyncing an existing pool to an
 > >  > existing copy (as in: verification of obtaining a correct result)? I'm kind of
 > >  > sceptical that pool chain renumbering will be handled correctly. At least, it
 > >  > seems extremely complicated to get right.
 > > 
 > > Why wouldn't rsync -H handle this correctly? 
 > 
 > I'm not saying it doesn't. I'm saying it's complicated. I'm asking whether
 > anyone has actually verified that it does. I'm asking because it's an
 > extremely rare corner case that the developers may not have had in mind and
 > thus may not have tested. The massive usage of hardlinks in a BackupPC pool
 > clearly is something they did not anticipate (or, at least, feel the need to
 > implement a solution for). There might be problems that appear only in
 > conjunction with massive counts of inodes with nlinks > 1.

I have tested backing up a 100GB pool with nlinks >> 1 and it seemed
to work ;) Also, given that there is a well-established -H option and
that rsync is pretty solid and fundamental, it would be hard to
imagine that rsync hasn't been exhaustively tested by now in cases
with at least thousands of hard links and where nlinks is at least
several dozen or more (I have long rsync'd my root directory, and I
know that even a basic Linux install has many hundreds of hard
links). And there is no reason to think that the developers would
write software that works for nlinks = N but fails for nlinks = N+1
(for some value of N).
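
For anyone who wants to check their own system, here is a minimal
sketch (assuming Python is available; this is just an illustration,
not a BackupPC tool) that counts regular files sharing an inode:

    import os, stat, sys
    from collections import defaultdict

    root = sys.argv[1] if len(sys.argv) > 1 else "/"
    groups = defaultdict(int)   # (device, inode) -> number of paths seen
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue        # vanished or unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)] += 1
    print("inodes with nlink > 1:", len(groups))
    print("paths pointing at them:", sum(groups.values()))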

That doesn't mean it *couldn't* happen, and it doesn't mean we
shouldn't always be paranoid and test, test, test... but I just don't
have any good reason to think it would fail algorithmically. Nor does
it mean it couldn't slow down dramatically or run out of memory, as
some have claimed; it just seems unlikely (to me) that it would
complete without error yet still leave some hidden error in the copy.

 > 
 > In another thread, an issue was described that *could* have been caused by
 > this *not* working as expected (maybe crashing rather than doing something
 > wrong, not sure). It's unclear at the moment, and I'd like to be able to rule
 > it out on the basis of something more than "it should work, so it probably
 > does".
 > 
 > I'm also saying that pool backups are important enough to verify the contents
 > by looking closely at the corner cases we are aware of.

Agreed - one should be paranoid. My only claim is that it is unlikely,
and it would be *big* news if you could find an example of rsync
completing without logged errors yet having made a faulty copy. But by
all means test, test, test.

 > 
 > > And the renumbering will change the timestamps which should alert rsync to
 > > all the changes even without the --checksum flag.
 > 
 > This part I'm not sure on. Is it actually *guaranteed* that a rename(2) must
 > be implemented in terms of unlink(2) and link(2) (but atomically), i.e. that
 > it must modify the inode change time? The inode is not really changed, except
 > for the side effect of (atomically) decrementing and re-incrementing the link
 > count. By virtue of the operation being atomical, the link count is
 > *guaranteed* not to change, so I, were I to implement a file system, would
 > feel free to optimize the inode change away (or simply not implement it in
 > terms of unlink() and link()), unless it is documented somewhere that updating
 > the inode change time is mandatory (though it really is *not* an inode change,
 > so I don't see why it should be).
 > 

Good catch!!! I hadn't realized that this was implementation
dependent. It seems that most Unix implementations (including BSD)
have historically changed the ctime; however, Linux (at least
ext2/ext3) does not, at least as of kernel 2.6.26.6.

In fact, the POSIX/SUS specification specifically states:
   Some implementations mark for update the st_ctime field of renamed
   files and some do not. Applications which make use of the st_ctime
   field may behave differently with respect to renamed files unless they
   are designed to allow for either behavior.
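
A quick way to check what a given filesystem actually does (a minimal
sketch, assuming Python; the scratch directory and file names are just
placeholders):

    import os, tempfile, time

    # Create a scratch file, rename it, and compare the inode change time.
    d = tempfile.mkdtemp()
    old = os.path.join(d, "before")
    new = os.path.join(d, "after")
    with open(old, "w") as f:
        f.write("test\n")
    ctime_before = os.stat(old).st_ctime
    time.sleep(1.1)                      # make any ctime update visible
    os.rename(old, new)
    ctime_after = os.stat(new).st_ctime
    print("rename updated ctime:", ctime_after > ctime_before)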

However, it wouldn't be hard to add a "touch" to the chain renumbering
routine if you want to be able to identify newly renumbered files. One
would need to make sure that this doesn't have other unintended side
effects, but I don't think that BackupPC otherwise uses the file mtime.
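
Something along these lines (a sketch only; BackupPC_nightly's actual
renumbering code is Perl, and this just illustrates bumping the mtime
after the rename so that a plain rsync treats the file as changed):

    import os

    def renumber(old_path, new_path):
        """Rename a pool file and bump its mtime so that rsync (without
        --checksum) will notice it on the next incremental pass."""
        os.rename(old_path, new_path)
        os.utime(new_path, None)    # set atime/mtime to "now"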

 > Does rsync even act on the inode change time? 

No, it doesn't. In fact, I have read that most Linux systems don't allow
you to set the ctime to anything other than the current system time.

 > File modification time will be
 > unchanged, obviously. rsync's focus is on the file contents and optionally
 > keeping the attributes in sync (as far as it can). ctime is an indication that
 > attributes have been changed (which may mask a content change), but attributes
 > are compared "in full" anyway (if requested), aren't they?
 > 
 > Either way, if rsync is aware of the change, it will work (rsync should simply
 > need to delete the target and re-link according to its inode map, just as if
 > the link had not been there in the first place). If not, rsync would need to
 > keep and check a mapping {source inode number -> dest inode number} (for all
 > files with nlinks > 1) to find out if all links still reference the same inode.
 > That is a closer examination than is done for single link files without
 > --checksum, and a rather expensive one. I'm not saying this doesn't happen. I
 > didn't check the source code. It would make sense to make '-H' add this check.
 > 
 > > Or are you saying it would be difficult to do this manually with a
 > > special purpose algorithm that tries to just track changes to the pool
 > > and pc files?
 > 
 > I haven't given that topic much thought. The advantage in a special purpose
 > algorithm is that we can make assumptions about the data we are dealing with.
 > We shouldn't do this unnecessarily, but if it has notable advantages, then why
 > not? "Difficult" isn't really a point. The question is whether it can be done
 > efficiently.

I meant "difficult" more in the sense of having to be sure to track
all the special cases and having to be careful, not that one shouldn't
do it.

Personally, I don't like the idea of chain collisions and would have
preferred using full-file md5sums, which, as I have mentioned earlier,
would not be very costly, at least for the rsync/rsyncd transfer
methods under protocol 30.
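
For illustration, a full-file digest costs one sequential read of the
file (a minimal sketch, assuming Python; BackupPC's own pool hash is
not a full-file digest, so this is only to show the cost involved):

    import hashlib

    def full_file_md5(path, chunk_size=1 << 20):
        """Return the hex MD5 digest of the whole file, read in 1 MiB chunks."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()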

Even if full-file md5sums are assumed to be too costly, I would have
preferred that, in the case of collisions, the suffix be a random
*large* number so that renumbering would never be necessary (where
"large" means large enough that the probability of a suffix collision
is vanishingly small). This would eliminate the need for
BackupPC_nightly to renumber chains and would ensure that once a pool
entry is made, it never needs to change its name. To make this work,
you would need a suffix even for the base entry.
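
As a rough sanity check on "vanishingly small" (my numbers, not
anything from BackupPC): with a random 64-bit suffix, the birthday
bound puts the chance of any collision among n entries of one chain at
roughly n*(n-1)/2^65, which is negligible for realistic chain lengths:

    import secrets

    def random_suffix(bits=64):
        """A random suffix long enough that chain-internal collisions are negligible."""
        return format(secrets.randbits(bits), "x")

    def chain_collision_probability(n, bits=64):
        """Approximate birthday-bound probability that n random suffixes collide."""
        return n * (n - 1) / 2 / float(2 ** bits)

    print(chain_collision_probability(10000))   # ~2.7e-12 even for a huge chain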

This would then simplify tracking incremental changes to the
pool. Basically, you would know that any new pool entry is a new file,
so only those would need to be transferred incrementally. Conversely,
any newly missing pool entry is a deleted pool file to be deleted
incrementally.

Similarly, incremental backups of the pc tree would only need to look
at backups made since the last incremental (assuming that you haven't
done anything to add/delete individual files from past backups). So
you would only need to run through these new backup directories and
match their inodes against the inode list you have constructed for the
pool to figure out the hard links.

Specifically, assuming you are doing an incremental backup and have
saved an earlier version of the pool database mapping inodes to pool
file names, the procedure would go as follows (a rough code sketch
follows the list):

1. rsync just the pool (without -H).
2. Use file modification times to identify new pool entries and add them
   to the pool database mapping file (note the beauty is that you don't
   have to delete old, obsolete entries now, since they are merely clutter
   and no longer relevant). Re-sort the database.
3. Recurse through the pc directory trees of all new backups since the
   last one:
   - Copy over files with nlinks = 1.
   - For files with nlinks > 1, reference them against the updated pool
     database map to determine the appropriate hard link mapping on the
     destination filesystem.
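
A rough sketch of steps 2 and 3 (assuming Python; the function names
are placeholders of mine, the "database" here is just an in-memory
dict keyed by (st_dev, st_ino) for illustration -- the real thing
would be sorted and persisted on disk as discussed -- and error
handling, directory attributes and deletions are omitted):

    import os
    import shutil

    def update_pool_db(topdir, pool_db, since_mtime):
        """Step 2: add pool entries newer than the last run to the inode map.
        Values are paths relative to topdir, so they map directly onto the copy."""
        for pool_name in ("pool", "cpool"):
            pool_root = os.path.join(topdir, pool_name)
            if not os.path.isdir(pool_root):
                continue
            for dirpath, dirnames, filenames in os.walk(pool_root):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    st = os.lstat(path)
                    if st.st_mtime >= since_mtime:
                        pool_db[(st.st_dev, st.st_ino)] = os.path.relpath(path, topdir)

    def copy_new_backups(new_backup_dirs, src_topdir, dst_topdir, pool_db):
        """Step 3: copy only the new pc/ backup trees, re-creating hard links
        into the (already rsynced) destination pool via the inode map."""
        for backup_dir in new_backup_dirs:
            for dirpath, dirnames, filenames in os.walk(backup_dir):
                rel_dir = os.path.relpath(dirpath, src_topdir)
                dst_dir = os.path.join(dst_topdir, rel_dir)
                os.makedirs(dst_dir, exist_ok=True)
                for name in filenames:
                    src = os.path.join(dirpath, name)
                    dst = os.path.join(dst_dir, name)
                    st = os.lstat(src)
                    pool_rel = pool_db.get((st.st_dev, st.st_ino))
                    if st.st_nlink > 1 and pool_rel is not None:
                        # Hard-link against the already-copied pool file.
                        os.link(os.path.join(dst_topdir, pool_rel), dst)
                    else:
                        # nlinks == 1 (or not found in the pool): plain copy.
                        shutil.copy2(src, dst)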

Offline, you can go back and remove "old" database entries referring
to pool files that have been deleted. (This would be analogous to a
BackupPC_nightly operation, where you prune away unnecessary entries
corresponding to deleted pool files.)

This should in general be much more efficient than a complete rsync -H.

 > 
 > > More generally, I think we really need to find a guinea pig to spend
 > > some time testing the methods that you and I have discussed of
 > > creating a sorted inode database of the pool.
 > 
 > Yes, and we need to think about how to *verify* such a copy. A verification
 > tool would also answer my question above. The algorithm for creating the
 > initial copy is not complex, so testing some sample cases might be sufficient.
 > I expect incremental updates to make the situation far more difficult. It
 > could be difficult to even imagine which cases could go wrong, so it would be
 > nice to have a tool that fully verifies that content and hardlink relationships
 > in a pool copy match the original.

100% agreed!
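
In case someone wants to prototype such a verifier, here is the shape
I have in mind (a sketch only, assuming Python): group the paths on
each side by the inode they share, then compare the groupings; a
content check (e.g. one checksum per group) would be a further pass.

    import os
    from collections import defaultdict

    def hardlink_groups(topdir):
        """Return the set of hard-link groups, each group being the frozenset
        of relative paths that share one inode."""
        by_inode = defaultdict(set)
        for dirpath, dirnames, filenames in os.walk(topdir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                by_inode[(st.st_dev, st.st_ino)].add(os.path.relpath(path, topdir))
        return {frozenset(paths) for paths in by_inode.values()}

    def verify_hardlink_structure(original, copy):
        """True if both trees partition their files into identical groups."""
        return hardlink_groups(original) == hardlink_groups(copy)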

 > 
 > > Then it would be
 > > instructive to compare execution times vs the straight rsync -H method
 > > and vs. the tar method. For small pools, I imagine rsync -H would be
 > > faster, but at some point the database would presumably be
 > > faster. Presumably the tar method would be slowest of all. The devil
 > > of course is in the details.
 > 
 > I agree. But the important point is scalability rather than speed. We need
 > something that will continue to work regardless of pool size. You can still
 > use rsync on small pools and switch at an appropriate time (i.e. before a
 > failing rsync update breaks your copy, even if the "database version", as you
 > call it, is still somewhat slower).
 > 
 > > Either way, this issue seem to be becoming a true FAQ for this list --
 > 
 > Always has been ;-).
 > 
 > > so we should probably agree on some definitive answer (or set of
 > > answers) so that we can put this one to rest.
 > 
 > Definitely. Somehow I still see people giving different answers and restarting
 > the discussion all over again ;-).
 > 
 > > My personal belief is that while disk images or ZFS may be the "ideal"
 > > answer, there still is a need for an alternative even if slower method
 > > for reliably backing up (and ideally incrementally synching) just
 > > $topdir for those who don't/can't back up the whole partition or who
 > > can't run ZFS. My understanding is that the simple answer of "rsync -H"
 > > seems to not be reliable enough on large pools at least for some
 > > people.
 > 
 > In addition, there are cases where the "copy" is to be stored on something
 > that doesn't support hardlinks. As long as the "copy" doesn't have to be
 > functional (but rather allow re-creating a functional pool), that is no
 > problem. It is not difficult to accomodate for this case - at least for the
 > initial copy - if we have it in mind from the start. We just need to split
 > up the copy operation into a "send" and a "receive" part (like 'tar -c' and
 > 'tar -x') which can be plugged together for a straight copy or generate an
 > easily storable intermediate result. Incrementals might be harder, but we
 > should at least look into it.
 > 
 > Furthermore, I'd like to keep pool merging in mind. If we had a way to copy a
 > pool into a pre-existing *different* pool, that would be great. And it really
 > doesn't seem hard either, if we use PoolWrite() instead of File::Copy (well,
 > there might be some details to figure out, and it might be easier to make use
 > of the already known BackupPC hash and simply handle collisions like
 > PoolWrite() would). It may completely conflict with incremental updates
 > though. Or incrementals might be new pc/ file trees (based on timestamps) that
 > are merged into a pre-existing pool copy? Hmm ... there's potential there.
 > Generate a list of pool files and *some* pc/ directories, based on timestamp,
 > instead of attempting to handle the whole structure. That would miss changes
 > of existing backups (like deleting individual files), but BackupPC doesn't
 > really endorse changes of existing backups, does it? ;-)

Why do I think that the wink was directed at me? ;)

And yes, I see what you are getting at, and in fact it is analogous to
what I was writing above for incremental backups if you could avoid
chain renumbering (either by using full-file md5sums or by using a
unique random suffix). In that case you would just have to look at
pool additions/deletions and at new pc backup trees.

 > 
 > Regards,
 > Holger
 > 
 > P.S.: I won't find any more time until at least Sunday, so please excuse me
 >       for not responding until then.
 > 
