Subject: Re: [BackupPC-users] Keeping servers in sync
From: Holger Parplies <wbppc AT parplies DOT de>
To: Jim Wilcoxson <prirun AT gmail DOT com>
Date: Tue, 1 Sep 2009 19:00:40 +0200
Hi,

Jim Wilcoxson wrote on 2009-08-31 08:08:48 -0400 [Re: [BackupPC-users] Keeping 
servers in sync]:
> [...]
> I did some reading today about BackupPC's storage layout and design.
> I haven't finished yet, but one thing stuck out:
> 
> "BackupPC_link reads the NewFileList written by BackupPC_dump and
> inspects each new file in the backup."
> 
> To speed up incrementals, HashBackup could make use of the NewFileList.

BackupPC_link deletes the NewFileList upon completion. BackupPC could
certainly be changed to keep the NewFileList (as NewFileList.N for N = backup
number) instead, but it's a bit awkward, because BackupPC no longer needs this
information, and it's strange information at that (a list of all files in the
backup that were not linked to pool files but need to be). It is really only
meaningful for the communication between BackupPC_dump and BackupPC_link.
It might conceivably be helpful for incremental pool backups (but only of
{c,}pool/, not of pc/).

1.) Robustness - do you want to trust the contents of files on a file system
    and risk missing pool files in your copy, because they are for whatever
    reason not listed?

2.) Completeness - you need to account for pool chain renumbering (and
    deletion of pool files). Unless BackupPC_nightly also provides information
    on what it changed, you need to traverse the pool anyway.

3.) Which NewFileList.* files would you want to look at? Presumably those
    for all backups, for which you need to copy the pc/host/num/ tree.

4.) How do you handle trees of backups that are in progress?

> Reading the
> NewFileList might be a way to speed up an incremental backup of the
> BackupPC pool, though incremental scans are fairly quick already.

I tend to think that you would introduce dependencies (on the BackupPC
version) for an insignificant gain.

> Another thing about BackupPC is that by my reading, new files are
> first written to the PC area, then pool links are created by
> BackupPC_link.  This suggests that backing up the pool last might
> improve performance, because it is likely to be more fragmented.

I'm not sure about that. Full backups contain links to all files in the
corresponding pc/host/num/ tree, which will be to pool files, wherever on the
disk they might be. Incremental backups don't only contain files that are new
to the pool (NewFileList) but also links to existing pool files with the same
content. Again, it's impossible to predict where on the disk they might be.

> Right now, HB will backup cpool first, then pc, then pool, in that
> order.  It might be better to backup pc first, then cpool and pool.
> I'm not sure how much of a difference it would make, if any, because
> it's hard to predict disk layouts in any filesystem.

I'm sceptical that it will make any systematic difference. For one pool, it
might be significantly faster one way; for another pool, the other way. What
exactly is the speed advantage you are hoping for? Having inode information in
cache from one part to the next (i.e. {c,}pool/ vs. pc/ traversal), or reading
file content for multiple small files? Or are you thinking about the resulting
speed of your HashBackup pool?

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/