BackupPC-users

Re: [BackupPC-users] Newbie setup questions

2011-03-11 17:10:28
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Fri, 11 Mar 2011 17:07:50 -0500
hansbkk AT gmail DOT com wrote at about 04:04:04 +0700 on Saturday, March 12, 2011:
 > On Fri, Mar 11, 2011 at 9:05 PM, Les Mikesell <lesmikesell AT gmail DOT com> wrote:
 > > It is the number of files with more than one link that matter, not so
 > > much the total size.  But the newer rsync that doesn't need the whole
 > > file tree loaded at once besides the link table and lots of RAM may
 > > permit it to scale up more.
 > 
 > OK, so TOPDIR is the proper name for the hardlinked filesystem (as
 > opposed to the pool), and it's usually /var/lib/backuppc/, correct?
 > 
 > And it's this filesystem that is the one that can be a problem, correct?
 > 
 > It would be great if we could have a standard set of metrics to be
 > able to compare our filesystems, since what might be a "huge number"
 > to one person is likely to be a tiny fraction of someone else's.
 > 
 > So looking for advice from one more Linux-knowledgeable than I on what
 > stats to collect and how to best collect them.
 > 
 >   For example "df -i /var/lib/backuppc/" will show the total number of
 > inodes, correct?
 > 
 >   And "find /var/lib/backuppc/ -type f -links +1" would show the total
 > number of files that have more than one hard link, correct?
 > 
 >   Would these two metrics be sufficient to allow for objective
 > comparisons of the filesystems?

rsync's handling of hard links is not documented (other than via the
code itself and some scattered comments on the developer list), so it's
not completely clear to me how it works.

In particular, with regard to the metrics you seek, I don't know whether
it is better or worse to have one file with 2N links or N files with 2
links each. Your metrics don't distinguish between those two cases, and
depending on how the list of hard links is constructed, that may or may
not make a big difference. Specifically, in the first case, does the
link list still have O(N) entries or just O(1) entries -- potentially a
huge difference.
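
To make the distinction concrete, here is a rough sketch (my own
illustration, not anything taken from rsync or BackupPC) of a metric
that does separate the two cases. Note that "find ... -links +1" prints
one line per path name, so a single file with 2N names under the tree
shows up 2N times. The sketch below counts distinct inodes separately
from names. The function name hardlink_stats and the result keys are
hypothetical, and the default path is simply the one mentioned above.

```python
import os
import stat
import sys
from collections import Counter


def hardlink_stats(top):
    """Summarise multiply-linked regular files found under `top`."""
    # (st_dev, st_ino) identifies a file; count the path names seen for each.
    names_per_inode = Counter()
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # st_nlink counts links anywhere on the filesystem, not only
            # those under `top`, so this mirrors "find -type f -links +1".
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                names_per_inode[(st.st_dev, st.st_ino)] += 1
    return {
        "linked_inodes": len(names_per_inode),
        "linked_names": sum(names_per_inode.values()),
        "max_names_per_inode": max(names_per_inode.values(), default=0),
    }


if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/backuppc"
    print(hardlink_stats(top))
```

One file with 2N links gives linked_inodes = 1 and linked_names = 2N,
while N files with 2 links each give linked_inodes = N and
linked_names = 2N. Which of the two numbers matters for rsync's memory
use is exactly the part I can't answer.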

More generally, I'm really wondering whether rsync could be
patched/modified to work better in edge cases like BackupPC archives,
where there are a huge number of hard links (both in total and as a
percentage of all the files). In particular, I wonder whether the list
could be presorted in a way that expedites lookups.
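
By "presorted" I mean something along these lines. This is only a
generic sketch of a sorted index with binary search; I am not claiming
it is how rsync stores or searches its link list, and the names
build_index and lookup are made up for the illustration. Each entry
pairs a (device, inode) key with the first path seen for that inode.

```python
import bisect


def build_index(pairs):
    """Sort ((dev, ino), path) pairs once so later lookups can bisect."""
    index = sorted(pairs)
    keys = [key for key, _path in index]
    paths = [path for _key, path in index]
    return keys, paths


def lookup(keys, paths, key):
    """Return the path stored for `key`, or None, in O(log n) comparisons."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return paths[i]
    return None


# Hypothetical sample data: (st_dev, st_ino) -> first path seen.
keys, paths = build_index([
    ((1, 30), "pool/c"),
    ((1, 10), "pool/a"),
    ((2, 5), "pool/b"),
])
print(lookup(keys, paths, (1, 10)))
print(lookup(keys, paths, (1, 20)))
```

The sort costs O(n log n) once, after which each lookup is O(log n)
instead of a linear scan. A hash table would do the same job; whether
either would actually help depends on what rsync does today, which is
the part that isn't documented.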
