BackupPC-users

Re: [BackupPC-users] Matching files against the pool remotely.

2009-12-18 11:15:37
Subject: Re: [BackupPC-users] Matching files against the pool remotely.
From: Les Mikesell <lesmikesell AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Fri, 18 Dec 2009 10:12:10 -0600
Malik Recoing. wrote:
>
>>> I know a file will be skipped if it is present in the previous backup, but
>>> what happens if the file has been backed up for another host?
>> It is required to be uploaded first as otherwise there's nothing to
>> compare it to (yeah, I know, that's a pain[1]).
>>
>> It might theoretically be sufficient to let the remote side calculate a
>> hash and compare it against the files in the pool with matching hashes,
>> and then let rsync do full compares against all the matching hashes in the
>> pool (since hash collisions happen), but I don't believe anyone has tried
>> to code this up yet, and it would only be of limited use in systems that
>> were network bandwidth constrained rather than disk bandwidth constrained.
> 
> I'm quite sure it will be an improvement for both. Overall there will be no
> overhead. Moreover, the hash calculation will be, in a sense, "clustered" by
> delegating it to the client. The matching of identical hashes is done by
> BackupPC_Link anyway, so BackupPC_Link would become pointless in an
> "rsync-only" configuration. Disk and network traffic will be reduced, as
> many files won't be transferred at all.

There are two problems: one is that the remote agent is a standard rsync 
binary that knows nothing about backuppc's hashes; the other is that 
hash collisions are normal and expected - and disambiguated by a full 
data comparison.
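
To make the second problem concrete, here is a minimal Python sketch of a
pool lookup where the hash only narrows the candidates and a full byte
comparison decides the match. It is not BackupPC's code: weak_digest(), the
"<digest>_<n>" file naming, and both function names are assumptions made up
for illustration. Note that the full comparison needs the file's data next to
the pool, which is why a hash sent by the client is not enough by itself.

```python
import filecmp
import hashlib
import os
import shutil


def weak_digest(path, prefix_bytes=4096):
    """Hypothetical pool key: MD5 of the file size plus the first
    prefix_bytes of data. Files that differ only later will collide."""
    h = hashlib.md5()
    h.update(str(os.path.getsize(path)).encode())
    with open(path, "rb") as f:
        h.update(f.read(prefix_bytes))
    return h.hexdigest()


def find_in_pool(pool_dir, candidate):
    """Return the pool file whose content equals candidate, or None."""
    key = weak_digest(candidate)
    i = 0
    while True:
        # Assumed layout: colliding files are stored as key_0, key_1, ...
        pooled = os.path.join(pool_dir, f"{key}_{i}")
        if not os.path.exists(pooled):
            return None
        # The hash only selects candidates; a byte-for-byte compare decides.
        if filecmp.cmp(pooled, candidate, shallow=False):
            return pooled
        i += 1


def add_to_pool(pool_dir, candidate):
    """Reuse an identical pool file, or store candidate in the next free
    slot of its collision chain. Returns the pool path."""
    match = find_in_pool(pool_dir, candidate)
    if match is not None:
        return match
    key = weak_digest(candidate)
    i = 0
    while os.path.exists(os.path.join(pool_dir, f"{key}_{i}")):
        i += 1
    dest = os.path.join(pool_dir, f"{key}_{i}")
    shutil.copyfile(candidate, dest)
    return dest
```

Two files with the same size and the same first 4096 bytes get the same key
here but end up in separate pool slots, because the full comparison tells
them apart.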

> I thought of a similar solution. When your clients are mostly "full system
> tree" backups, you may keep ready-to-copy backups of the different OS trees.
> When a new client is added, you copy the corresponding OS directory as if it
> were the first full backup.

Yes, if your remote machines are essentially clones of each other, you 
could create their pc directories as clones with a tool that knows how 
to make a tree of hardlinks.
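
As a rough illustration of "a tool that knows how to make a tree of
hardlinks", a Python sketch could look like the one below. The function name
and both paths are hypothetical, source and destination must be on the same
filesystem, and it does not create any of the bookkeeping a real BackupPC
host directory needs; it only shows the hardlinking itself.

```python
import os


def clone_tree_with_hardlinks(src, dst):
    """Recreate the directory structure of src under dst and hardlink
    every file, so the clone uses no additional data blocks."""
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        # Directories cannot be hardlinked, so they are created fresh.
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # Both names now point at the same inode.
            os.link(os.path.join(dirpath, name),
                    os.path.join(target_dir, name))
```

Because every file in the clone shares its inode with the original, the new
tree costs only directory entries, not file data.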

A better solution might be to have a local machine at the site running 
backuppc and work out some way to get an offsite copy.  If bandwidth is 
such an issue, you are also going to have trouble doing a restore.  But, 
if you've followed this mail list very long you'd know that the 'offsite 
copy' problem doesn't have a good solution yet either.

-- 
   Les Mikesell
    lesmikesell AT gmail DOT com


