BackupPC-users

Re: [BackupPC-users] Why does backuppc transfer files already in the pool

2010-08-28 13:47:03
From: martin f krafft <madduck AT madduck DOT net>
To: backuppc users list <backuppc-users AT lists.sourceforge DOT net>
Date: Sat, 28 Aug 2010 19:44:53 +0200
also sprach martin f krafft <madduck AT madduck DOT net> [2010.08.28.1854 +0200]:
> Using lsof, I found that the BackupPC_dump process actually has the
> corresponding pool file up for reading, so it has identified it.
> 
> This makes me wonder even more why the client still transfers the
> whole file. Shouldn't BackupPC_dump terminate the transfer and
> proceed to the next file instead?

I can confirm a few things, after using strace and lsof on both
sides of the transfer. This is about the client sending a file that
is already in the pool:

a. The BackupPC_dump process does not write to disk if the file is
   already in the pool.

b. The BackupPC_dump process quickly identifies the corresponding
   file in ./cpool/ after the client has started sending the file.

c. The client still sends the entire file, and the BackupPC_dump
   process reads all of it; I have no idea where it puts the data.

Something is going wrong.

I think that one of two things should happen instead:

1. If the dump process has access to the following information: (a)
   the checksums of the first and the last (or eighth) 128k blocks
   of the file, and (b) the size of the client's file, and if it
   considers those data reliable enough to identify an existing
   file, it should terminate the transfer and move on.

2. Assuming that the two 128k block checksums and the file size are
   not collision-free (they probably aren't), backuppc should really
   uncompress the pool file and employ rsync's rolling checksum to
   update the file (in memory). If there were any changes, then it
   should write out the NewFile to disk; in the absence of changes,
   it should create the hardlink.
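
To make the "rolling checksum" in (2.) concrete, here is a minimal
sketch of an rsync-style weak checksum in Python. It only illustrates
the rolling property (sliding a window by one byte without rescanning
it); the names weak_checksum, roll and rolled_matches_recomputed are
hypothetical, and real rsync and rsyncp differ in detail.

```python
MOD = 1 << 16  # both halves of the weak checksum are kept modulo 2^16


def weak_checksum(block):
    """Return the (a, b) pair of an rsync-style weak checksum for a block."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window forward by one byte in constant time."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def rolled_matches_recomputed(data, n):
    """Check that rolling over data agrees with recomputing every window."""
    a, b = weak_checksum(data[:n])
    for k in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[k - 1], data[k - 1 + n], n)
        if (a, b) != weak_checksum(data[k:k + n]):
            return False
    return True
```

The receiver can therefore test every byte offset of the data it
holds against the block checksums cheaply, and only run a strong
checksum where the weak one matches.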

After writing this, it seems to me that (2.) is what's currently
happening. Can anyone confirm this?

Are size + 2×128k checksums not enough to identify a pool file?
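
As an illustration of why the answer is probably "no", the following
Python sketch builds a digest from the file size plus two 128k blocks
and shows two different files that share it. The layout is an
assumption made for the example (MD5 over the decimal size, then the
first block, then a block ending at most 1 MiB into the file; files
of up to 256k hashed whole); it has not been checked against the
BackupPC source, and partial_digest is a hypothetical name.

```python
import hashlib

CHUNK = 128 * 1024            # the 128k block size discussed above
WHOLE_FILE_LIMIT = 2 * CHUNK  # assumption: smaller files are hashed whole
SEEK_CAP = 8 * CHUNK          # assumption: second block ends <= 1 MiB in


def partial_digest(data):
    """Digest of the size plus two 128k blocks (assumed layout)."""
    size = len(data)
    md5 = hashlib.md5()
    md5.update(str(size).encode("ascii"))
    if size > WHOLE_FILE_LIMIT:
        second = min(size, SEEK_CAP) - CHUNK
        md5.update(data[:CHUNK])
        md5.update(data[second:second + CHUNK])
    else:
        md5.update(data)
    return md5.hexdigest()


def collision_demo():
    """Return (same_partial, same_full) for two files differing at 1.5 MiB."""
    original = bytearray(2 * 1024 * 1024)
    changed = bytearray(original)
    changed[1536 * 1024] = 1  # outside both hashed blocks
    same_partial = partial_digest(bytes(original)) == partial_digest(bytes(changed))
    same_full = hashlib.md5(original).digest() == hashlib.md5(changed).digest()
    return same_partial, same_full
```

Any change outside the two hashed blocks that preserves the size goes
unnoticed, so a digest like this can only select a candidate pool
file; the contents still have to be compared.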

Can rsyncp somehow ask the remote rsync process for the checksum of
the complete file? It could do that after it identified a matching
pool file as a preemptive check whether it would be safe to skip the
rest of the transfer.
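
A minimal sketch of what that check could look like on the server
side, in Python. Whether the remote rsync can be asked for such a
digest before the transfer is exactly the open question, so
client_digest is a hypothetical input here, as are the names
stream_md5 and safe_to_skip. The pool file is assumed to be passed in
as an already uncompressed byte stream.

```python
import hashlib


def stream_md5(stream, bufsize=65536):
    """Return the hex MD5 of a binary stream, read in fixed-size chunks."""
    md5 = hashlib.md5()
    for chunk in iter(lambda: stream.read(bufsize), b""):
        md5.update(chunk)
    return md5.hexdigest()


def safe_to_skip(pool_stream, client_digest):
    """True if the pool file's full digest equals the client's digest."""
    return stream_md5(pool_stream) == client_digest.lower()
```

If the digests match, the rest of the transfer could be skipped and
the hardlink created; otherwise the normal transfer continues.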

Cheers,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"a kiss may ruin a human life."
                                                        -- oscar wilde
 
spamtraps: madduck.bogus AT madduck DOT net
