Re: [BackupPC-users] Wrong blocksize (!=2048) in rsync checksums for some files

2011-02-18 01:29:31
Subject: Re: [BackupPC-users] Wrong blocksize (!=2048) in rsync checksums for some files
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Fri, 18 Feb 2011 01:27:15 -0500
Jeffrey J. Kosowsky wrote at about 16:23:49 -0500 on Thursday, February 17, 
2011:
 > Jeffrey J. Kosowsky wrote at about 15:41:26 -0500 on Thursday, February 17, 
 > 2011:
 >  > I have been running my BackupPC_digestVerify.pl program to check the
 >  > rsync digests in my pool.
 >  > 
 >  > Looking through the 1/x/x/ tree, I found 3 new bad digests out of
 >  > about 36000 when using the default blocksize of 2048.
 >  > 
 >  > It turns out that those 3 digests have a blocksize !=2048 -- and
 >  > indeed the digests do verify if you use that blocksize.
 >  > These files have block size 2327 and 9906 (twice).
 >  > Note the file sizes are 99MB, 11MB, and 16MB.
 >  > 
 >  > This seems *weird* and *wrong* since I thought the blocksize was fixed
 >  > to 2048 according to the (default) parameters passed to rsync in the
 >  > config.pl file. Specifically,
 >  >       '--block-size=2048',
 >  > 
 >  > Any idea why rsync may be ignoring this and using a larger blocksize
 >  > for these files?
 > 
 > OK this is weird... the block size used is the *uncompressed* file
 > size divided by 10,000 (rounded to integer). 
 > 
 > This too is weird since the normal rsync algorithm uses the rounded
 > sqrt of the (uncompressed) file length for the blocksize (as long as
 > it is >700 and < MAX_BLOCK_SIZE which I think may be 16,384).
 > 
 > So what is going on here and why is rsync neither using the
 > --block-size=2048 value nor the heuristic sqrt(filesize) number?
 > 
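
To make the contrast in the quoted messages concrete, here is a small Python sketch of the two rules as they are described there. The first is the "rounded sqrt of the file length, clamped" heuristic. The second is the "uncompressed size / 10,000" rule observed in the pool. MIN_BLOCK_SIZE = 700 and MAX_BLOCK_SIZE = 16384 simply follow the text, and the 16,384 ceiling is the author's own guess. Real rsync releases use their own limits and rounding, so the function names and constants are illustrative assumptions, not rsync's or BackupPC's code.

```python
import math

# Limits as described in the quoted text: a 700-byte floor and a ceiling the
# author guesses "may be 16,384". Actual rsync versions differ in limits and
# rounding details, so this is only an illustration of the described rule.
MIN_BLOCK_SIZE = 700
MAX_BLOCK_SIZE = 16384


def sqrt_block_size(file_len: int) -> int:
    """Rounded square root of the file length, clamped to the limits above."""
    blk = round(math.sqrt(file_len))
    return max(MIN_BLOCK_SIZE, min(blk, MAX_BLOCK_SIZE))


def observed_block_size(file_len: int) -> int:
    """What the odd pool digests appear to use: uncompressed length / 10,000, rounded."""
    return round(file_len / 10000)


# Compare the two rules for a few round, purely illustrative file sizes.
for size in (1_000_000, 10_000_000, 100_000_000):
    print(f"{size:>11,} bytes: sqrt -> {sqrt_block_size(size):>5}, "
          f"/10000 -> {observed_block_size(size):>5}")
```

The two rules agree only at 100,000,000 bytes (both give 10,000). Below that size, dividing by 10,000 yields a smaller block than the square root, and above it a larger one.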

OK, I see some code in RsyncDigest.pm that seems to set the
blocksize to:
                  defaultBlkSize   if filesize/10000 < defaultBlkSize
                  16384            if filesize/10000 > 16384
                  filesize/10000   otherwise
where it seems that defaultBlkSize = 700.
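
The clamping rule above can be restated as a short Python sketch. This is a re-expression of the pseudocode, not the actual RsyncDigest.pm source. The function name is hypothetical. DEFAULT_BLK_SIZE = 700 is the author's reading of defaultBlkSize. Truncating integer division is an assumption, because the pseudocode does not say how filesize/10000 is rounded.

```python
# Constants follow the message: defaultBlkSize as the author reads it (700),
# and the 16384 upper clamp from the RsyncDigest.pm pseudocode.
DEFAULT_BLK_SIZE = 700
MAX_DIGEST_BLK_SIZE = 16384


def digest_block_size(file_size: int, default_blk_size: int = DEFAULT_BLK_SIZE) -> int:
    """Clamp file_size/10000 into [default_blk_size, MAX_DIGEST_BLK_SIZE].

    Truncating integer division is an assumption; the pseudocode does not
    state how file_size/10000 is rounded.
    """
    blk = file_size // 10000
    if blk < default_blk_size:
        return default_blk_size
    if blk > MAX_DIGEST_BLK_SIZE:
        return MAX_DIGEST_BLK_SIZE
    return blk
```

For example, `digest_block_size(1_000_000)` returns 700, `digest_block_size(99_060_000)` returns 9906, and `digest_block_size(500_000_000)` returns 16384. Calling it with `default_blk_size=2048` shows that, under this rule, every file smaller than 20,480,000 bytes (2048 × 10,000) would get exactly 2048.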

I'm not sure why filesize/10000 was chosen rather than
sqrt(filesize), as in the regular rsync heuristic.

Also, I'm confused about how this reconciles with the rsync parameter
that would seemingly force the block size to 2048. And indeed nearly
all the cpool files do have a blocksize of 2048.

Now, since the appended rsync digest doesn't record the blocksize (only
the number of blocks), how does BackupPC know on the next round
whether the blocksize is 2048 or the one set by the above
heuristic? And if BackupPC does not know which, then it would seem that
the rsync checksums are not going to be helpful.
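
To illustrate the ambiguity, here is a hypothetical Python sketch of how a verifier could, in principle, tell candidate block sizes apart using only the stored block count. It assumes that the uncompressed file size is known and that the count equals ceil(size / blocksize). Both function names are made up for illustration, and nothing here claims that BackupPC actually works this way.

```python
from typing import Iterable, List


def block_count(file_size: int, blk_size: int) -> int:
    """Blocks needed to cover file_size bytes; the last block may be short."""
    return (file_size + blk_size - 1) // blk_size


def consistent_block_sizes(file_size: int, num_blocks: int,
                           candidates: Iterable[int]) -> List[int]:
    """Candidate block sizes whose block count matches the stored count."""
    return [b for b in candidates if block_count(file_size, b) == num_blocks]
```

For a hypothetical 99,060,000-byte file, 2048-byte blocks give 48,370 blocks while 9906-byte blocks give exactly 10,000, so `consistent_block_sizes(99_060_000, 10_000, [2048, 9906])` returns `[9906]`. Two candidate sizes that are close together could still produce the same count, so this check can only narrow the choice, not always settle it.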

In particular, if rsync is given the argument --block-size=2048,
then won't cpool files with blocksize != 2048 cause rsync to waste
time trying to align blocks of incompatible sizes?

So, either I am missing something here (very likely) or something is
broken...

And again, this blocksize != 2048 issue seems to affect only a *small*
fraction of the files with an rsync digest (maybe about 1-2 per
1000 files with digests).

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/