Subject: Re: [BackupPC-users] BackupPC very slow backing up 25GB files
From: John Goerzen <jgoerzen AT complete DOT org>
To: backuppc-users AT lists.sourceforge DOT net
Date: Thu, 3 Feb 2011 18:49:30 +0000 (UTC)
Les Mikesell <lesmikesell <at> gmail.com> writes:

> 
> On 2/3/2011 11:33 AM, Carl Wilhelm Soderstrom wrote:
> 
> And worse for performance, on the server side it has to uncompress the 
> copy to compute the block checksums unless you have enabled checksum 
> caching and the file hasn't changed for two runs (maybe two fulls).

I don't believe that is the problem.  From watching BackupPC's processes in top
on the BackupPC server: yes, it adds a few minutes, but 2 or 3, not 80.

After enabling a bunch of debugging, I noticed that native rsync uses a block
size of 65704, while BackupPC uses 16384 when talking to the same client.  That
is, BackupPC logged:

blkCnt=277102, blkSize=16384, remainder=12794

while rsync natively logged:

count=65716 rem=35464 blength=65704 s2length=4 flength=4317773824
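(As a quick sanity check, written by me in Python for convenience, those
native-rsync numbers are self-consistent: count blocks, the last holding rem
bytes and the other count-1 holding blength bytes each, add up to flength.)

```python
# Sanity check of the native rsync debug line above (my arithmetic,
# not anything from rsync's own code): the file splits into `count`
# blocks of `blength` bytes, with the final short block holding `rem`.
count, rem, blength, flength = 65716, 35464, 65704, 4317773824

assert (count - 1) * blength + rem == flength
print("rsync log line is self-consistent")
```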

The 16384 upper limit in BackupPC appears to be hardcoded in RsyncDigest.pm:

sub blockSize
{
    my($class, $fileSize, $defaultBlkSize) = @_;

    my $blkSize = int($fileSize / 10000);
    $blkSize = $defaultBlkSize if ( $blkSize < $defaultBlkSize );
    $blkSize = 16384 if ( $blkSize > 16384 );
    $blkSize += 4 if ( (($blkSize + 4) % 64) == 0 );
    return $blkSize;
}
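For comparison, here is that logic transliterated to Python (my transliteration,
not BackupPC code), next to an approximation of how native rsync appears to pick
its block length. As far as I can tell from rsync's generator.c, it is roughly
the largest multiple of 8 whose square does not exceed the file length; the 2048
default block size below is an assumption for illustration.

```python
from math import isqrt

def backuppc_block_size(file_size, default_blk_size):
    """Python transliteration of BackupPC's RsyncDigest.pm blockSize() above."""
    blk_size = file_size // 10000
    if blk_size < default_blk_size:
        blk_size = default_blk_size
    if blk_size > 16384:          # the hardcoded cap in question
        blk_size = 16384
    if (blk_size + 4) % 64 == 0:  # bump by 4 when size+4 is a multiple of 64,
        blk_size += 4             # mirroring the Perl
    return blk_size

def rsync_block_length(file_length):
    """Approximation of native rsync's choice (generator.c): roughly the
    largest multiple of 8 whose square is <= the file length."""
    return (isqrt(file_length) // 8) * 8

flength = 4317773824  # file length from the rsync debug output above
print(backuppc_block_size(flength, 2048))  # -> 16384, matching the BackupPC log
print(rsync_block_length(flength))         # -> 65704, matching blength above
```

So for this 4.3 GB file BackupPC pins the block size at its 16384 cap, while
native rsync uses a block four times larger, meaning BackupPC's client has
roughly four times as many block checksums to compute and match against.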

Why that is, I don't know.  There is no comment stating why 16384 was chosen,
and given that backuppc is doing lots of stuff with checksums I am afraid to
change it without good reason.

It should also be noted that the increased CPU load is mostly on the *client*.
That is, BackupPC is communicating with the remote rsync process in some way
that causes that process to use much more CPU, and to be much slower, than when
regular rsync is communicating with it.

This blocksize thing may be it, or it may be completely off target.  I don't
know, but it's the best I have so far.

> When there is a mismatch, the server side has to reconstruct a full copy 
> of the original, merging any uncompressed matching blocks with the 
> differences from the remote.  Unless you have limited bandwidth, this is 
> usually much slower than a full new copy would be.

I am not using compression on the backuppc side.

-- John


_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/