BackupPC-users

Re: [BackupPC-users] Cross-platform bug/inconsistency in rsync digests (File::RsyncP::digest) - PROBLEM IS IN ADLER32!!!

2011-02-09 08:15:53
Subject: Re: [BackupPC-users] Cross-platform bug/inconsistency in rsync digests (File::RsyncP::digest) - PROBLEM IS IN ADLER32!!!
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: Craig Barratt <cbarratt AT users.sourceforge DOT net>
Date: Wed, 09 Feb 2011 08:13:13 -0500
Craig Barratt wrote at about 22:50:31 -0800 on Tuesday, February 8, 2011:
 > Unfortunately this is a bug inherited from an earlier version of
 > rsync. The C language char data type can be signed or unsigned -
 > it is implementation dependent.  On ARM it is unsigned, and on x86
 > it is signed.
 > 
 > If you look in get_checksum1() from checksum.c in a recent rsync
 > you will see it typecasts buf to be signed char*.  I've edited
 > the code below with the fix (change buf to buf1 in the function
 > declaration and add a new declaration for buf):
 > 
 > Craig
 > 
 > > /*
 > >  * a simple 32 bit checksum that can be updated from either end
 > >  * (inspired by Mark Adler's Adler-32 checksum)
 > >  */
 > > UINT4 adler32_checksum(char *buf1, int len)
 > > {
 > >     int i;
 > >     UINT4 s1, s2;
 > >     signed char *buf = (signed char*)buf1;
 > > 
 > >     s1 = s2 = 0;
 > >     for ( i = 0 ; i < len - 4; i += 4 ) {
 > >         s2 += 4 * (s1 + buf[i]) + 3 * buf[i+1] + 2 * buf[i+2] + buf[i+3];
 > >         s1 += (buf[i+0] + buf[i+1] + buf[i+2] + buf[i+3];
 > >     }
 > >     for ( ; i < len ; i++ ) {
 > >         s1 += buf[i];
 > >            s2 += s1;
 > >     }
 > >     return (s1 & 0xffff) + (s2 << 16);
 > > }

THANKS CRAIG - that fixes it.
Some follow-ups:

1. Any reason not to fix this in general in the next version of
   File::RsyncP?
   Presumably the change you made would work on all architectures or
   one could use an #ifdef to determine whether to include the change.

2. Does this bug affect the speed of my BackupPC backups on an arm
   architecture? i.e., will the fact that the adler32 checksums are
   wrong cause rsync to spend a lot of time matching up blocks and
   possibly re-transmitting *or* is this problem just restricted to
   the checksums stored in the pool and doesn't interfere with the
   actual transfer/matching algorithm?

3. More generally, how if at all did this affect the integrity or
   speed of my backups on an arm architecture?

4. Given that you set CHAR_OFFSET=0, wouldn't the following be
   simpler and at least as fast:

    for (i=0 ; i < len ; i++ ) {
          s1 += buf[i];
          s2 += s1;
    }


        versus the original:

    for ( i = 0 ; i < len - 4; i += 4 ) {
        s2 += 4 * (s1 + buf[i]) + 3 * buf[i+1] + 2 * buf[i+2] + buf[i+3] +
              10 * CHAR_OFFSET;
        s1 += (buf[i+0] + buf[i+1] + buf[i+2] + buf[i+3] + 4 * CHAR_OFFSET);
    }
    for ( ; i < len ; i++ ) {
        s1 += (buf[i] + CHAR_OFFSET);
        s2 += s1;
    }


5. Finally, I know that I am not the only one running BackupPC on arm
   architectures (e.g., most of the SOHO NAS devices are arm). It is
   interesting that there are errors both in the pool md5sum names
   (per my previous thread unless using latest patched Digest::MD5)
   and now in the rsync checksums. Presumably the backup integrity is
   not affected but interesting that nobody else had run into these
   errors.


------------------------------------------------------------------------------
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/