Subject: Re: [BackupPC-users] BUG in backuppc md5sum calculation for root attrib files (WAS: cpool md5sum errors with certain attrib files)
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 29 Dec 2009 20:40:47 -0500
Jeffrey J. Kosowsky wrote at about 03:33:11 -0500 on Tuesday, December 29, 2009:
 > Jeffrey J. Kosowsky wrote at about 11:50:04 -0500 on Tuesday, December 22, 
 > 2009:
 >  > In my neuroses, I ran a perl script that recursed through the cpool
 >  > and checked whether the md5sum of each stored file corresponded to its
 >  > location in the pool (note when I say md5sum I mean the special notion
 >  > of md5sum defined in BackupPC::Lib.pm)
 >  > 
 >  > 1. Out of a total of 855,584 pool entries, I found a total of 35 errors.
 >  > 
 >  > 2. Interestingly, all 35 of these errors corresponded to 'attrib' files.
 >  > 
 >  > 3. Perhaps even more interestingly, all but two of the attrib files
 >  >    were at the top level -- i.e., $TopDir/pc/<machine>/<nnn>/attrib
 >  >    (this represents 33 out of a total of 87 backups)
 >  > 
 >  > 4. None of the attrib files appear corrupted when I examine them using
 >  >    BackupPC_attribPrint
 >  > 
 >  > So what could possibly be causing the md5sum to be wrong just on a
 >  > small subset of my pool files?
 >  > 
 >  > Why are these errors exclusively limited to attrib files of which
 >  > almost all are top-level attrib files (even though they constitute a
 >  > tiny fraction of total attrib files)?
 >  > 
 >  > - Disk corruption or hardware errors seem unlikely due to the specific
 >  >   nature of these errors and the fact that the file data itself seems
 >  >   intact
 >  > 
 >  > Of course, I could easily write a routine to "fix" these errors, but I
 >  > just don't understand what is wrong here. I suppose the errors aren't
 >  > particularly dangerous in that the only potential issue they could
 >  > cause would be some missed opportunities for pool de-duplication of
 >  > stored attrib files. But there shouldn't be wrong pool md5sums...
 >  > 
 > 
 > OK. I think I found a way to reproduce this.
 > 
 > The md5sum for the root level attrib (i.e., the attrib file at the
 > level of pc/machine/attrib) is wrong if:
 > 1. There are at least 2 shares
 > 2. The attrib entries for each of the shares have changed since the
 >    last backup (e.g., if the share directory has had its mtime modified)
 > 
 > Try the following on a machine with >=2 shares
 > 1. Touch one of the share directories (to change the mtime)
 > 2. Run a backup
 > 3. Run another backup immediately afterwards (or more specifically
 >    without changing any of the attrib entries for each of the shares)
 > 4. Look at:
 >    diff machine/n/attrib machine/n+1/attrib  
 >              ==> no diffs
 >    ls -i machine/n/attrib machine/n+1/attrib 
 >              ==> different i-nodes
 > 5. The *2nd* attrib is stored in the correct md5sum cpool entry; the
 > first one is not.
 > 

OK. I found the bug. NOTE THIS BUG AFFECTS EVERYBODY WHO IS BACKING UP
MORE THAN ONE SHARE -- i.e., everybody backing up more than one share
has such pool errors, though they should not affect data integrity,
only pooling efficiency.

The problem is that after each share is backed up, the base attrib
file ($TopDir/pc/machine/n/attrib) is written to NewFileList.
The log records this as:
         attribWrite(dir=) -> /var/lib/BackupPC//pc/machine/new//attrib
When you have multiple shares, multiple versions of the base
attrib file therefore appear in NewFileList.
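
For illustration only (the digests and sizes below are invented, and the
other new files from the dump are omitted), a NewFileList for a host with
two shares ends up with one line per write of the base attrib file, each
recording the pool digest of the file as it stood at that moment:

    11112222333344445555666677778888 176 attrib
    88887777666655554444333322221111 352 attrib

The first digest covers an attrib file containing only the first share's
entry; the second covers the final file with both shares -- the one
actually on disk -- but BackupPC_link acts on the first line it sees.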

Then when BackupPC_link is called, it walks NewFileList in order, so it
ends up linking the attrib file using the first instance of its md5sum
in the list. But that is of course the wrong md5sum, since it was
computed when the attrib file contained only the first share's entry.

When the subsequent entries for the base attrib are looped through, the
routine MakeFileLink does nothing, since the following test fails:
        } elsif ( $newFile && -f $name && (stat($name))[3] == 1 ) {
(Field 3 of stat() is the link count: after the first, wrong, link into
the pool the base attrib file already has two hard links, so the
nlink == 1 test fails and the later, correct, entries are silently
skipped.)

This all explains why:
1. The error only occurs when backing up more than one share
2. If there are m shares, then the error only occurs on the first
   (m-1) backups where there have been no changes to the share
   directories. That is because on each subsequent backup one fewer
   intermediate base attrib is added to NewFileList, since each backup
   adds one more intermediate version to the cpool. So finally, after
   m-1 such backups only the actual final base attrib is added to
   NewFileList and can then be linked properly to the cpool.

Here are some thoughts on approaches to fixing the bug:

1. You could 'reverse' the iteration through NewFileList,
   but this may not be easy since currently it is read one line at a
   time, on the fly, as follows (a reversed version is sketched after
   the current code below):

    if ( open(NEW, "<", "$Dir/NewFileList.$Backups[$num]{num}") ) {
        binmode(NEW);
        while ( <NEW> ) {
            chomp;
            next if ( !/(\w+) (\d+) (.*)/ );
            LinkNewFile($1, $2, "$CurrDumpDir/$3");
        }
        close(NEW);
    }
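
   For what it's worth, here is a minimal sketch of approach 1 (it
   assumes the same $Dir, $Backups, $num, $CurrDumpDir and LinkNewFile()
   that BackupPC_link already uses above): slurp the whole list and walk
   it in reverse, so the last digest recorded for a given path -- the one
   matching the file now on disk -- is linked first, and earlier stale
   entries are then skipped by the existing nlink test. The trade-off is
   holding the whole NewFileList in memory instead of streaming it.

    if ( open(NEW, "<", "$Dir/NewFileList.$Backups[$num]{num}") ) {
        binmode(NEW);
        my @newFiles = <NEW>;           # read the whole list up front
        close(NEW);
        foreach my $line ( reverse @newFiles ) {
            chomp $line;
            next if ( $line !~ /(\w+) (\d+) (.*)/ );
            LinkNewFile($1, $2, "$CurrDumpDir/$3");
        }
    }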

2. An alternative approach would be to eliminate the test
   (stat($name))[3] == 1
   noted above, which would allow later links to overwrite earlier
   ones. However, this may have other unforeseen ill effects (the
   one-line change is sketched below).
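
   Concretely (just as a sketch, with the above caveat), the elsif
   quoted earlier would lose its link-count check and become:

        } elsif ( $newFile && -f $name ) {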

3. A third approach would be to only write the base attrib after all
   shares have been written (or just before an abort if not all shares
   are written).

4. Maybe the best approach would be to modify the loop so it skips
   over all instances of the base attrib file until the last one.

   One could use the following code:

    if ( open(NEW, "<", "$Dir/NewFileList.$Backups[$num]{num}") ) {
        binmode(NEW);
+       my @shareattrib;        # deferred args for the base attrib file
        while ( <NEW> ) {
            chomp;
            next if ( !/(\w+) (\d+) (.*)/ );
+           if ( $3 eq "attrib" ) {
+               # base attrib: remember only the last digest recorded for it
+               @shareattrib = ($1, $2, "$CurrDumpDir/$3");
+           } else {
                LinkNewFile($1, $2, "$CurrDumpDir/$3");
+           }
        }
+       # link the base attrib once, using its final (correct) digest
+       LinkNewFile(@shareattrib) if ( @shareattrib );
        close(NEW);
    }