2009-12-29 03:36:12
Subject: [BackupPC-users] BUG in backuppc md5sum calculation for root attrib files (WAS: cpool md5sum errors with certain attrib files)
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 29 Dec 2009 03:33:11 -0500
Jeffrey J. Kosowsky wrote at about 11:50:04 -0500 on Tuesday, December 22, 2009:
 > In my neuroses, I ran a perl script that recursed through the cpool
 > and checked whether the md5sum of each stored file corresponded to its
 > location in the pool (note when I say md5sum I mean the special notion
 > of md5sum defined in BackupPC::Lib.pm)
 > 
 > 1. Out of a total of 855,584 pool entries, I found a total of 35 errors.
 > 
 > 2. Interestingly, all 35 of these errors corresponded to 'attrib' files.
 > 
 > 3. Perhaps even more interestingly, all but two of the attrib files
 >    were at the top level -- i.e., $TopDir/pc/<machine>/<nnn>/attrib
 >    (this represents 33 out of a total of 87 backups)
 > 
 > 4. None of the attrib files appear corrupted when I examine them using
 >    BackupPC_attribPrint
 > 
 > So what could possibly be causing the md5sum to be wrong just on a
 > small subset of my pool files?
 > 
 > Why are these errors exclusively limited to attrib files of which
 > almost all are top-level attrib files (even though they constitute a
 > tiny fraction of total attrib files)?
 > 
 > - Disk corruption or hardware errors seem unlikely due to the specific
 >   nature of these errors and the fact that the file data itself seems
 >   intact
 > 
 > Of course, I could easily write a routine to "fix" these errors, but I
 > just don't understand what is wrong here. I suppose the errors aren't
 > particularly dangerous in that the only potential issue they could
 > cause would be some missed opportunities for pool de-duplication of
 > stored attrib files. But there shouldn't be wrong pool md5sums...
 > 
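
Some background on the pool "md5sum" referred to above. As far as I can tell from the BackupPC 3.x source, Buffer2MD5/File2MD5 in BackupPC::Lib.pm hash two things in order: the uncompressed file size written as a decimal string, then the file data. Files of at most 256 KiB contribute their whole contents. Larger files contribute only the first 128 KiB plus the 128 KiB block ending at min(size, 1 MiB). MD52Path then places the digest under three single-hex-digit subdirectories of cpool (or pool).

The Python below is a minimal sketch of that scheme under those assumptions, not BackupPC code. The names pool_digest and md5_to_pool_path are hypothetical, and the input is assumed to be the already-decompressed file contents.

```python
import hashlib

CHUNK = 131072             # 128 KiB
WHOLE_FILE_LIMIT = 262144  # files up to 256 KiB are hashed in full
ONE_MB = 1048576


def pool_digest(data: bytes) -> str:
    """Partial MD5 of uncompressed contents, modeled on BackupPC 3.x Buffer2MD5 (assumed)."""
    size = len(data)
    md5 = hashlib.md5()
    md5.update(str(size).encode("ascii"))  # the decimal size is hashed first
    if size > WHOLE_FILE_LIMIT:
        # First 128 KiB, then the 128 KiB block ending at min(size, 1 MiB)
        seek = min(size, ONE_MB) - CHUNK
        md5.update(data[:CHUNK])
        md5.update(data[seek:seek + CHUNK])
    else:
        md5.update(data)
    return md5.hexdigest()


def md5_to_pool_path(digest: str, top_dir: str, compress: bool = True) -> str:
    """Pool location of a digest, mirroring BackupPC 3.x MD52Path (assumed layout)."""
    pool = "cpool" if compress else "pool"
    return f"{top_dir}/{pool}/{digest[0]}/{digest[1]}/{digest[2]}/{digest}"
```

Because the size is part of the hash, a checker that only has the compressed cpool file must decompress it completely, or obtain the size elsewhere, before it can verify the digest.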

OK. I think I found a way to reproduce this.

The md5sum for the root-level attrib file (i.e., the attrib file at
pc/<machine>/<n>/attrib) is wrong if:
1. There are at least 2 shares
2. The attrib entries for the shares have changed since the
   last backup (e.g., if a share directory has had its mtime modified)

Try the following on a machine with at least 2 shares:
1. Touch one of the share directories (to change the mtime)
2. Run a backup
3. Run another backup immediately afterwards (or more specifically
   without changing any of the attrib entries for each of the shares)
4. Look at:
   diff machine/n/attrib machine/n+1/attrib  
                ==> no diffs
   ls -i machine/n/attrib machine/n+1/attrib 
                ==> different i-nodes
5. The *2nd* attrib is stored in the correct md5sum cpool entry; the
first one is not.
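
The check in step 4 can also be scripted. Two files that are byte-identical but live on different inodes are a missed de-duplication, and that is the symptom here. The Python below is a minimal sketch of that test; the function name duplicate_not_pooled is hypothetical. It compares the raw (still compressed) bytes, just as the diff above does.

```python
import filecmp
import os


def duplicate_not_pooled(path_a: str, path_b: str) -> bool:
    """True if the two files have identical bytes but are not hard links to one inode."""
    same_bytes = filecmp.cmp(path_a, path_b, shallow=False)  # full content comparison
    sa, sb = os.stat(path_a), os.stat(path_b)
    same_inode = (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
    return same_bytes and not same_inode
```

Usage, with hypothetical paths: duplicate_not_pooled("machine/5/attrib", "machine/6/attrib").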

To explore this, you can use the following Perl script I wrote.
In particular, try something like

BackupPC_zfile2md5 -p -k "machine/*/attrib"

(Note: the script is really just a wrapper around the routine
zFile2MD5, which is part of my jLib.pm module, available on the wiki.)
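
For cpool entries, the -k option compares the computed digest with the digest embedded in the entry's file name. A name is either <md5sum> or, for hash collisions, <md5sum>_<n>. The Python below is a minimal sketch of that comparison. It assumes a 32-hex-digit MD5 name, whereas the script itself accepts any run of hex digits. The function names are hypothetical.

```python
import re
from typing import Optional

# 32 hex digits, optionally followed by a collision suffix such as "_0", at the end of the path
POOL_NAME_RE = re.compile(r"(?:^|/)([0-9a-fA-F]{32})(?:_\d+)?$")


def digest_from_pool_name(path: str) -> Optional[str]:
    """Extract the md5sum encoded in a pool file name, or None if the name does not fit."""
    m = POOL_NAME_RE.search(path)
    return m.group(1).lower() if m else None


def pool_name_matches(path: str, computed_digest: str) -> bool:
    """MATCH/ERROR test: does the stored name agree with the recomputed digest?"""
    return digest_from_pool_name(path) == computed_digest.lower()
```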


----------------------------------------------------------------------------
#!/usr/bin/perl
#============================================================= -*-perl-*-
#
# BackupPC_zfile2md5.pl: calculate and optionally verify the BackupPC-style
#                        partial md5sum of any file compressed by BackupPC
#
# DESCRIPTION

#   This program allows you to calculate the partial md5sum
#   corresponding to the cpool path for any file that uses
#   BackupPC-style compression whether or not that file is actually
#   stored or linked to the cpool. Optionally, if the file is a cpool
#   entry or is linked to the cpool, you can add the '-k' flag to
#   verify whether the corresponding cpool path is consistent with the
#   actual md5sum of the file.
#
#   Multiple files or directories can be given on the command line,
#   allowing you to calculate (and optionally verify) the md5sum for
#   multiple files or multiple trees of files. The script also does
#   path globbing using standard shell globbing conventions.
#
#   Paths are assumed to be either absolute or relative to the current
#   directory unless one of the options -C, -c, or -p is given, in which
#   case the paths are understood to be a cpool file name (without
#   path), a path relative to the cpool, or a path relative to the
#   pc directory, respectively.
#
# AUTHOR
#   Jeff Kosowsky
#
# COPYRIGHT
#   Copyright (C) 2009  Jeff Kosowsky
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
#========================================================================
#
# Version 1.0, released Dec 2009
#
#========================================================================
use strict;
use warnings;
use Getopt::Std;

use lib "/usr/share/BackupPC/lib";
use BackupPC::FileZIO;
use BackupPC::Lib;
use BackupPC::jLib;
use Cwd 'abs_path';
use Digest::MD5;
use File::Find;
use File::Glob qw(bsd_glob);

die("BackupPC::Lib->new failed\n") if ( !(my $bpc = BackupPC::Lib->new("", "", "", 1)) ); #No user check
%Conf   = $bpc->Conf(); #Global variable defined in jLib.pm (do not use 'my')

my %opts;
if ( !getopts("Ccpka", \%opts) || @ARGV < 1
         || (defined($opts{C}) + defined($opts{c}) + defined($opts{p}) > 1)) {
    print STDERR <<EOF;
usage: $0 [options] path1 [path2] [path3]....
  Find BackupPC-style md5sum of compressed file
  Options:
    -C   Entry is a cpool file name (no path)
    -c   Consider path relative to cpool directory
    -p   Consider path relative to pc directory
    -k   Compare to md5sum embedded in file name (for cpool entries)
         or to the inode number of the corresponding pool file (otherwise)
    -a   Use size from attrib file if available (for backup files)
EOF
exit(1);
}

my $useattribsize = $opts{a} ? 0 : -1;
(my $TopDir = $Conf{TopDir}) =~ s#/+$##; #Normalize: no trailing slash
my $compress = $Conf{CompressLevel};
my $pool = $compress > 0 ? "cpool" : "pool";

my $md5 = Digest::MD5->new;
my @zpathlist;
foreach (@ARGV) {
        if($opts{C}) {
                push @zpathlist, bsd_glob($bpc->MD52Path($_, $compress));
        } elsif($opts{c}) {
                push @zpathlist, bsd_glob("$TopDir/$pool/$_");
        } elsif($opts{p}) {
                push @zpathlist, bsd_glob("$TopDir/pc/$_");
        } else {
                #Glob first, then make each match absolute
                push @zpathlist, map { my $p = abs_path($_); defined($p) ? $p : $_ } bsd_glob($_);
        }
}
die "No valid paths...\n" unless @zpathlist;
foreach my $zpath (@zpathlist) {
        unless(-e $zpath) {
                warn "'$zpath' is not an existing file or directory path...\n";
                next;
        }

        $zpath =~ s#/+#/#g; #Remove extra slashes
        $zpath =~ s#/\.(/|$)#/#g; #Remove extra /.

        find(\&check_md5, $zpath);
}

sub check_md5
{
        return unless -f;
        my $filename = $File::Find::name;
        my $digest = zFile2MD5($bpc, $md5, $File::Find::name, $useattribsize);
        return if ($digest eq "-1");

        #Shorten the displayed name according to the path option used
        $filename =~ s#^\Q$TopDir\E/pc/## if $opts{p};
        $filename =~ s#^\Q$TopDir\E/$pool/## if $opts{c};
        $filename =~ s#.*/## if $opts{C};
        print "$digest  $filename";
        if ($opts{k}) {
                if ($opts{c} || $opts{C}) {
                        #Cpool entries are named <md5sum> or <md5sum>_<n>
                        my ($namemd5) = $File::Find::name =~ m#/([[:xdigit:]]+)(?:_\d+)?$#;
                        print((defined($namemd5) && $digest eq $namemd5) ? " MATCH" : " ERROR");
                } else {
                        #A pc file is pooled if it shares its inode with the pool entry
                        my $poolpath = $bpc->MD52Path($digest, $compress);
                        print((-e $poolpath && (stat($poolpath))[1] == (stat($File::Find::name))[1])
                              ? " MATCH" : " ERROR");
                }
        }
        print "\n";
}

------------------------------------------------------------------------------
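
For files under pc/, the script's -k check compares inode numbers with the base pool path returned by MD52Path only. A file that is hard-linked to a collision-chain entry (<md5sum>_0, <md5sum>_1, ...) would therefore seem to be reported as ERROR. The Python below is a minimal sketch of a chain-aware check. It assumes a BackupPC 3.x-style chain numbered consecutively with no gaps, and it compares device as well as inode. The name find_pool_link is hypothetical.

```python
import os
from typing import Optional


def find_pool_link(pc_file: str, pool_base: str) -> Optional[str]:
    """Return the chain entry (pool_base, pool_base_0, pool_base_1, ...) that is a
    hard link to pc_file, or None. Assumes the chain has no gaps."""
    st = os.stat(pc_file)
    candidate, n = pool_base, 0
    while os.path.exists(candidate):
        ps = os.stat(candidate)
        if (ps.st_dev, ps.st_ino) == (st.st_dev, st.st_ino):
            return candidate
        candidate = f"{pool_base}_{n}"  # next entry in the collision chain
        n += 1
    return None
```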
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
