[BackupPC-users] Routine to verify/fix/add cpool rsync checksum digests

2011-02-09 13:05:35
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: General list for user discussion <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 09 Feb 2011 13:03:08 -0500
The following code can be used to verify (recursively) the rsync
checksum digests in the cpool, pc tree or any other path you specify.

With the optional -v flag, you can *verify* all existing digests.
With the optional -f flag, you can additionally *fix* invalid digests.
With the optional -a flag, you can *add* digests to files that do not
yet have them (normally the digest is not added until the *2nd* time
the file is backed up using a full backup).

I like the idea of occasionally running this program to verify, fix
and add md4 checksum digests to my cpool so that each file has a
*valid* md4 block and file checksum attached.

While md4 is not perfect, I imagine that the combination of Adler32
checksums, 2048-byte block MD4 checksums and the full-file MD4 checksum
is very unlikely to collide by accident. So unless someone is
maliciously crafting a weird file, the checksums will tell you with
almost 100% certainty whether your data is corrupt or not.
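
To make the per-block idea concrete, here is a minimal sketch (not part
of the script below) that splits a buffer into 2048-byte blocks and
computes one checksum per block. Assumptions to keep in mind: the
adler32 and block_checksums helpers are made up for this illustration,
the checksum is the textbook Adler-32 from RFC 1950 (rsync's own weak
rolling checksum is, as far as I know, a variant of it, so the values
need not match what BackupPC stores), and the MD4 part is left out
because Digest::MD4 is not a core Perl module.

```perl
use strict;
use warnings;

use constant BLOCKSIZE => 2048;   # same block size the script passes to digestAdd

# Textbook Adler-32 (RFC 1950). Illustrative only: this is NOT claimed to
# be bit-for-bit identical to the weak checksum rsync/BackupPC compute.
sub adler32 {
    my ($data) = @_;
    my ($s1, $s2) = (1, 0);
    for my $byte (unpack 'C*', $data) {
        $s1 = ($s1 + $byte) % 65521;
        $s2 = ($s2 + $s1) % 65521;
    }
    return ($s2 << 16) | $s1;
}

# Split a buffer into fixed-size blocks and return one checksum per block.
# The last block may be shorter than $blocksize.
sub block_checksums {
    my ($data, $blocksize) = @_;
    my @sums;
    for (my $off = 0; $off < length $data; $off += $blocksize) {
        push @sums, adler32(substr($data, $off, $blocksize));
    }
    return @sums;
}

my $data = 'x' x 5000;            # 5000 bytes -> blocks of 2048, 2048 and 904 bytes
my @sums = block_checksums($data, BLOCKSIZE);
printf "%d blocks\n", scalar @sums;
printf "block %d: %08x\n", $_, $sums[$_] for 0 .. $#sums;
```

A single flipped byte changes the checksum of the block that contains
it, which is what lets a verifier notice corruption block by block.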

One could of course consider other checksums like md5 or sha256 etc.,
but since these rsync checksums come almost for free and are already
part of BackupPC, it seems like they are an under-utilized data
integrity check.

Here is the code:
---------------------------------

#!/usr/bin/perl
#========================================================================
#
# BackupPC_digestVerify.pl
#                       
#
# DESCRIPTION
#
#   Check contents of cpool and/or pc tree entries (or the entire
#   tree) against the stored rsync block and file checksum digests,
#   including the 2048-byte block checksums (Adler32 + md4) and the
#   full file md4sum.
#
#   Optionally *fix* invalid digests (using the -f flag).
#   Optionally *add* digests to compressed files that don't have a digest.
#
#
# AUTHOR
#   Jeff Kosowsky
#
# COPYRIGHT
#   Copyright (C) 2010, 2011  Jeff Kosowsky
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
#========================================================================
#
# Version 0.2, released February 2011
#
#========================================================================

use strict;
use warnings;
use Getopt::Std;

use lib "/usr/share/BackupPC/lib";
use BackupPC::Xfer::RsyncDigest;
use BackupPC::Lib;
use File::Find;

use constant RSYNC_CSUMSEED_CACHE     => 32761;
use constant DEFAULT_BLOCKSIZE     => 2048;

my $dotfreq=1000;
my %opts;
# Parse options; -v/-f, -c/-C/-p and -V/-Q are mutually exclusive,
# and exactly one file or directory argument is required.
if ( !getopts("cCpavft:dVQ", \%opts) || @ARGV != 1
         || (defined($opts{v}) + defined($opts{f}) > 1)
         || (defined($opts{c}) + defined($opts{C}) + defined($opts{p}) > 1)
         || (defined($opts{Q}) + defined($opts{V}) > 1)) {
    print STDERR <<EOF;
usage: $0 [-c|-C|-p] [-v|-f] [-a] [-V|-Q] [-d] [-t TopDir] File-or-Directory
  Verify Rsync digest in compressed files containing digests.
  Ignores directories and, unless -a is given, files without digests
  (first byte not 0xd7)
  Only prints if digest inconsistent with file content unless verbose flag
  Note: zero length files are skipped and not counted

  Options:
    -c   Consider path relative to cpool directory
    -C   Entry is a single cpool file name (no path)
    -p   Consider path relative to pc directory
    -v   Verify rsync digests
    -f   Verify & fix rsync digests if invalid/wrong
    -a   Add rsync digests if missing
    -t   TopDir to use instead of the one from the BackupPC configuration
    -d   Print a '.' to STDERR for every $dotfreq files examined
    -V   Verbose - print result of each check
         (default just prints result on errors/fixes/adds)
    -Q   Quiet - don't print results even with errors/fixes/adds


In non-quiet mode, the output consists of 3 columns.
  1. inode number
  2. return code:
       0 = digest added
       1 = digest ok
       2 = digest invalid
       3 = no digest
       <0 other error (see source)
  3. file name

EOF
exit(1);
}

#NOTE: BackupPC::Xfer::RsyncDigest->digestAdd opens files O_RDWR so
#we should run as user backuppc!
die("BackupPC::Lib->new failed\n") if ( !(my $bpc = BackupPC::Lib->new) );
#die("BackupPC::Lib->new failed\n")
#    if ( !(my $bpc = BackupPC::Lib->new("", "", "", 1)) ); #No user check

my $Topdir = $opts{t} ? $opts{t} : $bpc->TopDir();
$Topdir = $Topdir . '/';
$Topdir =~ s|//*|/|g;



my $root = '';
my $path;
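if ($opts{C}) {
        # Map a bare cpool file name to its full path in the compressed pool
        # ($Topdir already ends in '/')
        $path = $bpc->MD52Path($ARGV[0], 1, "${Topdir}cpool");
        $path =~ m|(.*/)|;
        $root = $1;
}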
else {
        $root = $Topdir . "pc/" if $opts{p};
        $root = $Topdir . "cpool/" if $opts{c};
        $root =~ s|//*|/|g;
        $path = $root . $ARGV[0];
}

my $add = $opts{a};
my $verify = $opts{v};
my $fix = $opts{f};
my $verbose = $opts{V};
my $quiet = $opts{Q};
my $progress= $opts{d};

die "$0: Cannot read $path\n" unless (-r $path);

BackupPC::Xfer::RsyncDigest->logHandlerSet(sub {}); #Don't want internal log

my ($totfiles, $totdigfiles, $totnodigfiles) = (0, 0, 0);
my ($totbadfiles, $totfixedfiles, $totaddedfiles) = (0, 0, 0);
find(\&verify_digest, $path); 

print STDERR "\n" if $progress;
$totaddedfiles = "NA" unless $add;
$totbadfiles = "NA" unless $verify || $fix;
$totfixedfiles = "NA" unless $fix; 
printf STDERR "Totfiles:   %s\tTotNOdigests:  %s\tTotADDEDdigests: %s\n",
        $totfiles, $totnodigfiles, $totaddedfiles;
printf STDERR "Totdigests: %s\tTotBADdigests: %s\tTotFIXEDdigests: %s\n",
        $totdigfiles, $totbadfiles, $totfixedfiles;
exit;

#########################################################################################################################
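sub verify_digest {
        return -200 unless -f $_;   #Not a regular file
        return -201 unless -s _;    #Zero-length file ('_' reuses the stat from -f)
        my @fstat = stat(_);        #Saved stat data for the current file
        $totfiles++;

        if ($progress && !($totfiles%$dotfreq)) {
                print STDERR ".";
                $| = 1; # autoflush the selected output handle
        }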

        my $action;
        #Check whether checksum is cached (i.e. first byte is 0xd7)
        if(BackupPC::Xfer::RsyncDigest->fileDigestIsCached($_)) {
                $totdigfiles++; #Digest exists
                if($fix) { #Verify & fix
                        $action = 1;
                } elsif($verify) { #Verify only
                        $action = 2;
                } else {
                        return 4; #Don't verify or fix
                }
        } else { #Missing digest
                $totnodigfiles++;
                if($add) {
                        $action = 0; #Add missing digest
                } else { #Skip over missing digest
                        #Strip the (literal, quoted) root prefix from the name
                        (my $rel = $File::Find::name) =~ s|^\Q$root\E||;
                        printf("%d %d %s\n", $fstat[1], 3, $rel) if $verbose;
                        return -202;
                }
        }


        #Note: setting blocksize=0 also results in the default blocksize of
        #2048, but it generates an error message.
        #The final protocol_version input is left out; leaving it undefined
        #makes digestAdd determine it automatically.
        my $ret = BackupPC::Xfer::RsyncDigest->digestAdd($_, DEFAULT_BLOCKSIZE,
                                                         RSYNC_CSUMSEED_CACHE,
                                                         $action);
        $totbadfiles++ unless $ret == 1 || $ret == 0;
        $totfixedfiles++ if $ret == 2 && $action == 1;
        $totaddedfiles++ if $ret == 0 && $action == 0;

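        if ($verbose || ($ret!=1 && !$quiet)) {
                #Use the inode saved in @fstat: the '_' stat buffer may have
                #been reused by the calls above
                (my $rel = $File::Find::name) =~ s|^\Q$root\E||;
                printf "%d %d %s\n", $fstat[1], $ret, $rel;
        }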
        return $ret;
}

# Return codes:
# -100: Wrong RSYNC_CSUMSEED_CACHE or zero file size
# -101: Bad/missing RsyncLib
# -102: ZIO can't open file
# -103: sysopen can't open file
# -104: sysread can't read file
# -105: Bad first byte (not 0x78, 0xd6 or 0xd7)
# -106: Can't seek to end of data portion of file (i.e. beginning of digests)
# -107: First byte not 0xd7
# -108: Error on reading digest
# -109: Can't seek when trying to position to rewrite digest data
#       (shouldn't happen if only verifying)
# -110: Can't write digest data (shouldn't happen if only verifying)
# -111: Can't seek looking for extraneous data after digest
#       (shouldn't happen if only verifying)
# -112: Can't truncate extraneous data after digest
#       (shouldn't happen if only verifying)
# -113: Can't sysseek back to file beginning
#       (shouldn't happen if only verifying)
# -114: Can't write out first byte (0xd7)
#       (shouldn't happen if only verifying)
# 1: Digest verified
# 2: Digest wrong

#-200: Not a file
#-201: Zero length file
#-202: No cached checksum
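
The three-column output described in the usage text (inode, return
code, file name) is easy to post-process. The following is a minimal
sketch of a tally by return code, separate from the script above. The
summarize helper and the sample lines are made up for illustration; in
practice you would feed it the captured STDOUT of the script.

```perl
use strict;
use warnings;

# Labels for the non-negative return codes listed in the usage text.
my %label = (0 => 'added', 1 => 'ok', 2 => 'invalid', 3 => 'no digest');

# Tally lines of the form "<inode> <code> <file name>".
# Returns a hash mapping a label to the number of lines seen.
sub summarize {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        # File names may contain spaces, so only the first two fields
        # are split off; lines that do not match are skipped.
        my ($inode, $code, $name) = $line =~ /^(\d+) (-?\d+) (.*)$/ or next;
        my $what = $code < 0 ? "error $code" : ($label{$code} // "code $code");
        $count{$what}++;
    }
    return %count;
}

# Made-up sample lines, for illustration only.
my @sample = (
    "123456 1 a/b/c/abc0123",
    "123457 2 a/b/d/abd4567",
    "123458 -104 a/b/e/abe89ab",
);
my %count = summarize(@sample);
printf "%-12s %d\n", $_, $count{$_} for sort keys %count;
```

Negative codes are grouped under their number so they can be looked up
in the return code list above.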
