Subject: [BackupPC-users] Script for checking & fixing missing/duplicated/broken links in cpool/pool and pc backups
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: General list for user discussion <backuppc-users AT lists.sourceforge DOT net>
Date: Sun, 09 Nov 2008 18:43:24 -0500
As a result of my saga with NFS problems causing broken links, I wrote
the following script for checking and fixing links.

(NOTE: the problem turned out to be an interaction between NFS and
ext3 on the Linux 2.6.19 kernel that my NAS runs. It seems to be a
problem with how directories are cached. Everything works fine with
ext2. I only wish I knew how to get ext3 fixed for the old kernel
version that I am stuck with.)

Anyway, the script (which has several options) basically does the
following:
1. Crawls through pool and cpool looking for duplicated entries (both
   at the compressed and uncompressed level). It also tags holes in any
   of the identical md5sum chains.

2. Then crawls through the pc backups (or any subset thereof) looking
   for:
   - Links to the duplicate pool entries (so they can be moved)
   - Files with only 1 link (i.e. no link to pool) -- note some
         only need to be linked to an existing pool entry, others
         require creating a new pool entry

3. "Fix" the above problems by unlinking/creating the appropriate
   links. Note all changes are made to the pc directory (for
   safety). BackupPC_nightly is then run to clean up the pool.

Most of the script involves multiple error checks and logging
since I was paranoid about messing things up. The logging should allow
you to undo most errors if something goes wrong.

I haven't tested it extensively, except in my situation where I had
25,000 duplicate links and about 70,000 total backup files needing
fixing -- and it works fine. I also tested it on as many small test
cases as I could imagine, but I'm sure I could have left something
out.

It did, though, force me to learn a lot about pool md5sums,
md4 checksums, attrib file structure, ZIO, etc. -- so I at least
learned something from it -- and my system is now clean again.

There are also some (hopefully helpful) simple subroutines embedded:

zFile2MD5 -- analog of File2MD5 but operates on compressed files to
             determine (or check) their md5sum pool name.

zFile2FullMD5 -- very simple routine to calculate the md5sum of the
                 full file (this is the standard md5sum, not the one
                 used in pool names -- for that see above).

zcompare -- compares the inflated data of two compressed files.
            Basically like the standard compare routine, but reads
            in using ZIO.

jcompare -- rewrite of File::Compare, which barfs on weird filenames
            because it uses the "old" two-argument form of open. Plus
            some clean-up tweaks.

---------------------------------------------------------------
#!/usr/bin/perl
#BackupPC_fixLinks.pl - Jeffrey Kosowsky - 110908 - version 1.0

use strict;
use File::Path;
use File::Find;
#use File::Compare;
use Getopt::Std;
use Fcntl;  #Required for RW I/O masks
use Digest::MD5;  #Used directly below for pool md5sums

use lib "/usr/share/BackupPC/lib";
use BackupPC::FileZIO;
use BackupPC::Lib;
use BackupPC::Attrib qw(:all);

die("BackupPC::Lib->new failed\n") if ( !(my $bpc = BackupPC::Lib->new) );

use constant LINUX_BLOCK_SIZE     => 4096;
use constant TOO_BIG              => 2097152; # 1024*1024*2 (2MB)

no  utf8;
my %Conf   = $bpc->Conf();

my %opts;
if ( !getopts("i:l:fb:dsqvch", \%opts) || @ARGV > 0 || $opts{h} ||
         ($opts{i} && $opts{l})) {
    print STDERR <<EOF;
usage: $0 [options]

  First, find duplicate entries in the pool.
  Then, search through backup tree to find links to dups. Also, look for
  (non-zero) files that are not linked to the pool (only 1 link).
  Optionally, relink dups and unlinked files (does not affect the pool)
  Optionally, run BackupPC_nightly to clean up the pool.

  Note: you may want to run BackupPC_nightly before running this to make
  sure there are no holes in the pool (although this shouldn't happen...)

  Options:
    -i <inode file>  Read inodes from file and proceed
    -l <link file>   Read links from file and proceed
    -f               Fix links
    -c               Clean up pool - schedule BackupPC_nightly to run 
                     (requires server running)
    -s               Skip first step of generating/reading cpool dups
    -b <path>        Search backups from <path> (relative to TopDir/pc)
    -d               Dry-run
    -q               Quiet - only print summaries & results
    -v               Verbose - print details on each relink
    -h               Print this usage message
EOF
exit(1);
}
my $file = ($opts{i} ? $opts{i} : $opts{l});
my $verbose =!$opts{q};
my $Verbose=$opts{v};
my $dryrun = $opts{d};
my $fixlinks = $opts{f};
my $runnightly = $opts{c};
#$dryrun =1; #JJK - for testing force to always dry run
#######

my $md5 = Digest::MD5->new;


my $MaxLinks = $Conf{HardLinkMax};
my $cmprsslvl =$Conf{CompressLevel};
#Note we get rid of any extra lurking double slashes and any trailing slash for directories
(my $TopDir = $bpc->TopDir()) =~ s|//*|/|g; $TopDir =~ s|/$||;
(my $pooldir = $bpc->{PoolDir}) =~ s|//*|/|g; $pooldir =~ s|/$||;  
(my $cpooldir = $bpc->{CPoolDir}) =~ s|//*|/|g; $cpooldir =~ s|/$||;  

my $pc = "${TopDir}/pc";
my $attrib;
my @backups;
if ($opts{b}) {
        (my $backups = "$pc/$opts{b}") =~ s|//*|/|g; $backups =~ s|/$||;
        die "ERROR: '$backups' directory doesn't exist\n" unless -d $backups;
        @backups = ($backups =~ m|^($pc/[^/]+)/?$| ? <$1/[0-9]*> : ($backups));
        # If path stops at host, then glob for all backup numbers.
}
else { # Look at all backups - begin 2 levels down i.e. in: TopDir/pc/<host>/<nn>
        @backups = <$pc/*/[0-9]*>;
}

my %md5sumhash;  #Hash used to store previously seen full file md5sums for NewFiles
my (%inodHOA);
# First find and create hash of arrays of duplicated pool entries:
#  %inodHOA = (
#          <duplicated inode> => [ <name of equivalent parent>, <name of duplicate>, <pool/cpool>, <checksum>, <num links>, <size>],
#          ...
#          <duplicated inode> => [ <name of equivalent parent>, <name of duplicate>, <pool/cpool>, <checksum>, <num links>, <size>],
#       );
# where checksum = [=-#x@]<first byte of dup><first byte of parent>
#   = if files match
#   - if only decompressed versions match
#   # if only decompressed versions match (and flipped)
#   x if newlink/badlink
#   @ if same inode

my @MatchA;
# @MatchA = (<matchname>, <inoM>, <md5sum>, <dupmd5|matchtype>, pool, <cmprflg><matchbyte><md5sumbyte>, <nlink1M>, <sizeM> )
# where:
#
#  matchname = File name and partial path (beginning after 'pc') to
#              the match in the pc tree. Note when we print it to a
#              file we enclose it in double-quotes "<matchname>"
#
#  inoM      = Inode of the match
#
#  md5sum    = Name of pool entry that has the same (uncompressed)
#              contents as matchname. The name equals the md5sum of
#              the (uncompressed) file plus potentially an _NNN suffix
#              if the data matches something other than the stem
#              md5sum in the pool (or equals all zeros if sum is not
#              calculable for some reason - shouldn't happen).
#              This is the target that we want to link matchname to
#
#  dupmd5    = Name of duplicate pool entry (which is again the md5sum
#              of the contents plus potentially an _NNN suffix). We
#              don't actually need to modify this file. We just unlink
#              all the backup files that share its inode and then let
#              BackupPC nightly delete it when it has no more other
#              links.
#
# matchtype = One of the following
#                NewLink = if match has only one inode but matches 
#                          an existing pool element
#                NewFile = if match has only one inode but doesn't match 
#                          an existing pool element
#                MD5Err  = if for some reason couldn't calculate MD5sum 
#                          (this shouldn't happen)
#
#  pool     =    pool/cpool
#
#  cmparflg = Flag showing how the match and the target compare
#                 @ if this is a duplicate pool element with the SAME inode 
#                   as its parent (i.e. as 'md5sum') -- shouldn't happen
#                 = if 'matchname' has the same contents as 'md5sum'.
#                 - if 'matchname' inflates (i.e. uncompresses) to the same
#                   contents as 'md5sum' (this typically happens
#                   when 'md5sum' has a checksum seed and 'matchname' doesn't
#                 # if 'dupmd5' inflates (i.e. uncompresses) to the same
#                   contents as 'md5sum' but this time 'dupmd5' has the
#                   checksum seed (and the parent which now has a lower
#                   suffix doesn't. For pool dups, this is the reverse case
#                   of '-'. Not applicable for NewLinks and NewFiles.
#                 x MD5Err or if first NewFile that has this contents
#                   (and corresponding md5sum)
#                 y if NewFile but a previous NewFile already has this contents
#                   (and corresponding md5sum)
#
# matchbyte  = First byte of the matched file (or dup pool element)
# md5sumbyte = First byte of the corresponding (parent) pool entry that we
#              will be linking to
#                = d6 or d7 if file is compressed and checksum seed present
#                = 78 if file is compressed and checksum seed NOT present
#   nlink1M    = Number of links to the match MINUS 1
#   sizeM      = Size of the match in bytes
#
#   Note for matches corresponding to duplicate pool elements, by design:
#   MatchA = (<matchname>, $inoM, @{$inodHOA{$inoM}})

my ($totdups, $collisions, $totlinks, $totsize) = (0, 0, 0, 0);
my ($totmatches, $totmd5errs, $totunlinked, $totnewfiles, $totnewlinks, 
$totfixed, $totbroken)
        = (0, 0, 0, 0, 0, 0, 0);

# Find or read-in list of duplicate pool entries
if (!$opts{s}) {  # Read in or find duplicate pool entries
        if ($opts{i} || $opts{l}) { #Read in previously generated list of inodes (note link entries will be ignored if they exist)
                read_inodHOA($file);
                print_inodHOA() if $verbose;
        }
        elsif (!$opts{s}){ # Find inodes
                find(\&pool_dups, $pooldir, $cpooldir); 
        }
        print "Found $totdups dups (and $collisions true collisions) with $totlinks total links and $totsize size\n";
}

# Find backup files with broken/missing links or with links to duplicate pool entries
if ($opts{l}) { # Read in previously generated list of links && start fixing links if -f flag set
        read_LinkFile($file);
        $totunlinked = $totnewlinks + $totnewfiles;
        print "Found $totmatches matching files and $totunlinked unlinked files ($totnewfiles NewFiles, $totnewlinks NewLinks, $totmd5errs MD5Errors)\n";
}
else {
        while (@backups) {
                my $backup = shift(@backups);
                $backup =~ m|^($pc/[^/]+/[0-9]+)|;
                $cmprsslvl = get_compressLevel($1); #Note this is set at the level of the backup number
                $attrib = BackupPC::Attrib->new({ compress => $cmprsslvl });
                print "Finding links in $backup\n";
                find(\&find_BadOrMissingLinks, $backup);
        }
        $totunlinked = $totnewlinks + $totnewfiles;
        print "Found $totmatches matching files and $totunlinked unlinked files ($totnewfiles NewFiles, $totnewlinks NewLinks, $totmd5errs MD5Errors)\n";
}
print "Fixed $totfixed out of $totbroken links\n" if $fixlinks;
run_nightly() if (!$dryrun && $runnightly);
print "DONE\n";
exit;

#####################################################################################################
sub pool_dups {
        my ($devD, $inoD, $modeD, $nlinkD, $uidD, $gidD, $rdevD, $sizeD, 
$therestD);
        my ($devP, $inoP, $modeP, $nlinkP, $uidP, $gidP, $rdevP, $sizeP, 
$therestP);
        my $comparflg;
        unless (-r) {  # First check for read error on found element
                warn "ERROR can't read : $File::Find::name\n";
                return;
        }
        # Then get root/suffix and check if it is a potential duplicate
        return unless -f && m|(.*)_(.*)|; # file doesn't end with _<num>
        my $root=$1;
        my $suffix=$2;
        my $dup=$_;
        $File::Find::dir =~ m|(c?pool)/[/[:xdigit:]]+$|;
        my $thepool = $1;

        # Then get file information
        unless (($devD, $inoD, $modeD, $nlinkD, $uidD, $gidD, $rdevD, $sizeD, 
$therestD) 
                        = stat($dup)) {
                warn "ERROR can't stat: $File::Find::name\n";
                return;
        }
        my $prevsuffix = ($suffix == 0 ? '' : '_' . ($suffix -1));
        warn "ERROR: Hole in pool chain at $root$prevsuffix" unless -f "$root$prevsuffix";

        # Then check to see if any of its "parents" are duplicates

        my $parent = $root;
        for (my $i=-1; $i <  $suffix; $i++, $parent="$root\_$i" ) { 
        #Start at base of chain and move up (note start with -1 for root)
                unless( -f $parent ) {
                        warn "ERROR parent not a file or unreadable: $File::Find::dir/$parent\n";
                        next;
                }
                ($devP, $inoP, $modeP, $nlinkP, $uidP, $gidP, $rdevP, $sizeP, 
$therestP) = stat($parent);
                if ($inoP == $inoD) { #same inodes
                        $comparflg='@';
                }
                elsif (($nlinkP + $nlinkD) >= $MaxLinks) {
                        next; # Too many links even if files the same
                }
                elsif ( ($comparflg = compare_files($parent, $dup, ($thepool eq "cpool" ? 1 : 0))) > 0 ) { #Found match
                        $comparflg = ($comparflg == 1 ? '=' : '-');
                }
                else { next; } # Parent is not a copy
                my $fbyteD = firstbyte("$File::Find::dir/$dup");
                my $fbyteP = firstbyte("$File::Find::dir/$parent");
                if(($fbyteD eq 'd6' || $fbyteD eq 'd7') && 
                   !($fbyteP eq 'd6' || $fbyteP eq 'd7'))
                  #NOTE: compressed file without checksums starts with 0x78
                  #      compressed file with checksums starts with 0xd6 or 0xd7
                {  #swap $dup & $parent if only $dup has rsync seed
                        my $temp = $dup; $dup = $parent; $parent = $temp;
                        $temp = $fbyteD; $fbyteD = $fbyteP;     $fbyteP = $temp;
                        $nlinkD = $nlinkP; $sizeD = $sizeP;
                        $comparflg='#';
                }
                $inodHOA{$inoD} = [$parent, $dup, $thepool, $comparflg.$fbyteD.$fbyteP, --$nlinkD, $sizeD];
                print "$inoD @{ $inodHOA{$inoD} }\n" if $verbose;
#               print "$inoD $parent $dup $thepool $comparflg, $nlinkD $sizeD\n";
                $totdups++;
                $totlinks += $nlinkD;
                $totsize += $sizeD;
                return;  #Earliest duplicate checksum (i.e. parent) in the chain found so stop going down chain
        }
        # No matching copies found in the chain
        print "$inoD $dup COLLISION $thepool X $nlinkD $sizeD\n" if $verbose;
        $collisions++;
}

sub firstbyte {
        my $fbyte='';
        sysopen(my $fh, $_[0], O_RDONLY) || return -1;
        if (sysread($fh, $fbyte, 1) != 1) {
                $fbyte = -2;
        }
        close($fh);
        return (unpack('H*',$fbyte));  # Unpack as 2 char hexadecimal string
}

sub print_inodHOA {
        for my $inode (keys %inodHOA) {
                print "$inode @{ $inodHOA{$inode} }\n";
#               print "$inodHOA{$inode}[0] $inodHOA{$inode}[1] etc...\n";
        }
}

sub read_inodHOA {
        my $file=$_[0];
        $totdups = $collisions = $totlinks = $totsize = 0;
        open(IN,$file) || die "Can't open $file for reading";
        while(<IN>) {
                next unless m|^(\d+)\s+([[:xdigit:]]+(_\d+)?)\s+([[:xdigit:]]+(_\d+)?)\s+(c?pool)\s+([-=#@][[:xdigit:]]+)\s+(\d+)\s+(\d+)|;
                $inodHOA{$1} = [$2, $4, $6, $7, $8, $9];
                $totdups++;
                $totsize += $9;
                $totlinks += $8;
        }
}
                
sub find_BadOrMissingLinks {
        my $fixed ='';
        unless (-r) {  # First check for read error on found element
                warn "ERROR can't read : $File::Find::name\n";
                return;
        }
        ( -f) || return; #Not a file
        /^f/ || return; # Skip files without 'f' mangle
        my $matchtype= BadOrMissingLinks ($File::Find::name);
        return if $matchtype < 0;
        if($fixlinks && $matchtype > 0) {
                $totbroken++;
                if(fix_links($matchtype) > 0) { #Go fix link...
                        $totfixed++;
                        $fixed=" FIXED";
                }
                else {$fixed=" BROKEN";}
        }
        if ($verbose) {
                my $name = shift(@MatchA);
                print "\"" . $name . "\" " . join(" ", @MatchA) . "$fixed\n";
        }
}

# Return -1 if no match
# Return 0 if MD5Err - shouldn't happen
# Return 1 if links to pool dup in %inodHoA
# Return 2 if no links to pool but matching pool entry found (NewLink)
# Return 3 if no links to pool and no matching pool entry found (NewFile)
sub BadOrMissingLinks {
        my $matchpath = $_[0];
        (my $matchname = $matchpath) =~ s|^$pc/*||; # Delete leading path directories (up to machine)

        my $rettype;
        my $matchtype;
        my ($devM, $inoM, $modeM, $nlinkM, $uidM, $gidM, $rdevM, $sizeM, 
$therestM);

        unless (($devM, $inoM, $modeM, $nlinkM, $uidM, $gidM, $rdevM, $sizeM, 
$therestM)
                        = stat($_)) {
                warn "ERROR can't stat: $matchpath\n";
                return -1;
        }
        if ($nlinkM == 1 && $sizeM > 0) { # Non-zero file with no link to pool
                my $matchbyte = firstbyte($matchpath);
                my $comparflg = 'x';  # Default if no link to pool
                my $matchtype = "NewFile"; # Default if no link to pool
                my $md5sumbyte = '00'; # Default if no link to pool
                my $md5sum = zFile2MD5($bpc, $md5, $matchpath, 0, $cmprsslvl);
                if ($md5sum == -1) { #Can't create MD5sum
                        $md5sum = "00000000000000000000000000000000";
                        $matchtype = "MD5Error";
                        $totmd5errs++;
                        $rettype=0;
                        goto match_return;
                }
                my $thepool = ($cmprsslvl > 0 ? "cpool" : "pool");
                my $thepooldir = ($cmprsslvl > 0 ? $cpooldir : $pooldir);
                my $md5sumpath = my $md5sumpathbase = $bpc->MD52Path($md5sum, 
0, $thepooldir);
                my $i;
                for ($i=-1; -f $md5sumpath ; $md5sumpath = $md5sumpathbase . 
'_' . ++$i) {
            #Again start at the root, try to find best match in pool...
                        if ((my $cmpresult  = compare_files ($matchpath, 
$md5sumpath, $cmprsslvl)) > 0) { #match found

                                my $inod = (stat($md5sumpath))[1]; # inode of existing pool entry
                                if (exists $inodHOA{$inod}) { #Oops target set to be relinked
                                        $md5sum = $inodHOA{$inod}[0]; # Set to parent
                                        $md5sumpath =$bpc->MD52Path($md5sum, 0, 
$thepooldir);
                                        $cmpresult = compare_files($matchpath, $md5sumpath, $cmprsslvl);
                                        warn "Note: NewLink is also a duplicate pool entry - relinking & fixed\n";
                                }
                                else {
                                        ($md5sum .= '_' . $i) if $i >= 0;

                                }
                                $comparflg = ($cmpresult == 1 ? '=' : '-');
                                $md5sumbyte = firstbyte($md5sumpath);
                                $matchtype = "NewLink";
                                $totnewlinks++;
                                $rettype=2;
                                goto match_return;
                        } #Otherwise, continue to move up the chain looking for a pool match...
                }
                $totnewfiles++; #Otherwise must be a NewFile
                my $fullmd5sum = zFile2FullMD5($bpc, $md5, $matchpath, $cmprsslvl);
                ($md5sum .= '_' . $i) if $i >= 0;  # Name of first empty pool slot
                if ($md5sumhash{$fullmd5sum}) {   #Already seen before!
                        $comparflg = 'y';
                        $md5sum = $md5sumhash{$fullmd5sum};
                        $rettype=2;
                }
                else {
                        $md5sumhash{$fullmd5sum} = $md5sum;
                        $rettype=3;
                }

          match_return:
                @MatchA = ($matchname, $inoM, $md5sum, $matchtype, $thepool, ${comparflg}.${matchbyte}.${md5sumbyte}, $nlinkM, $sizeM);
#               print "\"$matchname\" $inoM $md5sum $matchtype $thepooldir ${comparflg}${matchbyte}${md5sumbyte} $nlinkM $sizeM\n";
                return $rettype;
        }
        elsif (exists $inodHOA{$inoM}) { #File links to dup element in our list
                @MatchA = ($matchname, $inoM, @{$inodHOA{$inoM}});
#               print "\"$matchname\" $inoM @{ $inodHOA{$inoM} }\n";
                $totmatches++;
                return 1;  #type=1
        }
        else { return -1;} #No dup or single-linked file
}

# Compute the MD5 digest of a compressed file.  For efficiency we don't
# use the whole file for big files:
#   - for files <= 256K we use the file size and the whole file.
#   - for files <= 1M we use the file size, the first 128K and
#     the last 128K.
#   - for files > 1M, we use the file size, the first 128K and
#     the 8th 128K (ie: the 128K up to 1MB).
# See the documentation for a discussion of the tradeoffs in
# how much data we use and how many collisions we get.
#
# Returns the MD5 digest (a hex string).
#
# If $filesize < 0 then always recalculate size of file by fully uncompressing
# If $filesize = 0 then first try to read corresponding attrib file
#    (if it exists); if that doesn't work then recalculate
# If $filesize > 0 then use that as the size of the file

sub zFile2MD5
{
    my($bpc, $md5, $name, $filesize, $compresslevel) = @_;

        my $fh;
        my $rsize;
        my $totsize;
        unless (defined ($fh = BackupPC::FileZIO->open($name, 0, 
$compresslevel))) {
                warn "ERROR can't open $name\n";
                return -1;
        }
        
        my $datafirst = my $datalast = '';
        my @data = ('','');
        #First try to read up to the first 128K (131072 bytes)
        if ( ($totsize = $fh->read(\$datafirst, 131072)) < 0 ) {
                warn "ERROR can't read & decompress $name\n";
                return -1;
        }
        elsif ($totsize == 131072) { # Read up to 1st MB
                my $i=0;
                #Read in up to 1MB (1048576), 128K at a time and alternate between 2 data buffers
                while ( ($rsize = $fh->read(\($data[(++$i)%2]), 131072)) == 
131072
                                &&  ($totsize += $rsize) < 1048576) {}
                $totsize +=$rsize if $rsize < 131072; # Add back in partial read
            $datalast = substr($data[($i-1)%2], $rsize, 131072-$rsize)
                        . substr($data[$i%2], 0 ,$rsize);
    }
    $filesize = $totsize if $totsize < 1048576; #Already know the size because read it all
    if ($filesize == 0) { # Try to find size from attrib file
            if ((my $fileinfo = get_file_attribs($name)) == -1) {
                        warn "Can't read size of $name from attrib file so calculating manually\n";
                        $filesize = -1; #Can't read from attrib file so do it manually
                }
                else {
                        $filesize = $fileinfo->{size};  #File size read from attrib file
                }
    }
    if ($filesize < 0) { #continue reading to calculate size
                while (($rsize = $fh->read(\($data[0]), 131072)) > 0) {
                        $totsize += $rsize;
                }
                $filesize = $totsize;
    }
    $fh->close();

        $md5->reset();
    $md5->add($filesize);
    $md5->add($datafirst);
    ($datalast eq '') || $md5->add($datalast);
    return $md5->hexdigest;
}

#Compute md5sum of entire (compressed) file
sub zFile2FullMD5
{
    my($bpc, $md5, $name, $compresslevel) = @_;

        my $fh;
        my $data;

        unless (defined ($fh = BackupPC::FileZIO->open($name, 0, 
$compresslevel))) {
                warn "ERROR can't open $name\n";
                return -1;
        }

        $md5->reset();  
        while ($fh->read(\$data, 65536) > 0) {
                $md5->add($data);
        }

    return $md5->hexdigest;
}

#Input is a full path name to the file
sub get_file_attribs
{
        my ($fullfilename) = @_;
        $fullfilename  =~ m|(.*)/f(.*)|;
        my $dir = $1;
        my $file = $2;

        if (-r "$dir/attrib" && $attrib->read($dir, "attrib")
                && (my $info = $attrib->get($file))) {
#Note: $info is a hash of the attributes of file, consisting of the keys:
# uid, mtime, mode, size, sizeDiv4GB, type, gid, sizeMod4GB)
#$Data::Dumper::Indent = 1;
#print Dumper($info);
                return $info;
        }
        else {
                return -1;
        }
}

# Return  compression level of backup
sub get_compressLevel {
        my ($bakdir) = @_;
        my $bakinfo = "$bakdir/backupInfo";
        our %backupInfo = ();

        unless (-f $bakinfo ){
                warn "Can't read $bakinfo, using default compress level: $Conf{CompressLevel}\n";
                return $Conf{CompressLevel};   # Just use default compress level from config file...
        }
        unless (my $ret = do $bakinfo) { # Load  the backupInfo file
                warn "couldn't parse $bakinfo: $@" if $@;
                warn "couldn't do $bakinfo: $!" unless defined $ret;
                warn "couldn't run $bakinfo" unless $ret;
        }
        if ( !keys(%backupInfo) || !defined($backupInfo{compress}) ) {
                warn "Can't read CompressLevel, using default instead: $Conf{CompressLevel}\n";
                return $Conf{CompressLevel};
        }
        return $backupInfo{compress};
}

#Read in link file for matching pool md5sums(dups), NewFiles, NewLinks; don't read in MD5Err entries or other errors
sub read_LinkFile {
        my $file=$_[0];
        my $matchtype;
        my $fixed='';
        open(IN,$file) || die "Can't open $file for reading";
        while(<IN>) {
                $matchtype = read_match($_);
                ++$totmatches if $matchtype==1;
                ++$totnewlinks if $matchtype==2;
                ++$totnewfiles if $matchtype==3;
                if($fixlinks && $matchtype > 0) {
                        $totbroken++;
                        if (fix_links($matchtype) > 0) {
                                $totfixed++;
                                $fixed=" FIXED";
                        }
                        else {$fixed=" BROKEN";}
                }
                my $name = shift(@MatchA);
                print "\"" . $name . "\" " . join(" ", @MatchA) . "$fixed\n" 
                        if $matchtype >= 0 && $verbose;
        }
}

sub read_match {
        my $ret=-1;
        if (m|^"(.*)"\s+(\d+)\s+([[:xdigit:]]+(_\d+)?)\s+([[:xdigit:]]+(_\d+)?)\s+(c?pool)\s+([-=#@][[:xdigit:]]+)\s+(\d+)\s+(\d+)|) {
                $ret=1; #Dup match: Link to dup node in pool
        }
        elsif (m|^"(.*)"\s+(\d+)\s+([[:xdigit:]]+(_\d+)?)\s+((NewLink))\s+(c?pool)\s+([-=][[:xdigit:]]+)\s+(\d+)\s+(\d+)|) {
                $ret=2; #NewLink: File without links but has matching pool entry (Note parentheses added to keep numbering the same)
        }
        elsif (m|^"(.*)"\s+(\d+)\s+([[:xdigit:]]+(_\d+)?)\s+((NewFile))\s+(c?pool)\s+(x[[:xdigit:]]+)\s+(\d+)\s+(\d+)|) {
                $ret=3; #NewFile: File without links and without existing matching pool entry and without a previous NewFile
                        #with the same content (Note parentheses added to keep numbering the same)
        }
        elsif (m|^"(.*)"\s+(\d+)\s+([[:xdigit:]]+(_\d+)?)\s+((NewFile))\s+(c?pool)\s+(y[[:xdigit:]]+)\s+(\d+)\s+(\d+)|) {
                $ret=2; #NewFile2: File without links and without existing matching pool entry, but a previous NewFile with the
                        #same content will already have created the new pool entry (Note parentheses added to keep numbering the same)
        }
        else {return -1;}
        @MatchA = ( $1, $2, $3, $5, $7, $8, $9, $10);
        return $ret;
}

sub fix_links {
        my ($type) = @_;
        my ($matchname, $inoM, $md5sum, $matchtype, $thepool, $checksumbytes, 
$nlinkM, $sizeM) = @MatchA;
        $checksumbytes =~ m|^(.)(..)(..)$|;
        my $cmprflag = $1;
        my $matchbyte = $2;
        my $md5sumbyte = $3;
        my $md5sumpath = $bpc->MD52Path($md5sum, 0, ($thepool eq "cpool" ? 
$cpooldir : $pooldir));
        my $matchpath = "$pc/$matchname";
        my $compress = ($thepool eq "cpool" ? 1 : 0);
        #First, perform extra checks (should be unnecessary, but I'm paranoid)
        unless (-r $matchpath) {
                warn "ERROR: \"$matchpath\" - Can't read file\n";
                return -1;
        }
        my ($devMM, $inoMM, $modeMM, $nlinkMM, $uidMM, $gidMM, $rdevMM, $sizeMM, $therestMM) = stat($matchpath);
        if ($inoM != $inoMM || $sizeM != $sizeMM) {
                warn "ERROR: \"$matchpath\" - Something changed... Inode or size doesn't match previous\n";
                return -1;
        }

        if (($type == 1 && $matchtype =~ m|^[[:xdigit:]]+(_\d+)?$|) ||  #Duplicate pool entry
                ($type == 2 && $matchtype =~ m|^NewLink$|) ||           #New Link
                ($type == 2 && $matchtype =~ m|^NewFile$|)) {           #New File with previously created link (by previous NewFile)
                # Unlink $matchname and relink to $md5sum

                unless ( -r $md5sumpath) {
                        warn "ERROR: \"$matchname\" - Can't read new link target: \"$md5sum\"\n";
                        return -1;
                }

                my ($devP, $inoP, $modeP, $nlinkP, $uidP, $gidP, $rdevP, $sizeP, $therestP) = stat($md5sumpath);
                if (($nlinkP + 1) >= $MaxLinks) {
                        warn "ERROR: \"$matchname\" - Too many links if added to \"$md5sum\"\n";
                        return -1;
                }
                if(compare_files($matchpath, $md5sumpath, $compress) <= 0) {
                        warn "ERROR: \"$matchname\" - contents don't match \"$md5sum\"\n";
                        return -1;
                }

                if(!$dryrun && !unlink($matchpath)){
                        warn "ERROR: \"$matchname\" - unlink failed\n";
exit; #JJK
                        return -1;      
                }
                if(!$dryrun && !link($md5sumpath, $matchpath)){
                        warn "ERROR: \"$matchname\" - link from \"$md5sum\" failed\n";
exit; #JJK
                        return -1;
                }
                print "\"$matchname\" successfully (re)linked from $matchtype [$inoM] to $md5sum [$inoP]\n" if $Verbose;
                return 1;
        }
        elsif ($type == 3 && $matchtype =~ m|^NewFile$|) {  #New File
                # Make a new link in the pool directory, adding additional subdirectories as needed
                if ( -r $md5sumpath) {  # Check to see if something else took the planned target
                        warn "ERROR: \"$matchname\" - target already exists: \"$md5sum\"\n";
                        return -1;
                }
                $md5sum =~ m|^([[:xdigit:]]+)|; # Strip off the suffix
                unless (zFile2MD5($bpc, $md5, $matchpath, $compress) eq $1) { # string compare of hex digests
                        warn "ERROR: \"$matchname\" - md5sum doesn't match \"$md5sum\"\n";
                        return -1;
                }
                $md5sumpath =~ m|(.*)/|;  # Find the containing directory
                print "\"$matchname\" - Making new pool directory $1\n" if ($Verbose && ! -d $1);
                mkpath($1, 0, 0777) if (!$dryrun && ! -d $1);
                if (!$dryrun && !link($matchpath, $md5sumpath)) { # Note reverse order of link from types 1&2
                        warn "ERROR: \"$matchname\" - link to \"$md5sum\" failed\n";
                        return -1;
                }
                print "\"$matchname\" successfully linked to new file $md5sum [$inoM]\n" if $Verbose;
                return 1;
        }
        else {
                warn "ERROR: Invalid type ($type) doesn't match $matchtype\n";
                return -1;
        }
}

sub run_nightly
{
        my $err = $bpc->ServerConnect($Conf{ServerHost}, $Conf{ServerPort});
        if ($err) {
                warn "ERROR: BackupPC_nightly: can't connect to server ($err)...\n";
                return($err);
        }
        if ((my $reply = $bpc->ServerMesg("BackupPC_nightly run")) == 0) {
                $bpc->ServerMesg("log $0: called for BackupPC_nightly run...");
                return 0;
        }
        else {
                warn "ERROR: BackupPC_nightly ($reply)...\n";
                return $reply;
        }
}

sub compare_files
{
        my ($file1, $file2, $compress) = @_;
        return 1 if !jcompare($file1, $file2);                      # Matches as-is
        return 2 if $compress && !zcompare($file1, $file2, $compress); # Matches post-inflation
        return 0; # Not a match (or error)
}

sub zcompare {
        my ($file1, $file2, $compress) = @_;
        my ($fh1, $fh2);
        my $ret = 0;

        unless (defined($fh1 = BackupPC::FileZIO->open($file1, 0, $compress)) &&
                defined($fh2 = BackupPC::FileZIO->open($file2, 0, $compress))) {
                $ret = -1;
                goto zcompare_return;
        }
        my $data1 = my $data2 = '';
        my ($r1, $r2);
        while ( ($r1 = $fh1->read(\$data1, 65536)) > 0 ) {
                unless ((($r2 = $fh2->read(\$data2, $r1)) == $r1)
                                && $data1 eq $data2) {
                        $ret=1;
                        goto zcompare_return;
                }
        }
        $ret = 1 if ($r2 = $fh2->read(\$data2, LINUX_BLOCK_SIZE)) > 0; # See if anything is left in file 2
        $ret = -1 if $r1 < 0 || $r2 < 0; # Error on read

  zcompare_return:
        $fh1->close() if defined $fh1;
        $fh2->close() if defined $fh2;
        return $ret;
}

# Rewrite the compare function (from File::Compare), since it messes up on
# weird filenames with spaces and things (and get rid of extra fluff while
# we are at it)

sub jcompare {
        my ($fh1, $fh2, $size) = @_;
        my ($fh1open, $fh2open, $fh1size);
        my $ret = 0;

        local (*FH1, *FH2);
        unless (($fh1open = open(FH1, "<", $fh1)) &&
                ($fh2open = open(FH2, "<", $fh2))) {
                $ret = -1;
                goto compare_return;
        }
        binmode FH1;
        binmode FH2;
        if (($fh1size = -s FH1) != (-s FH2)) {
                $ret=1;
                goto compare_return;
        }

        unless (defined($size) && $size > 0) {
            $size = $fh1size;
            $size = LINUX_BLOCK_SIZE if $size < LINUX_BLOCK_SIZE;
            $size = TOO_BIG if $size > TOO_BIG;
        }

        my $data1 = my $data2 = '';
        my ($r1,$r2);
        while(defined($r1 = read(FH1,$data1,$size)) && $r1 > 0) {
            unless (defined($r2 = read(FH2,$data2,$r1)) && $r2 == $r1
                                && $data2 eq $data1) {
                        $ret=1;
                        goto compare_return;
            }
        }
        $ret = 1 if defined($r2 = read(FH2, $data2, LINUX_BLOCK_SIZE)) && $r2 > 0;
        $ret = -2 if !defined($r1) || !defined($r2); # Error on read (read() returns undef on error, never a negative count)

  compare_return:
        close(FH1) if $fh1open;
        close(FH2) if $fh2open;
        return $ret;
}
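For anyone adapting the script: jcompare() deliberately mirrors the return
convention of the standard File::Compare::compare (0 = identical, 1 =
different, negative = error). That convention can be sanity-checked on its
own with the stock module and a few temp files -- a minimal standalone
sketch, independent of BackupPC:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Compare qw(compare);

# Two identical files and one different file.
my ($fh_a, $file_a) = tempfile(UNLINK => 1);
print $fh_a "same contents\n";
close $fh_a;

my ($fh_b, $file_b) = tempfile(UNLINK => 1);
print $fh_b "same contents\n";
close $fh_b;

my ($fh_c, $file_c) = tempfile(UNLINK => 1);
print $fh_c "different contents\n";
close $fh_c;

# File::Compare::compare returns 0 for identical files, 1 for different
# files, and -1 on error -- the same convention jcompare follows.
print compare($file_a, $file_b), "\n";   # 0 (identical)
print compare($file_a, $file_c), "\n";   # 1 (different)
```

zcompare() inverts nothing either; it just applies the same 0/1/-1 scheme
after inflating both files through BackupPC::FileZIO.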

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
