Subject: [BackupPC-users] New faster/more efficient approach to backing up BackupPC
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: General list for user discussion <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 08 Feb 2011 11:31:28 -0500

Here is a routine that I wrote that should speed up backups of the
BackupPC pool/cpool and pc tree. It handles hard links efficiently and
should be faster than BackupPC_tarPCCopy.

I have tested it on my setup but I WOULD LOVE IT IF OTHERS WOULD TEST
IT AND BENCHMARK IT AGAINST BackupPC_tarPCCopy or other methods
(note that this method will almost always be slower than a block-level
disk copy). I am enclosing a description of how it works along with
the program and the associated library (jLib.pm) below.

--------------------------------------------------------------------
The program requires very little memory or processing power. 

In fact, I backed up and restored a BackupPC installation consisting
of 854 thousand cpool files, 2.5 million pc directories, 6.9 million
pc hard links, and 49 thousand zero-length files on the following
minimalist machines:

 - Backup machine: ARM-based plug computer with a 1.2 GHz processor,
   512MB RAM, and TopDir on a USB hard disk.
 - Restore machine: ARM-based DNS-323 with an 800MHz processor, 64MB
   RAM, and an internal hard disk.

Clearly, this is pretty minimalist hardware. The rate-limiting step
was file I/O; the processing and memory requirements were minimal.

--------------------------------------------------------------------
Basically, the program uses the 'trick' that I outlined a few weeks
back. First, the program goes through the pool/cpool to create a new
decimal tree (which I call an ipool) indexed by the inode of each pool
file; the contents of each ipool entry give the path (i.e., the
partial file md5sum) of the corresponding pool file.

After creating this ipool, the program recurses through the pc tree
and creates a list of all the directories, zero-length files, and
hard links (including both source and target). The pool target is
found by simply looking up the ipool file corresponding to the pc
file's inode. This avoids *any* need to recompute partial file md5sums
or to compare files along a pool chain, which should make the program
significantly faster than BackupPC_tarPCCopy in general. There is also
an option to cache the ipool lookups, which saves file I/O at the
expense of memory.
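
To make the lookup concrete, here is a minimal sketch of the idea (this
is illustrative only and not part of the program below; it simply
mirrors the InodePath logic using the default 4-level depth and the
default 'ipool' directory name):

    # Minimal sketch: map a pc-file inode to its pool path via the ipool.
    # Assumes a 4-level ipool rooted at $topdir/ipool (the defaults below).
    sub ipool_lookup {
        my ($topdir, $inode) = @_;
        my $ipath = $inode;
        for (1 .. 4) {                 # index by the 4 least significant digits
            $ipath = ($inode % 10) . '/' . $ipath;
            $inode = int($inode / 10);
        }
        open(my $fh, '<', "$topdir/ipool/$ipath") or return undef;
        chomp(my $pooltarget = <$fh>); # first line: pool path relative to TopDir
        close($fh);
        return $pooltarget;            # e.g. "cpool/<x>/<y>/<z>/<partial-md5sum>"
    }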

The program can optionally either fix any missing/broken links in the
pc tree or write them out as a list of non-pooled files. Note that the
top-level info and log files, which should not be pooled, are always
listed out for straight copying over.

Once the backup list is created, the pool can be rsynced (or otherwise
copied) without any need to preserve hard links. Then the program is
run again on the target machine with the --restore|-r flag to recreate
the directory tree, zero-length files, and hard links recorded by the
backup run.

More details on how the program works can be found in the program
Description below.

--------------------------------------------------------------------
--------------------------------------------------------------------
BackupPC_copyPcPool.pl
--------------------------------------------------------------------

#!/usr/bin/perl
#============================================================= -*-perl-*-
#
# BackupPC_copyPcPool.pl: Reliably copy pool and pc tree
#
# DESCRIPTION
#   See below for detailed description of what it does and how it works
#   
# AUTHOR
#   Jeff Kosowsky
#
# COPYRIGHT
#   Copyright (C) 2011  Jeff Kosowsky
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
#========================================================================
#
# Version 0.1, released February 2011
#
#========================================================================
#
# DESCRIPTION

# 1. The program first creates a new inode-labeled pool (called by
# default 'ipool') which is a decimal tree of depth $ilevels, indexed
# by the last $ilevels significant digits of the inode of the
# corresponding pool/cpool file.  Each file in the Ipool tree
# consists of two lines. The first line is just the pool file name
# (i.e., the partial file md5sum plus a potential '_N' suffix). The
# second line consists of an (optional) full file mdsum checksum,
# which is zero if we don't want to use mdsums. Optionally, the
# program will remove any orphan pool file entries (i.e. those with
# only 1 hard link) and renumber the chain accordingly (this is the same
# functionality as provided by BackupPC_nightly).

# Note: the indexing is done by the least significant inode digits in
# order to ensure a more uniformly balanced tree.
# Note: the full file mdsum functionality has not yet been
# implemented.

# 2. The program then runs through the pc tree (or a designated
# subtree thereof) to create a single long list of all of the
# directories, hard links, and zero-length files.
# For hard links, the entry is:
#  <path-to-pool-file> <path-to-pc-file>
# For directories, the entry is:
#    D <owner> <group> <mode> <mtime> <path-to-directory>
# For zero-length files, the entry is:
#    Z <owner> <group> <mode> <mtime> <path-to-file>
# For 'errors' (e.g., non-linked, non-zero length files), the
# entry is:
#    X <owner> <group> <mode> <mtime> <path-to-file>
# Comments can be indicated by:
#   #

# Note: all paths are relative to TopDir
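#
# For example (purely illustrative values, not taken from a real backup),
# lines in the resulting list might look like:
#    D backuppc backuppc 0755 1296531200 pc/myhost/123/f%2fhome
#    cpool/1/2/3/1230123456789abcdef0123456789abc pc/myhost/123/f%2fhome/fsomefile
#    Z backuppc backuppc 0644 1296531200 pc/myhost/123/f%2fhome/fempty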

# The partial file md5sum is obtained by looking up the inode of the
# pc tree file in the Ipool. Files with multiple links are optionally
# cached in memory which saves IO look-ups at the expense of
# memory. (Note the caching algorithm tries to preferentially cache
# files with the most links)

# Any (non-zero length) pc files that are not linked (i.e. only one
# hard link) or that are not linked to an Ipool file may optionally be
# corrected and linked to the pool and if this results in a new pool
# entry then that entry is added to the Ipool too. Files that are not
# corrected are listed for separate examination and manual correction
# or backup.

# The output consists of 3 files:

# <outfile>.links   is the file consisting of the list of links,
#                   directories and zero-length files described above

# <outfile>.nopool  is a listing of the normal top-level log and info
#                   files that are not linked and need to be copied
#                   over directly

# <outfile>.nolinks is a listing of the non-linked, non-zero length
#                   files that either couldn't be linked or that you
#                   decided not to link

# NOTE: BackupPC on the source machine should be disabled (or a
# snapshot used) during the entire backup process.

# 3. Restoring then consists of the following steps:

#    A. Copy over the pool/cpool any way you like, including
#       rsync but without having to worry about hard links

#    B. Copy over the non-pooled files (<outfile>.nopool) any way you want,
#       e.g., rsync, tar, etc.

#    C. Run this program again using the -r|--restore flag and the
#       <outfile>.links file as the input

#    D. Optionally copy over the non-linked files in <outfile>.nolinks


# NOTE: BackupPC on the target machine should be disabled (or a
# snapshot used) during the entire restore process.

# Selected features:
#   --gzip|-g      flag creates compressed output/input
#   --stdio        writes/reads to stdio allowing you to pipe the backup
#                  stage directly into the restore stage
#   --fixlinks|-f  fixes any missing/broken pc links
#   --topdir|-t    allows setting of alternative topdir
#   --dryrun|-d    doesn't make any changes to pool or pc tree
#   --verbose|-v   verbosity (repeat for higher verbosity)
#   --icache|-I    size of optional memory cache of ipool tree
#
#========================================================================

use strict;
use warnings;
use File::Path;
use Getopt::Long qw(:config no_ignore_case bundling);
use Fcntl;  #Required for RW I/O masks
use Switch;

use lib "/usr/share/BackupPC/lib";
use BackupPC::FileZIO;
use BackupPC::Lib;
use BackupPC::jLib 0.4.0;  # Requires version >= 0.4.0
use BackupPC::Attrib qw(:all);

no  utf8;

#use Data::Dumper; #JJK-DEBUG

select(STDERR); #Auto-flush (i.e., unbuffer) STDERR
$| = 1;

#Variables
my $bpc = BackupPC::Lib->new or die "BackupPC::Lib->new failed\n";      
my $md5 = Digest::MD5->new;
my $attr; #Re-bless for each backup since may have different compress level
my $MaxLinks = $bpc->{Conf}{HardLinkMax};
my $CREATMASK = O_WRONLY|O_CREAT|O_TRUNC;
my %IpoolCache=(); #Cache for retrieved Ipool entries
my @nonpooled = ();
my @backups = ();
my @compresslvls=(); #Pool to use for each backup;

my $directories = 0;
my $totfiles = 0; #Total of next 6 variables
my $zerolengthfiles = 0;
my $existinglink_pcfiles = 0;
my $fixedlink_pcfiles = 0;
my $unlinkable_pcfiles = 0;
my $unlinked_pcfiles = 0;
my $unlinked_nonpcfiles = 0;
my $unlinked_files = 0; #Total of above three

#GetOptions defaults:
my $IcacheSize = my $IcacheSize_def = 10000;
my $cleanpool;
my $create=0; #Default is not to create if non-empty
#$dryrun=1;  #Global variable defined in jLib.pm (do not use 'my') #JJK-DEBUG
my $fixlinks=0;
my $Force;
my $FORCE;
my $gzip;
my $ilevels = my $ilevels_def = 4; #Number of levels in the Ipool tree
my $mdsumflg;
my $outfile;
my $Overwrite;
my $paranoid;
my $pool = my $pool_def = 'both';
my $restore;
my $skippc;
my $stdio;
my $TopDir = my $TopDir_def = $bpc->{TopDir};
my $IpoolDir = my $IpoolDir_def = "ipool";
my $verbose = my $verbose_def = 2;
my $noverbose;

usage() unless( 
        GetOptions( 
                "cleanpool|c"   => \$cleanpool, #Remove orphan pool entries
                "dryrun|d!"     => \$dryrun,    #1=dry run
                "create|C!"     => \$create,    #Create new Ipool
                "fixlinks|f"    => \$fixlinks,  #Fix unlinked/broken pc files
                "Force|F"       => \$Force,     #Override stuff...
                "FORCE"         => \$FORCE,     #OVERWRITES during restore 
(dangerous)
                "icache|I=i"    => \$IcacheSize,#Size of Ipool Cache
                "ipool|i=s"     => \$IpoolDir,  #Inode Pool location relative 
to TopDir
                "gzip|g"        => \$gzip,      #Compress files
                "levels|l=i"    => \$ilevels,   #Number of levels in Inode tree
                "mdsum|m"       => \$mdsumflg,  #Include mdsum;
                "outfile|o=s"   => \$outfile,   #Output file (required)
                "Overwrite|O"   => \$Overwrite, #Overwrite existing files/dirs 
(restore)
                "pool|p=s"      => \$pool,      #Pool (pool|cpool|both)
                "paranoid|P"    => \$paranoid,  #Paranoid error checking
                "restore|r"     => \$restore,     #Restore rather than backup
                "skippc|S"      => \$skippc,    #Only create & populate Ipool
                "stdio"         => \$stdio,     #Print/read to/from stdout/stdin
                "topdir|t=s"    => \$TopDir,    #Location of TopDir
                "verbose|v+"    => \$verbose,   #Verbosity (repeats allowed)
                "noverbose"     => \$noverbose, #Shut off all verbosity
                "help|h"        => \&usage,
        ) &&
        ($create || !$cleanpool)
        );

if($restore) {
        usage() if $mdsumflg || $outfile || 
                (!$stdio && @ARGV != 1) || ($stdio && @ARGV != 0);
}else {
        usage() if ! defined($outfile) || $restore ||
                ($pool !~ /^(pool|cpool|both)/);
}

$verbose = 0 if $noverbose;

############################################################################
if($TopDir ne $TopDir_def) {
        #NOTE: if we are not using the TopDir in the config file, then we
        # need to manually override the settings of BackupPC::Lib->new
        # which *doesn't* allow you to set TopDir (even though it seems so
        # from the function definition, it gets overwritten later when the
        # config file is read)
        $TopDir =~ s|//*|/|g; #Remove any lurking double slashes
        $TopDir =~ s|/*$||g; #Remove trailing slash
        $bpc->{TopDir} = $TopDir;
        $bpc->{Conf}{TopDir} = $TopDir;

        $bpc->{storage}->setPaths({TopDir => $TopDir});
        $bpc->{PoolDir}  = "$bpc->{TopDir}/pool";
        $bpc->{CPoolDir} = "$bpc->{TopDir}/cpool";
}

%Conf   = $bpc->Conf(); #Global variable defined in jLib.pm (do not use 'my')
#############################################################################
#By convention for the rest of this program, we will assume that all
#directory variables have a trailing "/". This should save us some
#efficiency in terms of not having to always add back the slash.
$TopDir .= "/"; $TopDir =~ s|//*|/|g;
my $pcdir = 'pc/';

die "TopDir = '$TopDir' doesn't exist!\n" unless -d $TopDir;
die "TopDir = '$TopDir' doesn't look like a valid TopDir!\n" 
        unless -d "$TopDir/pool" && -d "$TopDir/cpool" && -d "$TopDir/${pcdir}";

system("$bpc->{InstallDir}/bin/BackupPC_serverMesg status jobs >/dev/null 
2>&1");
unless(($? >>8) == 1) {
        die "Dangerous to run when BackupPC is running!!! (use '--Force' to 
override)\n"
                if !$Force && $TopDir eq $TopDir_def;
        warn "WARNING: May be dangerous to run when BackupPC is running!!!\n"; 
    #Warn but don't die if *appear* to be in different TopDir
}

############################################################################
if(defined $restore) {
        do_restore($ARGV[0]);
        exit; #DONE
}
############################################################################
my $sfx = $gzip ? ".gz" : "";
my $linksfile = "${outfile}.links${sfx}";
my $nopoolfile = "${outfile}.nopool${sfx}";
my $nolinksfile = "${outfile}.nolinks${sfx}";
unless($skippc) {
        die "ERROR: '$linksfile' already exists!\n" if !$stdio && -e $linksfile;
        die "ERROR: '$nopoolfile' already exists!\n" if -e $nopoolfile;
        die "ERROR: '$nolinksfile' already exists!\n" if -e $nolinksfile;
        
        my $outpipe = $gzip ? "| gzip > " : "> ";
        if($stdio) {
                open(LINKS, $gzip ? "| gzip -f" : ">&STDOUT"); #Redirect LINKS 
to STDOUT
        }else {
                open(LINKS,  $outpipe . $linksfile)
                        or die "ERROR: Can't open '$linksfile' for 
writing!($!)\n";
        }
        
        open(NOPOOL, $outpipe . $nopoolfile)
                        or die "ERROR: Can't open '$nopoolfile' for 
writing!($!)\n";
        open(NOLINKS, $outpipe . $nolinksfile)
                        or die "ERROR: Can't open '$nolinksfile' for 
writing!($!)\n";
}

############################################################################
chdir($TopDir); #Do this so we don't need to worry about distinguishing
                #between absolute and relative (to TopDir) pathnames
                #Everything following opening the files occurs in TopDir
############################################################################
initialize_backups() unless($skippc);

$IpoolDir .= '/';
$IpoolDir  =~ s|//*|/|g;
if($create || ! -d $IpoolDir) { #Need to create Ipool...
        warn "**Creating new Ipool (${TopDir}$IpoolDir)...\n" if $verbose>=1;
        warn "* Removing old Ipool tree...\n" if $verbose >=2;
        rmtree($IpoolDir) if -d $IpoolDir; #Remove existing Ipool;
        print STDERR "* Creating Ipool directories:[0-9] " if $verbose >=2;
        create_ipool($ilevels, $IpoolDir);
        my $totipool=0;
        if($pool eq "both") {
                $totipool += populate_ipool("pool");
                $totipool += populate_ipool("cpool");
        }
        else {
                $totipool += populate_ipool($pool);
        }
        warn "*Ipool entries created=$totipool\n" if $verbose >=2;
}else { #Use existing pool
        die "ERROR: Inode tree levels ($ilevels) does not match number of 
levels in existing tree ($IpoolDir) -- either remove tree or select --create to 
override...\n"
                if ! -d $IpoolDir . ("9/" x $ilevels) || 
                -d $IpoolDir . ("9/" x ($ilevels+1));
        warn "**Using existing Ipool tree (${TopDir}$IpoolDir) [--create 
overrides]...\n" if $verbose>=1;
}
exit if $skippc;

warn "**Recording nonpooled top level files...\n" if $verbose>=1;
foreach (@nonpooled) {  #Print out the nonpooled files
        printf NOPOOL "%s\n", $_;
}
close(NOPOOL);

warn "**Recording linked & non-linked pc files...\n" if $verbose>=1;
my $lastmachine = '';
my ($compresslvl);
foreach (@backups) {
        m|$pcdir([^/]*)|;
        if($1 ne $lastmachine) {
                $lastmachine = $1;
                Clear_Icache();
        }
        warn "* Recursing through backup: $_\n" if $verbose>=1;
        $compresslvl = shift(@compresslvls);
        $attr = BackupPC::Attrib->new({ compress => $compresslvl });
    #Reinitialize this jLib global variable in case new compress level
        m|(.*/)(.*)|;
        find_cpool_links($1, $2);
}
close(LINKS);
close(NOLINKS);

##############################################################################
#Print summary & concluding message:
printf STDERR "\nDirectories=%d\tTotal Files=%d\n",
        $directories, ($totfiles + $#nonpooled);
printf STDERR "Link files=%d\t [Zero-length=%d,  Hardlinks=%d (Fixed=%d)]\n",
        ($zerolengthfiles+$existinglink_pcfiles+$fixedlink_pcfiles),
        $zerolengthfiles, ($existinglink_pcfiles+$fixedlink_pcfiles),
        $fixedlink_pcfiles;
printf STDERR "Non-pooled Toplevel=%d\n", $#nonpooled;
printf STDERR "Non-pooled Other=%d [Valid-pc=%d (Failed-fixes=%d),  
Invalid-pc=%d]\n",
        $unlinked_files, ($unlinked_pcfiles+$unlinkable_pcfiles),
        $unlinkable_pcfiles, $unlinked_nonpcfiles;

my ($rsyncnopool, $rsyncnolinks);
if($gzip) {
        $gzip="-g ";
        $rsyncnopool = "zcat $nopoolfile | rsync -aOH --files-from=- $TopDir 
<newTopDir>";
        $rsyncnolinks = "zcat $nolinksfile | rsync -aOH --files-from=- $TopDir 
<newTopDir>";
} else {
        $rsyncnopool = "rsync -aOH --files-from=$nopoolfile $TopDir 
<newTopDir>";
        $rsyncnolinks = "rsync -aOH --files-from=$nolinksfile $TopDir 
<newTopDir>";
}

print STDERR <<EOF;
------------------------------------------------------------------------------
To complete copy, do the following as user 'root' or 'backuppc':
  1. Copy/rsync over the pool & cpool directories to the new TopDir
     (note this must be done *before* restoring '$linksfile')
     For rsync:
        rsync -aO $TopDir\{pool,cpool\} <newTopDir>

  2. Copy/rsync over the non-pooled top level files ($nopoolfile)
     For rsync:
        $rsyncnopool

  3. Restore the pc directory tree, hard links and zero-sized files:
        $0 -r ${gzip}[-t <newTopDir>] $linksfile

  4. Optionally, copy/rsync over the non-linked pc files ($nolinksfile)
     For rsync:
        $rsyncnolinks

------------------------------------------------------------------------------

EOF
exit;

##############################################################################
##############################################################################
#SUBROUTINES:

sub usage
{
    print STDERR <<EOF;

usage: $0 [options] -o <outfile> [<relpath-to-pcdir> <relpath-to-pcdir> ...]
       $0 [options] -r <restorefile>

  First, if Ipool tree doesn\'t exist (or if --create option selected), 
  create a new Inode tree (default: ${TopDir_def}$IpoolDir_def).

  Then, recurse through the pool and\/or cpool directories (default: $pool_def)
  and create a new Ipool tree inode corresponding to each pool entry. The Ipool
  tree entry is indexed by the last digits of the Inode and contains the path
  relative to TopDir of the pool entry (plus an optional md5sum line --
  not yet implemented).

  Then, recurse through the paths specified relative to the pc tree or if no
  paths specified then the entire pc tree. 
  - If the file is a (non-pooled) top-level log file, then write its path
    relative to pcdir out to <outfile>.nopool
    Note this includes all non-directories above the share level plus the
    backInfo files that are covered by the input paths
  - If the file is a directory, zero length file, or an existing hard link to
    the tree, then write it out to <outfile>.links
  - If the file is not hard-linked but is a valid non-zero length pc file 
    (f-mangled and present in the attrib file) and --fixlinks is selected,
    then try to link it properly to the appropriate pool.
  - Otherwise, add it to <outfile>.nolinks

  The entries in the IpoolDir can also optionally be cached in a hash, using
  the --icache <cache size> option, but the speedup is relatively minimal since
  you are just saving a one-block file read.

  NOTE: TO ENSURE INTEGRITY OF RESULTS IT IS IMPORTANT THAT BACKUPPC IS NOT
  RUNNING (use --Force to override)

  Note: you should run BackupPC_nightly before running this program so that
  no unnecessary links are backed up; alternatively, set the --cleanpool
  option which will remove orphan pool entries.

  Options: [Common]
   --dryrun|-d            Dry-run - doesn\'t change pool or pc trees
                          Negate with: --nodryrun
   --Force|-F             Overrides various checks (e.g., if BackupPC running
                          or if directories present)
   --FORCE                OVERWRITES during restore (DANGEROUS)
   --gzip|-g              Pipe files to/from gzip compression
   --paranoid|-P          Perform extra error checking that should not be 
                          necessary assuming pool, ipool, and pc trees have
                          not changed in the interim
   --topdir|-t            Location of TopDir.
                          [Default = $TopDir_def]
                          Note you may want to change from default for example
                          if you are working on a shadow copy.
   --verbose|-v           Verbose (repeat for more verbosity)
                          [Default level = $verbose_def]
                          Use --noverbose to turn off all verbosity
   --help|-h              This help message

  Options: [Copy only]
   --cleanpool|-c         If orphaned files (nlinks=1) found when populating
                          Ipool, remove them (and renumber chain as needed). 
                          NOTE: This shouldn\'t happen if you have just run
                          BackupPC_nightly
   --create|-C            Force creation of a new Ipool even if already present
                          Negate with: --nocreate
   --fixlinks|-f          Attempt to link valid pc files back to the pool
                          if not already hard-linked
                          NOTE: this changes files in the pc and\/or pool
                          of your source too!
   --icache|-I N          Size of Ipool *memory* cache (0 = off)
                          [Default = $IcacheSize_def]
   --ipool|-i [location]  Location relative to TopDir of Ipool tree
                          [Default = $IpoolDir_def]
   --levels|-l N          Number of levels in the Ipool tree
                          [Default = $ilevels_def]
   --mdsum|-m             Include mdsums [NOT IMPLEMENTED]
   --outfile|-o [outfile] Required stem name for the 3 output files
                             <outfile>.nopool
                             <outfile>.links
                             <outfile>.nolinks
   --pool|-p  [pool|cpool|both]  Pools to include in Ipool tree
   --skippc|-S            Skip recursing pc directory (implies create pool)
   --stdio                Print the directory tree, links and zero-sized files
                          to stdout so it can be piped directly to another copy
                          of the program running --restore
                          NOTE: Status, Warnings, and Errors are printed to
                          stderr.

  Options: [Restore only]
   --Overwrite|-O         Overwrite existing files & directories
   --stdio                Read the directory tree, links and zero-sized files
                          from stdin so it can be piped directly from another
                          copy of the program running in the create mode
                          For example, the following pipe works:
                          $0 -t <source TopDir> [-g] --stdio | $0 -t <dest TopDir> [-g] -r --stdio

EOF
exit(1);
}

#Glob to determine what is being backed up. 
#Make sure each backup seems valid and determine its compress level
#Collect top-level non-pooled files
sub initialize_backups
{
        if (!@ARGV) { # All backups at backup number level
                # TopDir/pc/<host>/<nn>
                @backups = glob("${pcdir}*/*"); #2 levels down;
                @nonpooled = grep(! -d , glob("${pcdir}*")); #All non-directories
        } else { # Subset of backups
                foreach(@ARGV) {
                        my $backupdir = $pcdir . $_ . '/';
                        $backupdir  =~ s|//*|/|g;
                        die "ERROR: '$backupdir' is not a directory\n" unless 
-d $backupdir;
                        if($backupdir =~ m|^\Q${pcdir}\E[^/]+/$|) {  #Hostname 
only
                                push(@backups, glob("${backupdir}*"));
                        } else { # At share level or below #Backup number or 
below
                                push(@backups, ${backupdir});
                        }
                }
                @backups = keys %{{map {$_ => 1} @backups}}; #Eliminate dups
                @nonpooled = keys %{{map {$_ => 1} @nonpooled}}; #Eliminate dups
        }

        push(@nonpooled, grep(! -d, @backups)); #Non-directories
        @backups = grep(-d , @backups); #Directories
        push(@nonpooled, grep(m|/backupInfo$|, @backups)); # backupInfo not pooled
        #Note everything else *should* be pooled - if not, we will catch it later as an error
        @backups = grep(!m|/backupInfo$|, @backups);

        foreach(@backups) {
                s|/*$||; #Remove trailing slash
                m|^(\Q${pcdir}\E[^/]+/[^/]+)(/)?|;
                die "Backup '$1' does not contain a 'backupInfo' file\n"
                        unless -f "$1/backupInfo";
                push(@nonpooled, "$1/backupInfo") unless $2; #Don't include if 
share level or below
                my $compress = get_bakinfo("$1","compress");
                push(@compresslvls, $compress);
                my $thepool = $compress > 0 ? 'cpool' : 'pool';
                die "Backup '$1' links to non-included '$thepool'\n"
                        if $pool ne 'both' && $pool ne $thepool;
        }
}

#Recursively create directory tree for Ipool - $levels deep starting from
#$newdir. Note: $newdir should have a trailing slash
sub create_ipool
{
        my ($level, $newdir) = @_;
#       print STDERR "$level: $newdir\n"; #JJK-DEBUG
        unless(-d $newdir) {
                mkdir $newdir or die "Can't create directory: $newdir\n";
        }
        if($level--) {
                for(my $i=0; $i <= 9; $i++) {
                        print STDERR "$i " if $level==$ilevels-1 && $verbose 
>=2;
                        create_ipool($level, $newdir . $i . "/"); #Recurse
                }
        }
}

#Iterate through the 'pooldir' tree and populate the Ipool tree
sub populate_ipool
{
        my ($fpool) = @_;
        my (@fstat, $dh, @dirlist, $pfile);
        my $ipoolfiles = 0;

        return unless glob("$fpool/[0-9a-f]"); #No entries in pool
        my @hexlist = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f');
        my ($idir,$jdir,$kdir);
        print STDERR "\n* Populating Ipool with '$fpool' branch:[0-f] " if $verbose >=2;
        foreach my $i (@hexlist) {
                print STDERR "$i " if $verbose >=2;
                $idir = $fpool . "/" . $i . "/";
                foreach my $j (@hexlist) {
                        $jdir = $idir . $j . "/";
                        foreach my $k (@hexlist) {
                                $kdir = $jdir . $k . "/";
                                unless(opendir($dh, $kdir)) {
                                        warn "Can't open pool directory: $kdir\n" if $verbose>=4;
                                        next;
                                }
                                my @entries = grep(!/^\.\.?$/, readdir ($dh)); #Remove dot files
                                closedir($dh);
                                warn "POOLDIR: $kdir (" . ($#entries+1) ." files)\n"
                                        if $verbose >=3;
                                if($cleanpool) { #Remove orphans & renumber first and then
                                        #create the Ipool after orphan deletion & chain renumbering
                                        my @poolorphans = 
                                                grep(-f $kdir . $_ && (stat(_))[3] < 2, @entries);
                                        foreach (sort {poolname2number($b) cmp poolname2number($a)} 
                                                         @poolorphans) { #Reverse sort to minimize moves
                                                $pfile = $kdir . $_;
                                                my $res = delete_pool_file($pfile);
                                                if($verbose >=1) {
                                                        $res == 1 ?
                                                                warn "WARN: Deleted orphan pool entry: $pfile\n": 
                                                                warn "ERROR: Couldn't properly delete orphan pool entry: $pfile\n"
                                                }
                                        }
                                }
                                foreach (@entries) {
                                        # Go through all entries in terminal cpool branch
                                        $pfile = $kdir . $_;
                                        next unless -f $pfile;
                                        @fstat = stat(_);
                                        #Note: 0=dev, 1=ino, 2=mode, 3=nlink, 4=uid, 5=gid, 6=rdev
                                        #7=size, 8=atime, 9=mtime, 10=ctime, 11=blksize, 12=blocks
                                        if($fstat[3] < 2) { #Orphan pool entry
                                                warn "WARN: Orphan pool entry: $pfile\n" 
                                                        if $verbose>=1;
                                        } else { #No sense in creating ipool entries for orphans
                                                create_ipool_file($fstat[1], $pfile);
                                                $ipoolfiles++;
                                        }
                                }
                        } # kdir
                } # jdir
        } # idir
        print STDERR "\n" if $verbose >=2;
        return $ipoolfiles;
}

sub create_ipool_file
{
        my ($inode, $poolname) = @_;
        my ($fh, $ifile);
        
        my $mdsum = 0;
        $mdsum = get_mdsum($poolname) if $mdsumflg;
        $ifile = InodePath($inode);
#    print STDERR "IFILE: $ifile\n"; #JJK-DEBUG
        if(sysopen($fh, $ifile, $CREATMASK)) {
                syswrite($fh, "$poolname\n$mdsum\n");
                close($fh);
        }
        else {
                warn "ERROR: Can't write to inode pool file: $ifile\n";
        }
}

#Return the Ipool location of inode
#Note: each inode is stored based on the $ilevels least significant digits
#so that the tree should be relatively balanced
#Note this is analogous to MD52Path
sub InodePath
{
        my ($inode, $ipooldir) = @_;

        my $ipath= $inode;
        for(my $i=0; $i < $ilevels; $i++) {
                $ipath = ($inode%10) . '/' . $ipath;
                $inode /=10;
        }
        return((defined $ipooldir ? $ipooldir : $IpoolDir) . $ipath);

}

#Recursively go through pc tree
sub find_cpool_links
{
        my ($dir, $filename) = @_;

        my ($fh, $dh);
        my $file = $dir . $filename;
        if(-d $file) {
                my @fstat = stat(_); #Note last stat was -d $file
                #Print info to signal & re-create directory:
                #D <user> <group> <mod> <atime> <mtime> <file name>
                print_file_reference("D", $file, \@fstat);
                $directories++;
                opendir($dh,$file);
                my @contents = readdir($dh);
                foreach (@contents) {
                        next if /^\.\.?$/;     # skip dot files (. and ..)
                        find_cpool_links($file . '/', $_); #Recurse
                }
        }
        else { #Not a directory
                my @fstat = stat(_);
                $totfiles++;
                if($fstat[7] == 0) { #Zero size file
                        print_file_reference("Z", $file, \@fstat);
                        $zerolengthfiles++;
                        return;
                }elsif($fstat[3] > 1) { #More than one link
                        my ($pooltarget, $ifile);
                        if($IcacheSize && ($pooltarget = $IpoolCache{$fstat[1]})) {
#                               print STDERR "Cache hit: $fstat[1]\n"; #JJK-DEBUG
                        }elsif(open($fh, '<', ($ifile = InodePath($fstat[1])) )) {
                                if ($pooltarget = <$fh>) {
                                        chomp($pooltarget);
                                        close($fh);
                                } else { #Shouldn't happen...
                                        warn "ERROR: Ipool file '$ifile' has no data\n";
                                        $pooltarget = undef;
                                }
                        }
                        if($pooltarget) {  #Hard link found in Cache or Ipool
                           if($paranoid &&
                                  ($pooltarget !~  #Not a standard pool name
                                   m|^(c?pool/[0-9a-f]/[0-9a-f]/[0-9a-f]/[0-9a-f]{32}(_[0-9]+)?)$|
                                   ||  ! -f $pooltarget  #Target pool file doesn't exist
                                   || $fstat[1] != (stat(_))[1])) { #Inodes don't match
                                   delete $IpoolCache{$fstat[1]};
                                   warn "ERROR: Ipool file '$ifile' for file '$file' has bad entry: $pooltarget\n";
                           }else { #Valid hit
                                   $existinglink_pcfiles++;

                                   Add_Icache ($pooltarget, $fstat[1], $fstat[3])
                                           if $IcacheSize;
                                   print_hardlink_reference($pooltarget, $file);
                                   return;
                           }
                        }
                }

                if($file =~ m|^\Q${pcdir}\E[^/]+/[^/]+/backupInfo$|) {
                        $totfiles--;
                        return;         #BackupInfo already taken care of in 'nonpooled'
                }

                #Non-zero sized file not in Ipool/Icache that is not 'backupInfo'
                if($filename =~ /^f/ && -f $file && -f "$dir/attrib" 
                   && $attr->read($dir,"attrib") == 1
                   && defined($attr->get($bpc->fileNameUnmangle($filename)))) {
                        #VALID pc file if f-mangled, regular file, attrib file exists,
                        #and unmangled name is an element of the attrib file
                        if($fixlinks) {
                                if(fix_link($file, \@fstat) == 1) {
                                        $fixedlink_pcfiles++;
                                        return;
                                }else {$unlinkable_pcfiles++;}
                        }else{
                                warn "ERROR: $file (inode=$fstat[1], 
nlinks=$fstat[3]) VALID pc file NOT LINKED to pool\n";
                                $unlinked_pcfiles++;
                        }
                }else {
                        warn "ERROR: $file (inode=$fstat[1], nlinks=$fstat[3]) 
INVALID pc file and UNLINKED to pool\n";
                        $unlinked_nonpcfiles++
                }
                #ERROR: Not linked/linkable to pool (and not zero-length file or directory)
                print_file_reference("X", $file, \@fstat);
                printf NOLINKS "%s\n", $file;
                $unlinked_files++
        }
}

#Try to fix link by finding/creating new pool-link
sub fix_link
{
        my ($filename, $fstatptr) = @_;

        my $poollink = undef;
        my ($md5sum, $result) = zFile2MD5($bpc, $md5, $filename, 0, $compresslvl);
        $result = jMakeFileLink($bpc, $filename, $md5sum, 2, $compresslvl, \$poollink)
                if $result > 0;
        #Note we choose NewFile=2 since if we are fixing, we always want to make the link
        if($result > 0) { #(true if both above calls succeed)
                my $pool = ($compresslvl > 0 ? "cpool" : "pool");
                $poollink =~ m|.*(${pool}.*)|;
                $poollink = $1; #Relative to TopDir
                print_hardlink_reference($poollink, $filename);
                create_ipool_file((stat($poollink))[1], $poollink); #Update Ipool
                if($verbose >=2) {
                        warn "NOTICE: pool entry '$poollink' 
(inode=$fstatptr->[1]) missing from Ipool and added back\n" if $result == 3;
                        warn sprintf("NOTICE: '%s' (inode=%d, nlinks=%d) was 
fixed by linking to *%s* pool entry: %s (inode=%d, nlinks=%d)\n", 
                                                 $filename, $fstatptr->[1], 
$fstatptr->[3],
                                                 ($result == 1 ? "existing" : 
"new"), 
                                                 $poollink, (stat(_))[1], 
(stat(_))[3])
                                if $result !=3;
                }
                return 1;
        }
        warn "ERROR: '$filename' (inode $fstatptr->[1]); fstatptr->[3] links; 
md5sum=$md5sum) VALID pc file FAILED FIXING by linking to pool\n";
        return 0;
}

sub print_hardlink_reference
{
        my ($poolname, $filename) = @_;
        
        printf LINKS "%s %s\n", $poolname, $filename; #Print hard link
}

sub print_file_reference
{
        my ($firstcol, $filename, $fstatptr) = @_;

        #<firstcol>  <user> <group> <mod> <mtime> <filename>
        printf(LINKS "%s %s %s %04o %u %s\n",
                   $firstcol, UID($fstatptr->[4]), GID($fstatptr->[5]), 
                   $fstatptr->[2] & 07777, $fstatptr->[9], $filename);
}


sub get_mdsum
{
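        # Placeholder: computing the full-file md5sum of a pool entry is not yet
        # implemented (see the --mdsum option above); for now this returns nothing.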
}

###############################################################################
my ($avglinks, $totentries, $totlinks, $cachesize);
sub Add_Icache
{
        my ($target, $inode, $nlinks) = @_;
        my $fraction = 1/2;

#       printf STDERR "(size=%d|%d)\n", $cachesize, $size2; #JJK-DEBUG
        return if defined $IpoolCache{$inode} || #Already stored
                $nlinks <= 2 || #Only add new keys if nlinks >2 links 
                $nlinks < $fraction * $avglinks; #And if > fraction * average

        if($cachesize >= $IcacheSize) { #Delete "random" key if full
                my $key = each %IpoolCache || each %IpoolCache;
                #Note repetition needed to swallow the 'undef' token and allow 'each' to circle
                #around again (also in scalar context note each returns just the key)
                delete($IpoolCache{$key});
                $cachesize--;
#               printf STDERR "Deleting cache link (size=%d, avg=%d)\n", 
$cachesize, $avglinks; #JJK-DEBUG
        }
        $IpoolCache{$inode} = $target;
        $totentries++;
        $totlinks += $nlinks;
        $cachesize++;
        $avglinks = $totlinks/$totentries;
#       printf STDERR "Adding cache link (inode=$inode nlinks=%d) [size=%d, 
avg=%.1f]\n", $nlinks, $cachesize, $avglinks; #JJK-DEBUG
}

sub Clear_Icache
{
        %IpoolCache = ();
        $totentries = 0;
        $totlinks = 0;
        $cachesize = 0;
        $avglinks = 3;
}
                
my (%UIDcache, %GIDcache);
# Return user name corresponding to numerical UID with caching
sub UID
{
    $UIDcache{$_[0]} = getpwuid($_[0]) unless exists($UIDcache{$_[0]});
    return $UIDcache{$_[0]};
}

# Return group name corresponding to numerical GID with caching
sub GID
{
    $GIDcache{$_[0]} = getgrgid($_[0]) unless exists($GIDcache{$_[0]});
    return $GIDcache{$_[0]};
}

my (%USERcache, %GROUPcache);
# Return numerical UID corresponding to user name with caching
sub USER
{
    $USERcache{$_[0]} = getpwnam($_[0]) unless exists($USERcache{$_[0]});
    return $USERcache{$_[0]};
}

# Return numerical GID corresponding to group name with caching
sub GROUP
{
    $GROUPcache{$_[0]} = getgrnam($_[0]) unless exists($GROUPcache{$_[0]});
    return $GROUPcache{$_[0]};
}
################################################################################
################################################################################
sub do_restore
{
        my ($restorefile) = @_;

        my $currbackup = "";

        my $formaterr = 0;
        my $ownererr = 0;
        my $permserr = 0;
        my $mkdirerr = 0;
        my $mkzeroerr = 0;
        my $mklinkerr = 0;
        my $filexsterr = 0;
        my $utimerr = 0;
        my $newdir = 0;
        my $newzero = 0;
        my $newlink = 0;
        my $skipped = 0;

################################################################################
        if($stdio) {
                open(LINKS, $gzip ? "/bin/zcat - |" : "<& STDIN");
        }else {
                open(LINKS, $gzip ? "/bin/zcat $restorefile |" : "< 
$restorefile") or
                        die "ERROR: Can't open '$restorefile' for 
reading!($!)\n";
        }

        chdir($TopDir); #Do this so we don't need to worry about distinguishing
                    #between absolute and relative (to TopDir) pathnames
        die "ERROR: pc directory contains existing backups!\n(use --Force to 
override; --FORCE to OVERWRITE)\n"
                unless $Force || $FORCE || !grep(-d, 
glob("${pcdir}*/[0-9]*/f*"));
        die "ERROR: pool directories empty! (use --Force to override)\n"
                unless $Force || glob("{cpool,pool}/*");

        umask 0000; #No permission bits disabled
        my $time = time; #We will use this for setting atimes.
        my @dirmtimes =();
        my ($line);
LINE:   while($line = <LINKS>) {
                chomp $line;

                unless($line =~ m|^[a-f0-9DZX#]|) { #First character test
                        print STDOUT "ERR_CHAR1: $line\n";
                        warn sprintf("ERROR: Illegal first line character: 
%s\n", 
                                                 substr($line,0,1))
                                if $verbose >=1;
                        next LINE;  
                }
                switch ($&) {
                        case 'D' {
                                unless($line =~ m|^D +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +(\Q${pcdir}\E.*)|) {
                                        print STDOUT "ERR_DFRMT $line\n";
                                        $formaterr++;
                                        next LINE; #NOTE: next without LINE would go to the next switch case
                                }
                                #NOTE: 1=uname 2=group 3=mode 4=mtime 5=dirpath
#                               print STDERR "$1|$2|$3|$4|$5|\n"; #JJK-DEBUG
                                my $user = USER($1);
                                my $group = GROUP($2);
                                my $mode = oct($3);
                                my $mtime = $4;
                                my $dir = $5; $dir =~ s|/*$|/|;

                                if($verbose >= 1) {
                                        $dir =~ m|pc/([^/]*/[^/]*).*|;
                                        if($1 ne $currbackup) {
                                                $currbackup = $1;
                                                warn "RESTORING: $currbackup\n";
                                        }
                                }
                                #Look at @dirmtimes stack to see if we have backed out of
                                #top directory(ies) on stack and can now set mtime
                                my $lastdir; #If in new part of tree, set mtimes for past dirs
                                while(defined($lastdir = shift(@dirmtimes)) && 
                                          $dir !~ m|^\Q$lastdir\E|) {
                                        utime($time, shift(@dirmtimes), $lastdir);
                                }
                                unshift(@dirmtimes, $lastdir) if $lastdir; #Put back last one

                                if( -d $dir) { #Already exists, just update own/mod
                                        unless(chown $user, $group, $dir){
                                                print STDOUT "ERR_OWNER $line\n";
                                                $ownererr++;
                                                next LINE;
                                        }
                                        unless(chmod $mode, $dir) {
                                                print STDOUT "ERR_PERMS $line\n";
                                                $permserr++;
                                                next LINE;
                                        }
                                }elsif(! -e $dir) { #Make directory (nothing in the way)
                                        unless(jmake_path $dir, 
                                                   {user=>$user, group=>$group, mode=>$mode}) {
                                                print STDOUT "ERR_MKDIR $line\n";
                                                $mkdirerr++;
                                                next LINE;
                                        }
                                        $newdir++;
                                        unshift(@dirmtimes, $mtime); #We need to set dir mtime
                                        unshift(@dirmtimes, $dir)   #when done adding files to dir
                                }else { #Non-directory in the way
                                        print STDOUT "ERR_DEXST $line\n";
                                        $filexsterr++;
                                        next LINE;
                                }
                        }
                        case 'Z' {
                                unless($line =~ m|^Z +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +((\Q${pcdir}\E.*/)(.*))|) {
                                        print STDOUT "ERR_ZFRMT $line\n";
                                        $formaterr++;
                                        next LINE;
                                }
                                #NOTE: 1=uname 2=group 3=mode 4=mtime 5=fullpath 6=dir 7=file
#                               print STDERR "$1|$2|$3|$4|$5|$6|$7\n"; #JJK-DEBUG
                                my $user = USER($1);
                                my $group = GROUP($2);
                                my $mode = oct($3);
                                my $mtime = $4;
                                my $file = $5;
                                my $dir = $6;
                                my $name = $7;
                                if(-e $file) {
                                        unless($FORCE && -f $file && unlink($file)) {
                                                print STDOUT "ERR_ZEXST $line\n";
                                                $filexsterr++;
                                                next LINE;
                                        }
                                }
                                unless( -d $dir ||  jmake_path $dir){ #Make directory if doesn't exist
                                        #NOTE: typically shouldn't be necessary since list is created
                                        #with directories along the way
                                        print STDOUT "ERR_MKDIR $line\n";
                                        $mkdirerr++;
                                        next LINE;
                                }
                                unless(sysopen(ZERO, $file, $CREATMASK, $mode) && close(ZERO)) {
                                        print STDOUT "ERR_MKZER $line\n";
                                        $mkzeroerr++;
                                        next LINE;
                                }
                                unless(chown $user, $group, $file){
                                        print STDOUT "ERR_OWNER $line\n";
                                        $ownererr++;
                                        next LINE;
                                }
                                unless(utime $time, $mtime, $file) {
                                        $utimerr++;
                                        next LINE;
                                }
                                $newzero++;
                        }
                        case 'X' {$skipped++; next LINE; }  #Error line
                        case '#' {next LINE; }  #Comment line
                        else { #Hard link 
                        #Note: linking does not change atime/mtime (only ctime), so ignore
                                unless($line =~ m|(c?pool/[0-9a-f]/[0-9a-f]/[0-9a-f]/[^/]+) +((\Q${pcdir}\E.*/).*)|) {
                                        print STDOUT "ERR_LFRMT $line\n";
                                        $formaterr++;
                                        next LINE;
                                }
#                               print STDERR "$1|$2|$3\n"; # 1=source 2=target 3=targetdir #JJK-DEBUG
                                my $source = $1;
                                my $target = $2;
                                my $targetdir = $3;
                                if(-e $target) {
                                        unless($FORCE && -f $target && unlink($target)) {
                                                print STDOUT "ERR_LEXST $line\n";
                                                $filexsterr++;
                                                next LINE;
                                        }
                                }
                                unless(-d $targetdir || jmake_path $targetdir){
                                        #Make targetdir if doesn't exist
                                        #NOTE: typically shouldn't be necessary since list is created
                                        #with directories along the way
                                        print STDOUT "ERR_MKDIR $line\n";
                                        $mkdirerr++;
                                        next LINE;
                                }
                                unless(jlink $source, $target) {
                                        print STDOUT "ERR_MKLNK $line\n";
                                        $mklinkerr++;
                                        next LINE;
                                }
                                $newlink++
                        }
                } #SWITCH
        } # WHILE reading lines

        #Set mtimes for any remaining directories in @dirmtimes stack
        while(my $dir = shift(@dirmtimes)){
                utime($time, shift(@dirmtimes), $dir);
        }
################################################################################
        #Print results:
        

        printf("\nSUCCESSES: Restores=%d [Dirs=%d Zeros=%d Links=%d] 
Skipped=%d\n", 
                   ($newdir+$newzero+$newlink), $newdir, $newzero, $newlink, 
$skipped);
        printf("ERRORS: TOTAL=%d Format=%d Mkdir=%d Mkzero=%d Mklink=%d\n",
                   ($formaterr + $mkdirerr + $mkzeroerr + $mklinkerr +
                        $filexsterr + $ownererr + $permserr + $utimerr),
                   $formaterr, $mkdirerr, $mkzeroerr, $mklinkerr);
        printf("        File_exists=%d Owner=%d Perms=%d Mtime=%d\n\n",
                   $filexsterr, $ownererr, $permserr, $utimerr);

        exit; #RESTORE DONE                
}
################################################################################
################################################################################

#!/usr/bin/perl
#============================================================= -*-perl-*-
#
# BackupPC_copyPcPool.pl: Reliably copy pool and pc tree
#
# DESCRIPTION
#   See below for detailed description of what it does and how it works
#   
# AUTHOR
#   Jeff Kosowsky
#
# COPYRIGHT
#   Copyright (C) 2011  Jeff Kosowsky
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
#========================================================================
#
# Version 0.1, released February 2011
#
#========================================================================
#
# DESCRIPTION

# 1. The program first creates a new inode-labeled pool (called by
# default 'ipool') which is a decimal tree of depth $ilevels, indexed
# by the last $ilevels significant digits of the inode of the
# corresponding pool/cpool file.  Each file in the Ipool tree
# consists of two lines. The first line is just the pool file name
# (i.e the partial file md5sum plus a potential '_N' suffix). The
# second line consists of an (optional) full file mdsum checksum,
# which is zero if we don't want to use mdsums. Optionally, the
# program will remove any orphan pool file entries (i.e. those with
# only 1 hard link) and renumber the chain accordingly (this is the same
# functionality as provided by BackupPC_nightly

# Note: the indexing is done by the least significant inode digits in
# order to ensure a more uniformly balanced tree.
# Note: the full file mdsum functionality has not yet been
# implemented.

# 2. The program then runs through the pc tree (or a designated
# subtree thereof) to create a single long list of all of the
# directories, hard links, and zero-lengthfiles
# For hard links, the entry is:
#  <path-to-pool-file> <path-to-pc-file>
# For directories, the entry is:
#    D <owner> <group> <mode> <mtime> <path-to-directory>
# For zero-length files, the entry is:
#    Z <owner> <group> <mode> <mtime> <path-to-file>
# For 'errors' (e.g., non-linked, non-zero length files), the
# entry is:
#    X <owner> <group> <mode> <mtime> <path-to-file>
# Comments can be indicated by:
#   #

# Note: all paths are relative to TopDir

# The partial file md5sum is obtained by looking up the inode of the
# pc tree file in the Ipool. Files with multiple links are optionally
# cached in memory which saves IO look-ups at the expense of
# memory. (Note the caching algorithm tries to preferentially cache
# files with the most links)

# Any (non-zero length) pc files that are not linked (i.e. only one
# hard link) or that are not linked to an Ipool file may optionally be
# corrected and linked to the pool and if this results in a new pool
# entry then that entry is added to the Ipool too. Files that are not
# corrected are listed for separate examination and manual correction
# or backup.

# The output consists of 3 files:

# <outfile>.links   is the file consisting of the list of links,
#                   directories and zero-length files described above

# <outfile>.nopool  is a listing of the normal top-level log and info
#                   files that are not linked and need to be copied
#                   over directly

# <outfile>.nolinks is a listing of the non-linked, non-zero length
#                   files that either couldn't be linked or that you
#                   decided not to link

# NOTE: BackupPC on the source machine should be disabled (or a
# snapshot used) during the entire backup process

# 3. Restoring then consists of the following steps:

#    A. Copy over the pool/cpool any way you like, including
#       rsync but without having to worry about hard links

#    B. Copy over the non-pooled files (<outfile>.nopool) any way you want,
#       e.g., rsync, tar, etc.

#    C. Run this program again using the -r|--restore flag and the
#       <outfile>.links file as the input

#    D. Optionally copy over the non-linked files in <outfile>.nolinks


# NOTE: BackupPC on the target machine should be disabled (or a
# snapshot used) during the entire restore process

# Selected features:
#   --gzip|-g      flag creates compressed output/input
#   --stdio        writes/reads to stdio allowing you to pipe the backup
#                  stage directly into the restore stage
#   --fixlinks|-f  fixes any missing/broken pc links
#   --topdir|-t    allows setting of alternative topdir
#   --dryrun|-d    doesn't make any changes to pool or pc tree
#   --verbose|-v   verbosity (repeat for higher verbosity)
#   --icache|-I    size of optional memory cache of ipool tree
#
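# As a rough end-to-end sketch (hypothetical paths; see usage() below and
# the summary printed at the end of a backup run for the full details):
#     on the source:  BackupPC_copyPcPool.pl -f -g -o /var/tmp/pccopy
#     on the target:  BackupPC_copyPcPool.pl -r -g -t <newTopDir> /var/tmp/pccopy.links.gz
# or pipe the two stages together by running both ends with --stdio.
#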
#========================================================================

use strict;
use warnings;
use File::Path;
use Getopt::Long qw(:config no_ignore_case bundling);
use Fcntl;  #Required for RW I/O masks
use Switch;

use lib "/usr/share/BackupPC/lib";
use BackupPC::FileZIO;
use BackupPC::Lib;
use Digest::MD5;
use BackupPC::jLib 0.4.0;  # Requires version >= 0.4.0
use BackupPC::Attrib qw(:all);

no  utf8;

#use Data::Dumper; #JJK-DEBUG

select(STDERR); #Auto-flush (i.e., unbuffer) STDERR
$| = 1;

#Variables
my $bpc = BackupPC::Lib->new or die "BackupPC::Lib->new failed\n";      
my $md5 = Digest::MD5->new;
my $attr; #Re-bless for each backup since may have different compress level
my $MaxLinks = $bpc->{Conf}{HardLinkMax};
my $CREATMASK = O_WRONLY|O_CREAT|O_TRUNC;
my %IpoolCache=(); #Cache for retrieved Ipool entries
my @nonpooled = ();
my @backups = ();
my @compresslvls=(); #Pool to use for each backup;

my $directories = 0;
my $totfiles = 0; #Total of next 6 variables
my $zerolengthfiles = 0;
my $existinglink_pcfiles = 0;
my $fixedlink_pcfiles = 0;
my $unlinkable_pcfiles = 0;
my $unlinked_pcfiles = 0;
my $unlinked_nonpcfiles = 0;
my $unlinked_files = 0; #Total of above three

#GetOptions defaults:
my $IcacheSize = my $IcacheSize_def = 10000;
my $cleanpool;
my $create=0; #Default is not to create if non-empty
#$dryrun=1;  #Global variable defined in jLib.pm (do not use 'my') #JJK-DEBUG
my $fixlinks=0;
my $Force;
my $FORCE;
my $gzip;
my $ilevels = my $ilevels_def = 4; #Number of levels in the Ipool tree
my $mdsumflg;
my $outfile;
my $Overwrite;
my $paranoid;
my $pool = my $pool_def = 'both';
my $restore;
my $skippc;
my $stdio;
my $TopDir = my $TopDir_def = $bpc->{TopDir};
my $IpoolDir = my $IpoolDir_def = "ipool";
my $verbose = my $verbose_def = 2;
my $noverbose;

usage() unless( 
        GetOptions( 
                "cleanpool|c"   => \$cleanpool, #Remove orphan pool entries
                "dryrun|d!"     => \$dryrun,    #1=dry run
                "create|C!"     => \$create,    #Create new Ipool
                "fixlinks|f"    => \$fixlinks,  #Fix unlinked/broken pc files
                "Force|F"       => \$Force,     #Override stuff...
                "FORCE"         => \$FORCE,     #OVERWRITES during restore 
(dangerous)
                "icache|I=i"    => \$IcacheSize,#Size of Ipool Cache
                "ipool|i=s"     => \$IpoolDir,  #Inode Pool location relative 
to TopDir
                "gzip|g"        => \$gzip,      #Compress files
                "levels|l=i"    => \$ilevels,   #Number of levels in Inode tree
                "mdsum|m"       => \$mdsumflg,  #Include mdsum;
                "outfile|o=s"   => \$outfile,   #Output file (required)
                "Overwrite|O"   => \$Overwrite, #Overwrite existing files/dirs 
(restore)
                "pool|p=s"      => \$pool,      #Pool (pool|cpool|both)
                "paranoid|P"    => \$paranoid,  #Paranoid error checking
                "restore|r"     => \$restore,     #Restore rather than backup
                "skippc|S"      => \$skippc,    #Only create & populate Ipool
                "stdio"         => \$stdio,     #Print/read to/from stdout/stdin
                "topdir|t=s"    => \$TopDir,    #Location of TopDir
                "verbose|v+"    => \$verbose,   #Verbosity (repeats allowed)
                "noverbose"     => \$noverbose, #Shut off all verbosity
                "help|h"        => \&usage,
        ) &&
        ($create || !$cleanpool)
        );

if($restore) {
        usage() if $mdsumflg || $outfile || 
                (!$stdio && @ARGV != 1) || ($stdio && @ARGV != 0);
}else {
        usage() if ! defined($outfile) || $restore ||
                ($pool !~ /^(pool|cpool|both)/);
}

$verbose = 0 if $noverbose;

############################################################################
if($TopDir ne $TopDir_def) {
        #NOTE: if we are not using the TopDir in the config file, then we
        # need to manually override the settings of BackupPC::Lib->new
        # which *doesn't* allow you to set TopDir (even though it seems so
        # from the function definition, it gets overwritten later when the
        # config file is read)
        $TopDir =~ s|//*|/|g; #Remove any lurking double slashes
        $TopDir =~ s|/*$||g; #Remove trailing slash
        $bpc->{TopDir} = $TopDir;
        $bpc->{Conf}{TopDir} = $TopDir;

        $bpc->{storage}->setPaths({TopDir => $TopDir});
        $bpc->{PoolDir}  = "$bpc->{TopDir}/pool";
        $bpc->{CPoolDir} = "$bpc->{TopDir}/cpool";
}

%Conf   = $bpc->Conf(); #Global variable defined in jLib.pm (do not use 'my')
#############################################################################
#By convention for the rest of this program, we will assume that all
#directory variables have a trailing "/". This should save us some
#efficiency in terms of not having to always add back the slash.
$TopDir .= "/"; $TopDir =~ s|//*|/|g;
my $pcdir = 'pc/';

die "TopDir = '$TopDir' doesn't exist!\n" unless -d $TopDir;
die "TopDir = '$TopDir' doesn't look like a valid TopDir!\n" 
        unless -d "$TopDir/pool" && -d "$TopDir/cpool" && -d "$TopDir/${pcdir}";

system("$bpc->{InstallDir}/bin/BackupPC_serverMesg status jobs >/dev/null 
2>&1");
unless(($? >>8) == 1) {
        die "Dangerous to run when BackupPC is running!!! (use '--Force' to 
override)\n"
                if !$Force && $TopDir eq $TopDir_def;
        warn "WARNING: May be dangerous to run when BackupPC is running!!!\n"; 
    #Warn but don't die if *appear* to be in different TopDir
}

############################################################################
if(defined $restore) {
        do_restore($ARGV[0]);
        exit; #DONE
}
############################################################################
my $sfx = $gzip ? ".gz" : "";
my $linksfile = "${outfile}.links${sfx}";
my $nopoolfile = "${outfile}.nopool${sfx}";
my $nolinksfile = "${outfile}.nolinks${sfx}";
unless($skippc) {
        die "ERROR: '$linksfile' already exists!\n" if !$stdio && -e $linksfile;
        die "ERROR: '$nopoolfile' already exists!\n" if -e $nopoolfile;
        die "ERROR: '$nolinksfile' already exists!\n" if -e $nolinksfile;
        
        my $outpipe = $gzip ? "| gzip > " : "> ";
        if($stdio) {
                open(LINKS, $gzip ? "| gzip -f" : ">&STDOUT"); #Redirect LINKS 
to STDOUT
        }else {
                open(LINKS,  $outpipe . $linksfile)
                        or die "ERROR: Can't open '$linksfile' for 
writing!($!)\n";
        }
        
        open(NOPOOL, $outpipe . $nopoolfile)
                        or die "ERROR: Can't open '$nopoolfile' for 
writing!($!)\n";
        open(NOLINKS, $outpipe . $nolinksfile)
                        or die "ERROR: Can't open '$nolinksfile' for 
writing!($!)\n";
}

############################################################################
chdir($TopDir); #Do this so we don't need to worry about distinguishing
                #between absolute and relative (to TopDir) pathnames
                #Everything following opening the files occurs in TopDir
############################################################################
initialize_backups() unless($skippc);

$IpoolDir .= '/';
$IpoolDir  =~ s|//*|/|g;
if($create || ! -d $IpoolDir) { #Need to create Ipool...
        warn "**Creating new Ipool (${TopDir}$IpoolDir)...\n" if $verbose>=1;
        warn "* Removing old Ipool tree...\n" if $verbose >=2;
        rmtree($IpoolDir) if -d $IpoolDir; #Remove existing Ipool;
        print STDERR "* Creating Ipool directories:[0-9] " if $verbose >=2;
        create_ipool($ilevels, $IpoolDir);
        my $totipool=0;
        if($pool eq "both") {
                $totipool += populate_ipool("pool");
                $totipool += populate_ipool("cpool");
        }
        else {
                $totipool += populate_ipool($pool);
        }
        warn "*Ipool entries created=$totipool\n" if $verbose >=2;
}else { #Use existing pool
        die "ERROR: Inode tree levels ($ilevels) does not match number of 
levels in existing tree ($IpoolDir) -- either remove tree or select --create to 
override...\n"
                if ! -d $IpoolDir . ("9/" x $ilevels) || 
                -d $IpoolDir . ("9/" x ($ilevels+1));
        warn "**Using existing Ipool tree (${TopDir}$IpoolDir) [--create 
overrides]...\n" if $verbose>=1;
}
exit if $skippc;

warn "**Recording nonpooled top level files...\n" if $verbose>=1;
foreach (@nonpooled) {  #Print out the nonpooled files
        printf NOPOOL "%s\n", $_;
}
close(NOPOOL);

warn "**Recording linked & non-linked pc files...\n" if $verbose>=1;
my $lastmachine = '';
my ($compresslvl);
foreach (@backups) {
        m|$pcdir([^/]*)|;
        if($1 ne $lastmachine) {
                $lastmachine = $1;
                Clear_Icache();
        }
        warn "* Recursing through backup: $_\n" if $verbose>=1;
        $compresslvl = shift(@compresslvls);
        $attr = BackupPC::Attrib->new({ compress => $compresslvl });
    #Reinitialize this jLib global variable in case new compress level
        m|(.*/)(.*)|;
        find_cpool_links($1, $2);
}
close(LINKS);
close(NOLINKS);

##############################################################################
#Print summary & concluding message:
printf STDERR "\nDirectories=%d\tTotal Files=%d\n",
        $directories, ($totfiles + $#nonpooled);
printf STDERR "Link files=%d\t [Zero-length=%d,  Hardlinks=%d (Fixed=%d)]\n",
        ($zerolengthfiles+$existinglink_pcfiles+$fixedlink_pcfiles),
        $zerolengthfiles, ($existinglink_pcfiles+$fixedlink_pcfiles),
        $fixedlink_pcfiles;
printf STDERR "Non-pooled Toplevel=%d\n", $#nonpooled;
printf STDERR "Non-pooled Other=%d [Valid-pc=%d (Failed-fixes=%d),  
Invalid-pc=%d]\n",
        $unlinked_files, ($unlinked_pcfiles+$unlinkable_pcfiles),
        $unlinkable_pcfiles, $unlinked_nonpcfiles;

my ($rsyncnopool, $rsyncnolinks);
if($gzip) {
        $gzip="-g ";
        $rsyncnopool = "zcat $nopoolfile | rsync -aOH --files-from=- $TopDir 
<newTopDir>";
        $rsyncnolinks = "zcat $nolinksfile | rsync -aOH --files-from=- $TopDir 
<newTopDir>";
} else {
        $gzip = ""; #Avoid an uninitialized-value warning when interpolated below
        $rsyncnopool = "rsync -aOH --files-from=$nopoolfile $TopDir <newTopDir>";
        $rsyncnolinks = "rsync -aOH --files-from=$nolinksfile $TopDir <newTopDir>";
}

print STDERR <<EOF;
------------------------------------------------------------------------------
To complete copy, do the following as user 'root' or 'backuppc':
  1. Copy/rsync over the pool & cpool directories to the new TopDir
     (note this must be done *before* restoring '$linksfile')
     For rsync:
        rsync -aO $TopDir\{pool,cpool\} <newTopDir>

  2. Copy/rsync over the non-pooled top level files ($nopoolfile)
     For rsync:
        $rsyncnopool

  3. Restore the pc directory tree, hard links and zero-sized files:
        $0 -r ${gzip}[-t <newTopDir>] $linksfile

  4. Optionally, copy/rsync over the non-linked pc files ($nolinksfile)
     For rsync:
        $rsyncnolinks

------------------------------------------------------------------------------

EOF
exit;

##############################################################################
##############################################################################
#SUBROUTINES:

sub usage
{
    print STDERR <<EOF;

usage: $0 [options] -o <outfile> [<relpath-to-pcdir> <relpath-to-pcdir> ...]
       $0 [options] -r <restorefile>

  First, if Ipool tree doesn\'t exist (or if --create option selected), 
  create a new Inode tree (default: ${TopDir_def}$IpoolDir_def).

  Then, recurse through the pool and\/or cpool directories (default: $pool_def)
  and create a new Ipool entry corresponding to each pool file. The Ipool
  tree entry is indexed by the last digits of the Inode and contains the path
  relative to TopDir of the pool entry (plus an optional md5sum line --
  not yet implemented).

  Then, recurse through the specified paths (relative to the pc tree), or
  through the entire pc tree if no paths are specified.
  - If the file is a (non-pooled) top-level log file, then write its path
    relative to pcdir out to <outfile>.nopool
    Note this includes all non-directories above the share level plus the
    backupInfo files that are covered by the input paths
  - If the file is a directory, zero length file, or an existing hard link to
    the tree, then write it out to <outfile>.links
  - If the file is not hard-linked but is a valid non-zero length pc file 
    (f-mangled and present in the attrib file) and --fixlinks is selected,
    then try to link it properly to the appropriate pool.
  - Otherwise, add it to <outfile>.nolinks

  The entries in the IpoolDir can also optionally be cached in a hash, using
  the --icache <cache size> option but the speedup is relatively minimal since
  you are just saving a one-block file read.

  NOTE: TO ENSURE INTEGRITY OF RESULTS IT IS IMPORTANT THAT BACKUPPC IS NOT
  RUNNING (use --Force to override)

  Note: you should run BackupPC_nightly before running this program so that
  no unnecessary links are backed up; alternatively, set the --cleanpool
  option which will remove orphan pool entries.

  Options: [Common]
   --dryrun|-d            Dry-run - doesn\'t change pool or pc trees
                          Negate with: --nodryrun
   --Force|-F             Overrides various checks (e.g., if BackupPC running
                          or if directories present)
   --FORCE                OVERWRITES during restore (DANGEROUS)
   --gzip|-g              Pipe files to/from gzip compression
   --paranoid|P           Perform extra error checking that should not be 
                          necessary assuming pool, ipool, and pc trees have
                          not changed in the interim
   --topdir|-t            Location of TopDir.
                          [Default = $TopDir_def]
                          Note you may want to change from default for example
                          if you are working on a shadow copy.
   --verbose|-v           Verbose (repeat for more verbosity)
                          [Default level = $verbose_def]
                          Use --noverbose to turn off all verbosity
   --help|-h              This help message

  Options: [Copy only]
   --cleanpool|c          If orphaned files (nlinks=1) found when populating
                          Ipool, remove them (and renumber chain as needed). 
                          NOTE: This shouldn\'t happen if you have just run
                          BackupPC_nightly
   --create|-C            Override creation of new Ipool even if already present
                          Negate with: --nocreate
   --fixlinks|-f          Attempt to link valid pc files back to the pool
                          if not already hard-linked
                          NOTE: this changes files in the pc and\/or pool
                          of your source too!
   --icache|I N           Size of Ipool *memory* cache (0 = off)
                          [Default = $IcacheSize_def]
   --ipool|i [location]   Location relative to TopDir of Ipool tree
                          [Default = $IpoolDir_def]
   --levels|l N           Number of levels in the Ipool tree
                          [Default = $ilevels_def]
   --mdsum|m              Include mdsums [NOT IMPLEMENTED]
   --outfile|-o [outfile] Required stem name for the 3 output files
                             <outfile>.nopool
                             <outfile>.links
                             <outfile>.nolinks
   --pool|-p  [pool|cpool|both]  Pools to include in Ipool tree
   --skippc|-S            Skip recursing pc directory (implies create pool)
   --stdio                Print the directory tree, links and zero-sized files
                          to stdout so it can be piped directly to another copy
                          of the program running --restore
                          NOTE: Status, warnings, and errors are printed to
                          stderr (so they don't pollute the piped output).

  Options: [Restore only]
   --Overwrite|-O         Overwrite existing files & directories
   --stdio                Read the directory tree, links and zero-sized files
                          from stdin so it can be piped directly from another
                          copy of the program running in the create mode
                          For example, the following pipe works:
                          $0 -t <source TopDir> [-g] --stdio | $0 -t <dest TopDir> [-g] -r --stdio

EOF
exit(1);
}

#Glob to determine what is being backed up. 
#Make sure each backup seems valid and determine its compress level
#Collect top-level non-pooled files
sub initialize_backups
{
        if (!@ARGV) { # All backups at backup number level
                # TopDir/pc/<host>/<nn>
                @backups = glob("${pcdir}*/*"); #2 levels down;
                @nonpooled = grep(! -d , glob("${pcdir}*")); #All non-directories
        } else { # Subset of backups
                foreach(@ARGV) {
                        my $backupdir = $pcdir . $_ . '/';
                        $backupdir  =~ s|//*|/|g;
                        die "ERROR: '$backupdir' is not a directory\n" unless 
-d $backupdir;
                        if($backupdir =~ m|^\Q${pcdir}\E[^/]+/$|) {  #Hostname 
only
                                push(@backups, glob("${backupdir}*"));
                        } else { # At share level or below #Backup number or 
below
                                push(@backups, ${backupdir});
                        }
                }
                @backups = keys %{{map {$_ => 1} @backups}}; #Eliminate dups
                @nonpooled = keys %{{map {$_ => 1} @nonpooled}}; #Eliminate dups
        }

        push(@nonpooled, grep(! -d, @backups)); #Non-directories
        @backups = grep(-d , @backups); #Directories
        push(@nonpooled, grep(m|/backupInfo$|, @backups)); # backupInfo not pooled
        #Note everything else *should* be pooled - if not, we will catch it later as an error
        @backups = grep(!m|/backupInfo$|, @backups);

        foreach(@backups) {
                s|/*$||; #Remove trailing slash
                m|^(\Q${pcdir}\E[^/]+/[^/]+)(/)?|;
                die "Backup '$1' does not contain a 'backupInfo' file\n"
                        unless -f "$1/backupInfo";
                push(@nonpooled, "$1/backupInfo") unless $2; #Don't include if 
share level or below
                my $compress = get_bakinfo("$1","compress");
                push(@compresslvls, $compress);
                my $thepool = $compress > 0 ? 'cpool' : 'pool';
                die "Backup '$1' links to non-included '$thepool'\n"
                        if $pool ne 'both' && $pool ne $thepool;
        }
}

#Recursively create directory tree for Ipool - $levels deep starting from
#$newdir. Note: $newdir should have a trailing slash
sub create_ipool
{
        my ($level, $newdir) = @_;
#       print STDERR "$level: $newdir\n"; #JJK-DEBUG
        unless(-d $newdir) {
                mkdir $newdir or die "Can't create directory: $newdir\n";
        }
        if($level--) {
                for(my $i=0; $i <= 9; $i++) {
                        print STDERR "$i " if $level==$ilevels-1 && $verbose 
>=2;
                        create_ipool($level, $newdir . $i . "/"); #Recurse
                }
        }
}

#Iterate through the 'pooldir' tree and populate the Ipool tree
sub populate_ipool
{
        my ($fpool) = @_;
        my (@fstat, $dh, @dirlist, $pfile);
        my $ipoolfiles = 0;

        return unless glob("$fpool/[0-9a-f]"); #No entries in pool
        my @hexlist = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f');
        my ($idir,$jdir,$kdir);
        print STDERR "\n* Populating Ipool with '$fpool' branch:[0-f] " if 
$verbose >=2;
        foreach my $i (@hexlist) {
        print STDERR "$i " if $verbose >=2;
                $idir = $fpool . "/" . $i . "/";
                foreach my $j (@hexlist) {
                        $jdir = $idir . $j . "/";
                        foreach my $k (@hexlist) {
                                $kdir = $jdir . $k . "/";
                                unless(opendir($dh, $kdir)) {
                                        warn "Can't open pool directory: 
$kdir\n" if $verbose>=4;
                                        next;
                                }
                                my @entries = grep(!/^\.\.?$/, readdir ($dh)); #Remove dot files
                                closedir($dh);
                                warn "POOLDIR: $kdir (" . ($#entries+1) ." 
files)\n"
                                        if $verbose >=3;
                                if($cleanpool) { #Remove orphans & renumber first and then
                                        #create the Ipool after orphan deletion & chain renumbering
                                        my @poolorphans =
                                                grep(-f $kdir . $_ && (stat(_))[3] < 2, @entries);
                                        foreach (sort {poolname2number($b) cmp poolname2number($a)}
                                                         @poolorphans) { #Reverse sort to minimize moves
                                                $pfile = $kdir . $_;
                                                my $res = delete_pool_file($pfile);
                                                if($verbose >=1) {
                                                        $res == 1 ?
                                                                warn "WARN: Deleted orphan pool entry: $pfile\n" :
                                                                warn "ERROR: Couldn't properly delete orphan pool entry: $pfile\n";
                                                }
                                        }
                                }
                                foreach (@entries) {
                                        #Go through all entries in terminal cpool branch
                                        $pfile = $kdir . $_;
                                        next unless -f $pfile;
                                        @fstat = stat(_);
                                        #Note: 0=dev, 1=ino, 2=mode, 3=nlink, 4=uid, 5=gid, 6=rdev
                                        #7=size, 8=atime, 9=mtime, 10=ctime, 11=blksize, 12=blocks
                                        if($fstat[3] < 2) { #Orphan pool entry
                                                warn "WARN: Orphan pool entry: $pfile\n"
                                                        if $verbose>=1;
                                        } else { #No sense in creating ipool entries for orphans
                                                create_ipool_file($fstat[1], $pfile);
                                                $ipoolfiles++;
                                        }
                                }
                        } # kdir
                } # jdir
        } # idir
        print STDERR "\n" if $verbose >=2;
        return $ipoolfiles;
}

sub create_ipool_file
{
        my ($inode, $poolname) = @_;
        my ($fh, $ifile);
        
        my $mdsum = 0;
        $mdsum = get_mdsum($poolname) if $mdsumflg;
        $ifile = InodePath($inode);
#    print STDERR "IFILE: $ifile\n"; #JJK-DEBUG
        if(sysopen($fh, $ifile, $CREATMASK)) {
                syswrite($fh, "$poolname\n$mdsum\n");
                close($fh);
        }
        else {
                warn "ERROR: Can't write to inode pool file: $ifile\n";
        }
}

#Return the Ipool location of inode
#Note: each inode is stored based on the $ilevels least significant digits
#so that the tree should be relatively balanced
#Note this is analogous to MD52Path
sub InodePath
{
        my ($inode, $ipooldir) = @_;

        my $ipath= $inode;
        for(my $i=0; $i < $ilevels; $i++) {
                $ipath = ($inode%10) . '/' . $ipath;
                $inode /=10;
        }
        return((defined $ipooldir ? $ipooldir : $IpoolDir) . $ipath);

}

#Recursively go through pc tree
sub find_cpool_links
{
        my ($dir, $filename) = @_;

        my ($fh, $dh);
        my $file = $dir . $filename;
        if(-d $file) {
                my @fstat = stat(_); #Note last stat was -d $file
                #Print info to signal & re-create directory:
                #D <user> <group> <mod> <atime> <mtime> <file name>
                print_file_reference("D", $file, \@fstat);
                $directories++;
                opendir($dh,$file);
                my @contents = readdir($dh);
                foreach (@contents) {
                        next if /^\.\.?$/;     # skip dot files (. and ..)
                        find_cpool_links($file . '/', $_); #Recurse
                }
        }
        else { #Not a directory
                my @fstat = stat(_);
                $totfiles++;
                if($fstat[7] == 0) { #Zero size file
                        print_file_reference("Z", $file, \@fstat);
                        $zerolengthfiles++;
                        return;
                }elsif($fstat[3] > 1) { #More than one link
                        my ($pooltarget, $ifile);
                        if($IcacheSize && ($pooltarget = $IpoolCache{$fstat[1]})) {
#                               print STDERR "Cache hit: $fstat[1]\n"; #JJK-DEBUG
                        }elsif(open($fh, '<', ($ifile = InodePath($fstat[1])))) {
                                if ($pooltarget = <$fh>) {
                                        chomp($pooltarget);
                                        close($fh);
                                } else { #Shouldn't happen...
                                        warn "ERROR: Ipool file '$ifile' has no data\n";
                                        $pooltarget = undef;
                                }
                        }
                        if($pooltarget) {  #Hard link found in Cache or Ipool
                           if($paranoid &&
                                  ($pooltarget !~  #Not a standard pool name
                                   m|^(c?pool/[0-9a-f]/[0-9a-f]/[0-9a-f]/[0-9a-f]{32}(_[0-9]+)?)$|
                                   || ! -f $pooltarget  #Target pool file doesn't exist
                                   || $fstat[1] != (stat(_))[1])) { #Inodes don't match
                                   delete $IpoolCache{$fstat[1]};
                                   warn "ERROR: Ipool file '$ifile' for file '$file' has bad entry: $pooltarget\n";
                           }else { #Valid hit
                                   $existinglink_pcfiles++;

                                   Add_Icache($pooltarget, $fstat[1], $fstat[3])
                                           if $IcacheSize;
                                   print_hardlink_reference($pooltarget, $file);
                                   return;
                           }
                        }
                }

                if($file =~ m|^\Q${pcdir}\E[^/]+/[^/]+/backupInfo$|) {
                        $totfiles--;
                        return;         #BackupInfo already taken care of in 'nonpooled'
                }

                #Non-zero sized file not in Ipool/Icache that is not 'backupInfo'
                if($filename =~ /^f/ && -f $file && -f "$dir/attrib" 
                   && $attr->read($dir,"attrib") == 1
                   && defined($attr->get($bpc->fileNameUnmangle($filename)))) {
                        #VALID pc file if f-mangled, regular file, attrib file exists,
                        #and unmangled name is an element of the attrib file
                        if($fixlinks) {
                                if(fix_link($file, \@fstat) == 1) {
                                        $fixedlink_pcfiles++;
                                        return;
                                }else {$unlinkable_pcfiles++;}
                        }else{
                                warn "ERROR: $file (inode=$fstat[1], 
nlinks=$fstat[3]) VALID pc file NOT LINKED to pool\n";
                                $unlinked_pcfiles++;
                        }
                }else {
                        warn "ERROR: $file (inode=$fstat[1], nlinks=$fstat[3]) 
INVALID pc file and UNLINKED to pool\n";
                        $unlinked_nonpcfiles++
                }
                #ERROR: Not linked/linkable to pool (and not a zero length file or directory)
                print_file_reference("X", $file, \@fstat);
                printf NOLINKS "%s\n", $file;
                $unlinked_files++
        }
}

#Try to fix link by finding/creating new pool-link
sub fix_link
{
        my ($filename, $fstatptr) = @_;

        my $poollink = undef;
        my ($md5sum, $result) = zFile2MD5($bpc, $md5, $filename, 0, $compresslvl);
        $result = jMakeFileLink($bpc, $filename, $md5sum, 2, $compresslvl, \$poollink)
                if $result > 0;
        #Note we choose NewFile=2 since if we are fixing, we always want to make the link
        if($result > 0) { #(true if both above calls succeed)
                my $pool = ($compresslvl > 0 ? "cpool" : "pool");
                $poollink =~ m|.*(${pool}.*)|;
                $poollink = $1; #Relative to TopDir
                print_hardlink_reference($poollink, $filename);
                create_ipool_file((stat($poollink))[1], $poollink); #Update Ipool
                if($verbose >=2) {
                        warn "NOTICE: pool entry '$poollink' (inode=$fstatptr->[1]) missing from Ipool and added back\n" if $result == 3;
                        warn sprintf("NOTICE: '%s' (inode=%d, nlinks=%d) was fixed by linking to *%s* pool entry: %s (inode=%d, nlinks=%d)\n",
                                                 $filename, $fstatptr->[1], $fstatptr->[3],
                                                 ($result == 1 ? "existing" : "new"),
                                                 $poollink, (stat(_))[1], (stat(_))[3])
                                if $result !=3;
                }
                return 1;
        }
        warn "ERROR: '$filename' (inode $fstatptr->[1]); fstatptr->[3] links; 
md5sum=$md5sum) VALID pc file FAILED FIXING by linking to pool\n";
        return 0;
}

sub print_hardlink_reference
{
        my ($poolname, $filename) = @_;
        
        printf LINKS "%s %s\n", $poolname, $filename; #Print hard link
}

sub print_file_reference
{
        my ($firstcol, $filename, $fstatptr) = @_;

        #<firstcol>  <user> <group> <mod> <mtime> <filename>
        printf(LINKS "%s %s %s %04o %u %s\n",
                   $firstcol, UID($fstatptr->[4]), GID($fstatptr->[5]), 
                   $fstatptr->[2] & 07777, $fstatptr->[9], $filename);
}


sub get_mdsum
{
        #Placeholder -- full-file md5sum support is not yet implemented
}

###############################################################################
my ($avglinks, $totentries, $totlinks, $cachesize);
sub Add_Icache
{
        my ($target, $inode, $nlinks) = @_;
        my $fraction = 1/2;

#       printf STDERR "(size=%d|%d)\n", $cachesize, $size2; #JJK-DEBUG
        return if defined $IpoolCache{$inode} || #Already stored
                $nlinks <= 2 || #Only add new keys if nlinks >2 links 
                $nlinks < $fraction * $avglinks; #And if > fraction * average

        if($cachesize >= $IcacheSize) { #Delete "random" key if full
                my $key = each %IpoolCache || each %IpoolCache;
                #Note repetition needed to swallow the 'undef' token and allow 'each' to
                #circle around again (also note that in scalar context each returns just the key)
                delete($IpoolCache{$key});
                $cachesize--;
#               printf STDERR "Deleting cache link (size=%d, avg=%d)\n", $cachesize, $avglinks; #JJK-DEBUG
        }
        $IpoolCache{$inode} = $target;
        $totentries++;
        $totlinks += $nlinks;
        $cachesize++;
        $avglinks = $totlinks/$totentries;
#       printf STDERR "Adding cache link (inode=$inode nlinks=%d) [size=%d, avg=%.1f]\n", $nlinks, $cachesize, $avglinks; #JJK-DEBUG
}

sub Clear_Icache
{
        %IpoolCache = ();
        $totentries = 0;
        $totlinks = 0;
        $cachesize = 0;
        $avglinks = 3;
}
                
my (%UIDcache, %GIDcache);
# Return user name corresponding to numerical UID with caching
sub UID
{
    $UIDcache{$_[0]} = getpwuid($_[0]) unless exists($UIDcache{$_[0]});
    return $UIDcache{$_[0]};
}

# Return group name corresponding to numerical GID with caching
sub GID
{
    $GIDcache{$_[0]} = getgrgid($_[0]) unless exists($GIDcache{$_[0]});
    return $GIDcache{$_[0]};
}

my (%USERcache, %GROUPcache);
# Return numerical UID corresponding to user name with caching
sub USER
{
    $USERcache{$_[0]} = getpwnam($_[0]) unless exists($USERcache{$_[0]});
    return $USERcache{$_[0]};
}

# Return numerical GID corresponding to group name with caching
sub GROUP
{
    $GROUPcache{$_[0]} = getgrnam($_[0]) unless exists($GROUPcache{$_[0]});
    return $GROUPcache{$_[0]};
}
################################################################################
################################################################################
sub do_restore
{
        my ($restorefile) = @_;

        my $currbackup = "";

        my $formaterr = 0;
        my $ownererr = 0;
        my $permserr = 0;
        my $mkdirerr = 0;
        my $mkzeroerr = 0;
        my $mklinkerr = 0;
        my $filexsterr = 0;
        my $utimerr = 0;
        my $newdir = 0;
        my $newzero = 0;
        my $newlink = 0;
        my $skipped = 0;

################################################################################
        if($stdio) {
                open(LINKS, $gzip ? "/bin/zcat - |" : "<& STDIN");
        }else {
                open(LINKS, $gzip ? "/bin/zcat $restorefile |" : "< 
$restorefile") or
                        die "ERROR: Can't open '$restorefile' for 
reading!($!)\n";
        }

        chdir($TopDir); #Do this so we don't need to worry about distinguishing
                    #between absolute and relative (to TopDir) pathnames
        die "ERROR: pc directory contains existing backups!\n(use --Force to 
override; --FORCE to OVERWRITE)\n"
                unless $Force || $FORCE || !grep(-d, 
glob("${pcdir}*/[0-9]*/f*"));
        die "ERROR: pool directories empty! (use --Force to override)\n"
                unless $Force || glob("{cpool,pool}/*");

        umask 0000; #No permission bits disabled
        my $time = time; #We will use this for setting atimes.
        my @dirmtimes =();
        my ($line);
LINE:   while($line = <LINKS>) {
                chomp $line;

                unless($line =~ m|^[a-f0-9pDZX#]|) { #First character test ('p' admits uncompressed pool/ links)
                        print STDOUT "ERR_CHAR1: $line\n";
                        warn sprintf("ERROR: Illegal first line character: %s\n",
                                                 substr($line,0,1))
                                if $verbose >=1;
                        next LINE;  
                }
                switch ($&) {
                        case 'D' {
                                unless($line =~ m|^D +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +(\Q${pcdir}\E.*)|) {
                                        print STDOUT "ERR_DFRMT $line\n";
                                        $formaterr++;
                                        next LINE; #NOTE: next without the LINE label would go to the next switch case
                                }
                                #NOTE: 1=uname 2=group 3=mode 4=mtime 5=dirpath
#                               print STDERR "$1|$2|$3|$4|$5|\n"; #JJK-DEBUG
                                my $user = USER($1);
                                my $group = GROUP($2);
                                my $mode = oct($3);
                                my $mtime = $4;
                                my $dir = $5; $dir =~ s|/*$|/|;

                                if($verbose >= 1) {
                                        $dir =~ m|pc/([^/]*/[^/]*).*|;
                                        if($1 ne $currbackup) {
                                                $currbackup = $1;
                                                warn "RESTORING: $currbackup\n";
                                        }
                                }
                                #Look at @dirmtimes stack to see if we have backed out of
                                #top directory(ies) on stack and can now set mtime
                                my $lastdir; #If in new part of tree, set mtimes for past dirs
                                while(defined($lastdir = shift(@dirmtimes)) &&
                                          $dir !~ m|^\Q$lastdir\E|) {
                                        utime($time, shift(@dirmtimes), $lastdir);
                                }
                                unshift(@dirmtimes, $lastdir) if $lastdir; #Put back last one

                                if( -d $dir) { #Already exists, just update own/mod
                                        unless(chown $user, $group, $dir){
                                                print STDOUT "ERR_OWNER $line\n";
                                                $ownererr++;
                                                next LINE;
                                        }
                                        unless(chmod $mode, $dir) {
                                                print STDOUT "ERR_PERMS $line\n";
                                                $permserr++;
                                                next LINE;
                                        }
                                }elsif(! -e $dir) { #Make directory (nothing in the way)
                                        unless(jmake_path $dir,
                                                   {user=>$user, group=>$group, mode=>$mode}) {
                                                print STDOUT "ERR_MKDIR $line\n";
                                                $mkdirerr++;
                                                next LINE;
                                        }
                                        $newdir++;
                                        unshift(@dirmtimes, $mtime); #We need to set dir mtime
                                        unshift(@dirmtimes, $dir)    #when done adding files to dir
                                }else { #Non-directory in the way
                                        print STDOUT "ERR_DEXST $line\n";
                                        $filexsterr++;
                                        next LINE;
                                }
                        }
                        case 'Z' {
                                unless($line =~ m|^Z +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +((\Q${pcdir}\E.*/)(.*))|) {
                                        print STDOUT "ERR_ZFRMT $line\n";
                                        $formaterr++;
                                        next LINE;
                                }
                                #NOTE: 1=uname 2=group 3=mode 4=mtime 5=fullpath 6=dir 7=file
#                               print STDERR "$1|$2|$3|$4|$5|$6|$7\n"; #JJK-DEBUG
                                my $user = USER($1);
                                my $group = GROUP($2);
                                my $mode = oct($3);
                                my $mtime = $4;
                                my $file = $5;
                                my $dir = $6;
                                my $name = $7;
                                if(-e $file) {
                                        unless($FORCE && -f $file && unlink($file)) {
                                                print STDOUT "ERR_ZEXST $line\n";
                                                $filexsterr++;
                                                next LINE;
                                        }
                                }
                                unless( -d $dir ||  jmake_path $dir){ #Make directory if it doesn't exist
                                        #NOTE: typically shouldn't be necessary since list is created
                                        #with directories along the way
                                        print STDOUT "ERR_MKDIR $line\n";
                                        $mkdirerr++;
                                        next LINE;
                                }
                                unless(sysopen(ZERO, $file, $CREATMASK, $mode) && close(ZERO)) {
                                        print STDOUT "ERR_MKZER $line\n";
                                        $mkzeroerr++;
                                        next LINE;
                                }
                                unless(chown $user, $group, $file){
                                        print STDOUT "ERR_OWNER $line\n";
                                        $ownererr++;
                                        next LINE;
                                }
                                unless(utime $time, $mtime, $file) {
                                        $utimerr++;
                                        next LINE;
                                }
                                $newzero++;
                        }
                        case 'X' {$skipped++; next LINE; }  #Error line
                        case '#' {next LINE; }  #Comment line
                        else { #Hard link 
                        #Note: linking does not change atime/mtime (only ctime), so ignore
                                unless($line =~ m|(c?pool/[0-9a-f]/[0-9a-f]/[0-9a-f]/[^/]+) +((\Q${pcdir}\E.*/).*)|) {
                                        print STDOUT "ERR_LFRMT $line\n";
                                        $formaterr++;
                                        next LINE;
                                }
#                               print STDERR "$1|$2|$3\n"; # 1=source 2=target 3=targetdir #JJK-DEBUG
                                my $source = $1;
                                my $target = $2;
                                my $targetdir = $3;
                                if(-e $target) {
                                        unless($FORCE && -f $target && unlink($target)) {
                                                print STDOUT "ERR_LEXST $line\n";
                                                $filexsterr++;
                                                next LINE;
                                        }
                                }
                                unless(-d $targetdir || jmake_path $targetdir){ #Make targetdir if it doesn't exist
                                        #NOTE: typically shouldn't be necessary since list is created
                                        #with directories along the way
                                        print STDOUT "ERR_MKDIR $line\n";
                                        $mkdirerr++;
                                        next LINE;
                                }
                                unless(jlink $source, $target) {
                                        print STDOUT "ERR_MKLNK $line\n";
                                        $mklinkerr++;
                                        next LINE;
                                }
                                $newlink++
                        }
                } #SWITCH
        } # WHILE reading lines

        #Set mtimes for any remaining directories in @dirmtimes stack
        while(my $dir = shift(@dirmtimes)){
                utime($time, shift(@dirmtimes), $dir);
        }
################################################################################
        #Print results:
        

        printf("\nSUCCESSES: Restores=%d [Dirs=%d Zeros=%d Links=%d] 
Skipped=%d\n", 
                   ($newdir+$newzero+$newlink), $newdir, $newzero, $newlink, 
$skipped);
        printf("ERRORS: TOTAL=%d Format=%d Mkdir=%d Mkzero=%d Mklink=%d\n",
                   ($formaterr + $mkdirerr + $mkzeroerr + $mklinkerr +
                        $filexsterr + $ownererr + $permserr + $utimerr),
                   $formaterr, $mkdirerr, $mkzeroerr, $mklinkerr);
        printf("        File_exists=%d Owner=%d Perms=%d Mtime=%d\n\n",
                   $filexsterr, $ownererr, $permserr, $utimerr);

        exit; #RESTORE DONE                
}
################################################################################
################################################################################


------------------------------------------------------------------------------
jLib.pm
------------------------------------------------------------------------------
#============================================================= -*-perl-*-
#
# BackupPC::jLib package
#
# DESCRIPTION
#
#   This library includes various supporting subroutines for use with
#   BackupPC.
#
# AUTHOR
#   Jeff Kosowsky
#
# COPYRIGHT
#   Copyright (C) 2008-2011  Jeff Kosowsky
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
#========================================================================
#
# Version 0.4.0, released January 2011
#
#========================================================================

package BackupPC::jLib;

use strict;
use vars qw($VERSION);
$VERSION = '0.4.0';

use warnings;
use File::Copy;
use File::Path;
use File::Temp;
use Fcntl;  #Required for RW I/O masks

use BackupPC::Lib;
use BackupPC::Attrib;
use BackupPC::FileZIO;
use Data::Dumper;  #Just used for debugging...

no utf8;

use constant _128KB               => 131072;
use constant _1MB                 => 1048576;
use constant LINUX_BLOCK_SIZE     => 4096;
use constant TOO_BIG              => 2097152; # 1024*1024*2 (2MB)

require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(
       %Conf $dryrun $errorcount
       LINUX_BLOCK_SIZE TOO_BIG
       printerr warnerr firstbyte
       zFile2MD5 zFile2FullMD5
       link_recursively_topool
       jfileNameUnmangle
       getattr read_attrib count_file_attribs
       get_bakinfo get_file_attrib get_attrib_value get_attrib_type
       write_attrib write_file_attrib
       attrib
       GetPoolLink jMakeFileLink
       poolname2number renumber_pool_chain delete_pool_file
       run_nightly
       jcompare zcompare zcompare2
       delete_files touch
       jcopy jlink junlink jmkdir jmkpath jmake_path jrename jrmtree);

#Global variables
our %Conf;
our $errorcount=0;
our $dryrun=1;  #global variable - set to 1 to be safe -- should be set to
                                #0 in actual program files if you don't want a dry-run
                #None of the routines below will change/delete/write actual
                #file data if this flag is set. The goal is to do everything but
                #the final write to assist with debugging or cold feet :)

sub printerr
{
        print "ERROR: " . $_[0];
        $errorcount++;
}

sub warnerr
{
        $|++; # flush printbuffer first
        warn "ERROR: " . $_[0];
        $errorcount++;
}

# Returns the firstbyte of a file.
# If coding $coding undefined or 0, return as unpacked 2 char hexadecimal
# string. Otherwise, return as binary byte.
# Return -1 on error.
# Useful for checking the type of compressed file/checksum coding.
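# As a hypothetical usage sketch (not part of the library): BackupPC
# compressed pool files normally start with the zlib header byte 0x78,
# while compressed files with cached rsync checksums typically start with
# 0xd7, so with the default hex-string coding something like
#     warn "checksums appear to be cached\n" if firstbyte($poolfile) eq 'd7';
# can be used to distinguish the two cases.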
sub firstbyte {
        my ($file, $coding) = @_;
        my $fbyte='';
        sysopen(my $fh, $file, O_RDONLY) || return -1;
        $fbyte = -1 unless sysread($fh, $fbyte, 1) == 1;
        close($fh);
        if (! defined($coding) || $coding == 0) {
                $fbyte = unpack('H*',$fbyte); # Unpack as 2 char hexadecimal string
        }
        else {
                $fbyte = vec($fbyte, 0, 8);  # Return as binary byte
        }
        return $fbyte;
}

# Compute the MD5 digest of a compressed file. This is the compressed
# file version of the Lib.pm function File2MD5.
# For efficiency we don't use the whole file for big files
#   - for files <= 256K we use the file size and the whole file.
#   - for files <= 1M we use the file size, the first 128K and
#     the last 128K.
#   - for files > 1M, we use the file size, the first 128K and
#     the 8th 128K (ie: the 128K up to 1MB).
# See the documentation for a discussion of the tradeoffs in
# how much data we use and how many collisions we get.
#
# Returns the MD5 digest (a hex string) and the file size if it succeeds
# (or "" and error code if fails).
# Note return for a zero size file is ("", 0).
#
# If $size < 0 then calculate size of file by fully decompressing
# If $size = 0 then first try to read corresponding attrib file
#    (if it exists), if doesn't work then calculate by fully decompressing
# If $size > 0 then use that as the size of the file
#
# If compresslvl is undefined then use the default compress level from
# the config file
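#
# A minimal usage sketch (hypothetical pc path, relative to TopDir),
# letting the size come from the attrib file when available ($size=0)
# at compress level 3:
#
#     my $md5 = Digest::MD5->new;
#     my ($digest, $size) =
#         zFile2MD5($bpc, $md5, "pc/hostA/123/f%2fhome/fdata.txt", 0, 3);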

sub zFile2MD5
{
    my($bpc, $md5, $name, $size, $compresslvl) = @_;
        
        my ($fh, $rsize, $filesize, $md5size);

        return ("", -1) unless -f $name;
        return ("", 0) if (stat(_))[7] == 0;  #Zero size file
        $compresslvl = $Conf{CompressLevel} unless defined $compresslvl;
        unless (defined ($fh = BackupPC::FileZIO->open($name, 0, $compresslvl))) {
                printerr "Can't open $name\n";
                return ("", -1);
        }

        my ($datafirst, $datalast);
        my @data;
        #First try to read up to the first 128K (131072 bytes)
        if ( ($md5size = $fh->read(\$datafirst, _128KB)) < 0 ) { #First 128K
                printerr "Can't read & decompress $name\n";
                return ("", -1);
        }

        if ($md5size == _128KB) { # If full read, continue reading up to 1st MB
                my $i=0;
                #Read in up to 1MB (_1MB), 128K at a time and alternate between 2 data buffers
                while ( ($rsize = $fh->read(\$data[(++$i)%2], _128KB)) == _128KB
                                &&  ($md5size += $rsize) < _1MB ) {}
                $md5size +=$rsize if $rsize < _128KB; # Add back in partial read
                $datalast = ($i > 1 ?
                                         substr($data[($i-1)%2], $rsize, _128KB-$rsize) : '')
                        . substr($data[$i%2], 0, $rsize); #Last 128KB (up to 1MB)
    }
        if($md5size < _1MB) { #Already know the size because read it all (note don't do <=)
                $filesize = $md5size;
        } elsif($size > 0) {  #Use given size
                $filesize = $size;
        } elsif($compresslvl == 0) { #Not compressed, so: size = actual size
                $filesize = -s $name;
        }elsif($size == 0) { # Try to find size from attrib file
                $filesize = get_attrib_value($name, "size");
                if(!defined($filesize)) {
                        warn "Can't read size of $name from attrib file so 
calculating manually\n";
                }
        }
        if(!defined($filesize)) { #No choice but continue reading to find size
                $filesize = $md5size;
                while (($rsize = $fh->read(\($data[0]), _128KB)) > 0) {
                        $filesize +=$rsize;
        }
   }
   $fh->close();

   $md5->reset();
   $md5->add($filesize);
   $md5->add($datafirst);
   $md5->add($datalast) if defined($datalast);
   return ($md5->hexdigest, $filesize);
}

#Compute md5sum of the full data contents of a file.
#If the file is compressed, calculate the md5sum of the inflated
#version (using the zlib routines to uncompress the stream). Note this
#gives the md5sum of the FULL file -- not just the partial file md5sum
#above.
sub zFile2FullMD5
{
    my($bpc, $md5, $name, $compresslvl) = @_;

        my $fh;
        my $data;

        $compresslvl = $Conf{CompressLevel} unless defined $compresslvl;
        unless (defined ($fh = BackupPC::FileZIO->open($name, 0, $compresslvl))) {
                printerr "Can't open $name\n";
                return -1;
        }

        $md5->reset();  
        while ($fh->read(\$data, 65536) > 0) {
                $md5->add($data);
        }

    return $md5->hexdigest;
}


# Like MakeFileLink but for existing files where we don't have the
# digest available. So compute the digest and then call MakeFileLink.
# For each file, check if the file exists in $bpc->{TopDir}/pool.
# If so, remove the file and make a hardlink to the file in
# the pool.  Otherwise, if the newFile flag is set, make a
# hardlink in the pool to the new file.
#
# Returns 0 if a link should be made to a new file (ie: when the file
#    is a new file but the newFile flag is 0).
# Returns 1 if a link to an existing file is made,
# Returns 2 if a link to a new file is made (only if $newFile is set)
# Returns negative on error.
sub zMakeFileLink
{
    my($bpc, $md5, $name, $newFile, $compress) = @_;

        $compress = $Conf{CompressLevel} unless defined $compress;

        my ($d,$ret) = zFile2MD5($bpc, $md5, $name, 0, $compress);
        return -5 if $ret < 0;
        $bpc->MakeFileLink($name, $d, $newFile, $compress);
}


# Use the following to create a new file link ($copy) that links to
# the same pool file as the original ($orig) file does (or
# should). i.e. this doesn't assume that $orig is properly linked to
# the pool. This is computationally more costly than just making the
# link, but will avoid further corruption whereby you get archive
# files with multiple links to each other but no link to pool.

# First, check if $orig links to the pool and if not create a link
# via MakeFileLink.  Don't create a new pool entry if newFile is zero.
# If a link either already exists from the original to the pool or if
# one was successfully created, then simply link $copy to the same
# pool entry.  Otherwise, just copy (don't link) $orig to $copy
# and leave it unlinked to the pool.  

# Note that checking whether $orig is linked to the pool is
# cheaper than running MakeFileLink since we only need the md5sum
# checksum.
# Note we assume that the attrib entry for $orig (if it exists) is
# correct, since we use that as a shortcut to determine the filesize 
# (via zFile2MD5)
# Returns 1 if a link to an existing file is made,
# Returns 2 if a link to a new file is made (only if $newFile is set)
# Returns 0 if file was copied (either because MakeFileLink failed or 
#   because newFile=0 and there was no existing pool match)
# Returns negative on error.
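#
# A minimal usage sketch (hypothetical pc paths): give 'fcopy' the same
# pool link as 'forig', creating a new pool entry if needed (newFile=1)
# at compress level 3:
#
#     my $ret = zCopyFileLink($bpc, "$TopDir/pc/hostA/123/fshare/forig",
#                             "$TopDir/pc/hostA/123/fshare/fcopy", 1, 3);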
sub zCopyFileLink 
{
        my ($bpc, $orig, $copy, $newFile, $compress) = @_;
        my $ret=1;
        $compress = $Conf{CompressLevel} unless defined $compress;
        my $md5 = Digest::MD5->new;
        my ($md5sum, $md5ret) = zFile2MD5($bpc, $md5, $orig, 0, $compress);

        #If $orig is already properly linked to the pool (or linkable to the pool
        #after running MakeFileLink on $orig) and HardLinkMax is not exceeded,
        #then just link to $orig.
        if($md5ret > 0) { #If valid md5sum and non-zero length file (so should be linked to pool)...
                if((GetPoolLinkMD5($bpc, $orig, $md5sum, $compress, 0) == 1 || #If pool link already exists
                    ($ret = $bpc->MakeFileLink($orig, $md5sum, $newFile, $compress)) > 0) #or creatable by MakeFileLink
                   && (stat($orig))[3] < $bpc->{Conf}{HardLinkMax}) { #AND (still) less than max links
                        return $ret if link($orig, $copy); #Then link $copy to $orig
                }
        }
        if(copy($orig, $copy) == 1) { #Otherwise first copy the file, then try to link the copy to the pool
                if($md5ret > 0 && ($ret = $bpc->MakeFileLink($copy, $md5sum, $newFile, $compress)) > 0) {
                        return 2; #Link is to a new copy
                }
                printerr "Failed to link copied file to pool: $copy\n";
                return 0; #Copied but not linked to pool
        }
        die "Failed to link or copy file: $copy\n";
        return -1;
}
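
# Illustrative usage (a sketch, not part of the original code): duplicate a
# backup file while keeping pool linkage intact, where $old and $new are full
# paths inside the pc tree:
#
#   my $ret = zCopyFileLink($bpc, $old, $new, 1, $Conf{CompressLevel});
#   # 1 or 2: $new shares a pool inode with $old; 0: plain copy, not pooled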

# Copy $source to $target, recursing down directory trees as
# needed. The 4th argument, if non-zero, means (for files) use
# zCopyFileLink to make sure that everything is linked properly to the
# pool; otherwise, just do a simple link. The 5th argument, $force,
# will erase a target directory if already present; otherwise an error
# is signalled. The final 6th argument is the compression level, which
# can be left out and will then be taken from
# $bpc->{Conf}{CompressLevel}.

# Note the recursion is set up so that the return is negative if
# error, positive if consistent with a valid pool link (i.e. zero
# length file or directories are consistent too), and zero if
# successful but not consistent with a valid pool.  The overall result
# is -1 if there is any error and otherwise the AND'ed result of the
# operations. That means if the overall result is positive then the
# whole tree is successfully linked to the pool, so the next time
# around we can use simple linking (i.e. $linkcheck=0).  Note $source
# and $target must be full paths from root.  Note: if files are not
# compressed properly then you won't be able to link them to pool.
sub link_recursively_topool
{
    my ($bpc, $source, $target, $linkcheck, $force, $compress) = @_;
        my $ret=1;
        die "Error: '$source' doesn't exist" unless -e $source;
        if (-e $target) {
                die "Error: can't overwrite: $target (unless 'force' set)\n"
                        unless $force;
                die "Error: can't remove: $target ($!)\n"
                        unless rmtree($target, 0, 0);
        }
        if (-d $source) {
                die "Error: mkdir failed to create new directory: $target ($!)\n"
                        unless jmkdir($target);
                opendir(my $dh, $source) || die "Error: Could not open dir $source ($!)\n";
                foreach my $elem (readdir($dh)) {
                        next if $elem =~ /^\.\.?$/; #Skip the '.' and '..' entries
                        my $newsource = "$source/$elem";
                        my $newtarget = "$target/$elem";
                        my $newret = link_recursively_topool($bpc, $newsource, $newtarget,
                                                             $linkcheck, $force, $compress);
                        if ($newret < 0) { #Error so break out of loop & return
                                closedir $dh;
                                return -1;
                        }
                        $ret = 0 if $newret == 0; #Note $ret stays at 1 only if no elements return -1 or 0
                }
                closedir $dh;
                return $ret;
        }
        elsif ($dryrun) {return 1} # Return before making changes to filesystem
        elsif ( ! -s $source) { # zero size
                copy($source, $target); #No link since zero size
        }
        elsif ($linkcheck) { #Make sure first that source is properly linked to pool
                return(zCopyFileLink($bpc, $source, $target, 1, $compress));
        }
        else { #Otherwise, OK to perform a simple link to source
                return (link($source, $target) == 1 ? 1 : -1);
        }
}
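
# Illustrative usage (a sketch, not part of the original code): mirror one
# backup directory into a new location, verifying pool linkage on this first
# pass, where $topdir stands for your BackupPC top directory:
#
#   my $ok = link_recursively_topool($bpc, "$topdir/pc/myhost/0",
#                                    "$topdir/pc/myhost.copy/0", 1, 0);
#   # $ok > 0: whole tree pool-linked; 0: copied but not fully pooled; <0: error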

sub get_bakinfo
{
        my ($bakinfofile, $entry) = @_;
        our %backupInfo = ();

        $bakinfofile .= "/backupInfo";
        warn "Can't read $bakinfofile\n" unless -f $bakinfofile;

        unless (my $ret = do $bakinfofile) { # Load  the backupInfo file
                if ($@) {
                        warn "couldn't parse $bakinfofile: $@\n";
                }
                elsif (!defined $ret) {
                        warn "couldn't do $bakinfofile: $!\n";
                }
                elsif (! $ret) {
                        warn "couldn't run $bakinfofile\n";
                }
        }
        my $value = $backupInfo{$entry};
        warn "$bakinfofile is empty or missing entry for '$entry'\n"  
                unless defined $value;
        return $value;
}

# Note: getattr($attr) =$attr->{files} 
#       getattr($attr, $file) =$attr->{files}{$file}
#       getattr($attr, $file, $attribute)  =$attr->{files}{$file}{$attribute}

sub getattr
{
    my($attr, $fileName, $Attribute) = @_;
    return $attr->{files}{$fileName}{$Attribute} if ( defined($Attribute) );
    return $attr->{files}{$fileName} if ( defined($fileName) );
    return $attr->{files};
}


#Reads in the attrib file for directory $_[1] (with optional alternative
#attrib file name $_[2]) and stores it in the hashref $_[0] passed to
#the function.
#Returns -1 and a blank $_[0] hash ref if the attrib file doesn't exist
#already (not necessarily an error).
#Dies if the attrib file exists but can't be read in.
sub read_attrib
{ 
#Note: $_[0] = hash reference to attrib object,
#SO DO NOT USE A LOCAL VARIABLE FOR IT (i.e. don't do my $attr = $_[0])
        $_[0] = BackupPC::Attrib->new({ compress => $Conf{CompressLevel} });

#       unless (defined $_[1]) { #JJK: DEBUGGING
#               print "read_attrib: \$_[1] undefined\n";
#               print Dumper @_;
#       }
        return -1 unless -f attrib($_[1], $_[2]);
    #This is not necessarily an error because dir may be empty

        $_[0]->read($_[1],$_[2]) or
                die "Error: Cannot read attrib file: " . attrib($_[1],$_[2]) . "\n";

        return 1;
}

#Same as Lib.pm fileNameUnmangle but doesn't require the
#unnecessary '$bpc'
sub jfileNameUnmangle {
    my($name) = @_;

    $name =~ s{/f}{/}g;
    $name =~ s{^f}{};
    $name =~ s{%(..)}{chr(hex($1))}eg;
    return $name;
}
        
sub count_file_attribs
{
        my ($attrib) = @_;
        return( scalar (keys (%{$attrib->get()})));
}

# Get attrib entry for $fullfilename. The corresponding hash is returned and
# also fills the hash reference (optionally) passed via $fileattrib.
# If attrib file not present, return -1 (which may not be an error)
# Returns -2 if not a mangled file
# Dies if error
sub get_file_attrib
{
        my ($fullfilename, $fileattrib) = @_;
        $fullfilename =~ m{(.+)/(.+)};  #$1=dir; $2=file
        return -2 unless defined $2;

        return -1 if read_attrib(my $attr, $1) < 0;

        %{$fileattrib} =  %{$attr->{files}{jfileNameUnmangle($2)}};
        #Note unmangling removes initial 'f' AND undoes char mapping
}

# Returns value of attrib $key for $fullfilename (full path)
# If not a mangled file, or the attrib file is not present, or there is no
# entry for the specified key for the given file, then return 'undef'
sub get_attrib_value
{
        my ($fullfilename, $key) = @_;
        $fullfilename =~ m{(.+)/(.+)};  #$1=dir; $2=file

        return undef unless defined $2;
        return undef if read_attrib(my $attr, $1) < 0;
        return $attr->{files}{jfileNameUnmangle($2)}{$key}; 
    #Note this returns undefined if key not present
        #Note unmangling removes initial 'f' AND undoes char mapping
}

# Returns value of attrib type key for $fullfilename (full path)
# If attrib file present but filename not an entry, return -1 [not an error if file nonexistent]
# If no attrib file (but directory exists), return -2 [not an error if directory empty]
# If directory non-existent, return -3
# If attrib file present but not readable, return -4 [true error]
# Note there may be an entry even if the file is non-existent (e.g. type 10 = delete)
sub get_attrib_type
{
        my ($fullfilename) = @_;
        $fullfilename =~ m{(.+)/(.+)};  #$1=dir; $2=file

#       unless (defined $1) { #JJK: DEBUGGING
#               print "get_attrib_type: \$1 undefined\n";
#               print Dumper @_;
#       }

        return -3 unless -d $1;
        return -2 unless -f attrib($1);
        return -4 unless read_attrib(my $attr, $1) >= 0;
        my $type = $attr->{files}{jfileNameUnmangle($2)}{type};
        #Note unmangling removes initial 'f' AND undoes char mapping
        return (defined($type) ? $type : -1);
}
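
# Illustrative usage (a sketch, not part of the original code; the path is
# hypothetical): look up the attrib type code of a file in the pc tree:
#
#   my $type = get_attrib_type("$topdir/pc/myhost/0/f%2fetc/fpasswd");
#   # $type >= 0  : the BackupPC attrib 'type' code for the file
#   # $type == -1 : attrib file exists but has no entry for this file
#   # $type == -2 : directory exists but has no attrib file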

# 4th argument $poollink says whether to write to file (0) or link to
# pool (using MakeFileLink).
# 5th argument tells what to do if no files in $attrib 
# (0=return error, 1=delete attrib file and return success)
# 6th argument is an optional alternative name for the attrib file itself
# Note it does an unlink first since, if there are hard links, we don't want
# to modify them
# Returns positive if successful, 0 if not
# Specifically, 1 if linked to existing, 2 if linked to new, 
# 3 if written without linking, 4 if (empty) attrib file deleted

sub write_attrib
{
        my ($bpc, $attrib, $dir, $poollink, $delempty, $attfilename) = @_; 
        die "Error: Cannot write to directory: $dir\n" unless -w $dir;

#       unless (defined $dir) { #JJK: DEBUGGING
#               print "write_attrib: \$dir undefined";
#               print Dumper @_;
#       }

        my $attfilepath = attrib($dir, $attfilename);
        return 1 if $dryrun; #Return before writing changes
        die "Error: could not unlink old attrib file: $attfilepath\n"  
                if (-e $attfilepath && ! unlink($attfilepath)); #Delete old 
attrib file if exists cuz may be hard-linked
        return 4 if(count_file_attribs($attrib) == 0 && $delempty); #No attribs 
left so leave it unlinked
    die "Error: could not write to attrib file: $attfilepath\n"
                unless ($attrib->write($dir, $attfilename)) == 1; #First write 
a copy without linking
        my $ret=3;
        if ($poollink) {
                my $data = $attrib->writeData;
                my $md5 = Digest::MD5->new;
                my $digest;
                if(($digest = $bpc->Buffer2MD5($md5, length($data), \$data)) ne 
-1 
                   && ($ret = $bpc->MakeFileLink($attfilepath, $digest, 1, 
$Conf{CompressLevel})) <= 0) {
                        printerr "Can't link attrib file to pool: $attfilepath 
($ret)\n";
                }
        }
        return $ret;
}
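
# Illustrative usage (a sketch, not part of the original code): after modifying
# the in-memory attrib object $attr for directory $dir, write it back out and
# link the new attrib file into the pool:
#
#   my $ret = write_attrib($bpc, $attr, $dir, 1, 1);
#   # 1/2: attrib file linked to pool; 3: written without linking; 4: empty and deleted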

# Write out $fileattrib for $file (without the mangle) to $dir/$attfilename (or
# to the default attribute file for $dir if $attfilename is undef)
# Reads in existing attrib file if pre-existing
# 4th argument $poollink says whether to write to file (0) or link to
# pool (using MakeFileLink).
# Returns positive if successful, 0 if not
# Specifically, 1 if linked to existing, 2 if linked to new,
# 3 if written without linking
sub write_file_attrib
{
    my ($bpc, $dir, $file, $fileattrib, $poollink, $attfilename) = @_; #Note $fileattrib is a hashref
        my $ret=0;

        read_attrib(my $attr, $dir, $attfilename); #Read in existing attrib file if it exists
        $ret = write_attrib($bpc, $attr, $dir, $poollink, 0, $attfilename) 
                if $attr->set($file, $fileattrib) > 0;

#       unless (defined $dir) { #JJK: DEBUGGING
#               print "write_file_attrib: \$dir undefined\n";
#               print Dumper @_;
#       }

        die "Error writing to '$file' entry to attrib file: " . attrib($dir, 
$attfilename) . "\n" unless $ret > 0;
        return $ret;
}

sub attrib
{
        return (defined($_[1]) ? "$_[0]/$_[1]" : "$_[0]/attrib");
}

# Modified version of MakeFileLink including:
# 1. Efficiency/clarity improvements
# 2. Calls GetPoolLink to find candidate link targets.
# 3. For non-compressed files, uses my improved jcompare comparison algorithm
# 4. For compressed files, uses zcompare2 which compares only the compressed
#    data sans first-byte header & potential rsync digest trailer. This allows
#    for matches even if one file has an rsync digest and the other does not
# 5. Moves the file before unlinking in case the subsequent link fails and
#    needs to be undone
# 6. Added 6th input parameter to return a pointer to the pool link name
# 7. Extended meaning of the newFile flag:
#      0   = Don't create a new pool file (as before)
#      1   = Create a new pool file IF there are no other links to the source file
#            (this was the previous behavior for whenever newFile was set)
#      >=2 = Create a new pool file EVEN if the source file has more than one link
#            (this will by extension link other things linked to the source
#             also to the pool -- which means that the pool might not be clean
#             if it links to things outside of the pc directory -- so
#             use carefully)
# 8. Includes 'j' versions of file routines to allow dryrun
# 9. Added check to see if already in pool and if so returns 3

# For each file, check if the file exists in $bpc->{TopDir}/pool.
# If so, remove the file and make a hardlink to the file in
# the pool.  Otherwise, if the newFile flag is set, make a
# hardlink in the pool to the new file.
#
# Returns 0 if a link should be made to a new file (ie: when the file
#    is a new file but the newFile flag is 0).
#    JJK: actually also if $name has nlinks >1 regardless of newFile flag
# Returns 1 if a link to an existing file is made,
# Returns 2 if a link to a new file is made (only if $newFile is set)
# Return 3 if first finds that already linked to ipool
# Returns negative on error.

sub jMakeFileLink
{
        my($bpc, $name, $d, $newFile, $compress, $linkptr) = @_;

        my $poollink;
        my $result=GetPoolLinkMD5($bpc, $name, $d, $compress, 1, \$poollink);
        $$linkptr = $poollink if defined($linkptr) && $result > 0;

        if($result == 1){ #Already linked to the pool
                return 3;
        }elsif($result == 2) { #Matches existing, linkable pool file
                my $tempname = mktemp("$name.XXXXXXXXXXXXXXXX");
                return -5 unless jrename($name, $tempname); #Temporarily save
                if(!jlink($poollink, $name)) { #Link pool to source
                        jrename($tempname, $name); #Restore if can't link
                        return -3;
                }
                junlink($tempname); #Safe to remove the original
                return 1;
        }elsif($result == 3) {
                if(defined($newFile) && #No link or match
                   ($newFile > 1 || ($newFile == 1 && (stat($name))[3] == 1 ))) {
                        $poollink =~ m|(.*)/|;
                        jmkpath($1, 0, 0777) unless -d $1 ;
                        return -4 unless jlink($name, $poollink);
                        return 2;
                } else { #New link should have been made but told not to
                        return 0;
                }
        }else {
                return -6; #Error from GetPoolLink call
        }
}
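
# Illustrative usage (a sketch, not part of the original code): given a file
# $name in the pc tree whose pool digest $digest is already known, link it
# into the pool and retrieve the pool path it ends up linked to:
#
#   my $poolpath;
#   my $ret = jMakeFileLink($bpc, $name, $digest, 1, $Conf{CompressLevel}, \$poolpath);
#   # 3: was already linked; 1: linked to existing pool file; 2: new pool file;
#   # 0: a new pool file was needed but newFile didn't allow it; <0: error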

# GetPoolLink
# GetPoolLinkMD5
#Find the pool/cpool file corresponding to file $name.
#1. First iterates entire chain to see if *same inode* is present. I.e. if 
#   already linked to the pool. If so, it returns the first instance. 
#   Return = 1 and $Poolpathptr = name of the hard-linked match
#2. If $compareflg is set, then iterate through again this time looking for
#   file *content* matches (this is much slower). 
#   If so, it returns the first instance with Nlinks < HardLinkMax
#   Return = 2 and $Poolpathptr = name of the content match
#3. Finally, if not linked (and also not matched if $compareflg set)
#   Return=3 and $$poolpathptr = first empty chain
#Note: Return zero if zero size file
#      Return negative if error.
#Note: if the chain has multiple copies of the file, then it returns the first
#linked match if present; if there is none and $compareflg is set, then the
#first content match
sub GetPoolLink
{
        my($bpc, $md5, $name, $compress, $compareflg, $poolpathptr) = @_;

        $compress = $bpc->{Conf}{CompressLevel} unless defined $compress;

        my ($md5sum , $ret) = defined($compress) && $compress > 0 ?
                zFile2MD5($bpc, $md5, $name, 0, $compress) : 
                $bpc->File2MD5($md5, $name);

        return 0 if $ret == 0; #Zero-sized file
        return -3 unless $ret >0;

        GetPoolLinkMD5($bpc, $name, $md5sum, $compress, $compareflg, $poolpathptr);
}

sub GetPoolLinkMD5
{
        my($bpc, $name, $md5sum, $compress, $compareflg, $poolpathptr) = @_;
        my($poolbase, $i);

        return -1 unless -f $name;
        my $inode = (stat(_))[1];  #Uses stat from -f
        return 0 if (stat(_))[7] == 0; #Zero-size (though this shouldn't really happen
                                       #since the md5sum is not defined for zero-sized files)

        $compress = $bpc->{Conf}{CompressLevel} unless defined $compress;

        return -2 unless 
                defined($poolbase = $bpc->MD52Path($md5sum, $compress));

        #1st iteration looking for matching inode
        $$poolpathptr = $poolbase;
        for($i=0; -f $$poolpathptr; $i++) { #Look for linked copy (inode match)
                return 1 if ((stat(_))[1] == $inode);
                $$poolpathptr = $poolbase . '_' . $i; #Iterate
        }

        return 3 unless $compareflg; #No inode match

        #Optional 2nd iteration looking for matching content
        my $compare = defined($compress) && $compress > 0 ? \&zcompare2 : \&jcompare;
        $$poolpathptr = $poolbase;
        for(my $j=0; $j<$i; $j++ ) { #Look for content match
                return 2 if (stat($$poolpathptr))[3] < $bpc->{Conf}{HardLinkMax} &&
                        !$compare->($name, $$poolpathptr);
                $$poolpathptr = $poolbase . '_' . $j; #Iterate
        }
        # No matching existing pool entry - $$poolpathptr is the first empty chain element
        return 3;
}
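
# Illustrative usage (a sketch, not part of the original code): find out where
# a pc-tree file $file stands relative to the pool, given its partial-file
# md5sum $md5sum:
#
#   my $poolpath;
#   my $result = GetPoolLinkMD5($bpc, $file, $md5sum, $Conf{CompressLevel}, 1, \$poolpath);
#   # 1: $poolpath is the pool file already sharing $file's inode
#   # 2: $poolpath has identical content and room for another hard link
#   # 3: no match; $poolpath is the first empty slot in the chain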

#Convert pool name to a constant length string consisting
#of 32 hex digits for the base plus 6 (left) zero-padded digits for
#the chain suffix (suffixes are incremented by 1 so that 'no suffix'
#records as 0). Note this accommodates chains up to 10^6 long.
#Use a number bigger than 6 if you have longer chains.
#Useful if you want to order (e.g., sort) pool file names numerically
sub poolname2number
{
        $_[0] =~ m|(.*/)?([^_]*)(_(.*))?|;
        my $suffix = defined($4) ? $4+1 : 0;
        return sprintf("%s%06s", $2, $suffix)
}
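
# Worked example (illustrative, not part of the original code):
#   poolname2number("1/2/3/0123456789abcdef0123456789abcdef")
#       returns "0123456789abcdef0123456789abcdef000000"  (no suffix -> 0)
#   poolname2number("1/2/3/0123456789abcdef0123456789abcdef_4")
#       returns "0123456789abcdef0123456789abcdef000005"  (suffix 4 -> 5)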

# Renumber pool chain holes starting at $firsthole and continue up the chain
# to fill up to $maxholes holes (which need not be contiguous).
# If $maxholes is not specified it defaults to 1 (sufficient to cover one
# deletion - i.e. one hole -- which may be $file itself)
# If $firsthole exists, it is an error. Use delete_pool_file instead,
# if you want to delete first before renumbering.
# Return 1 on success; Negative on failure
sub renumber_pool_chain
{
        my ($firsthole, $maxholes) = @_;
        
        $maxholes = 1 unless defined $maxholes;
        
        my ($base, $sufx);
        if($firsthole =~ m|(.*_)([0-9]+)|) {
                $base = $1;
                $sufx = $2;
        }else{
                $base = $firsthole . "_";
                $sufx = -1; #Corresponds to no suffix
        }

        my $nextsufx = $sufx+1;
        my $nextfile = $base .  $nextsufx;
        while($nextsufx - $sufx <= $maxholes) {
                if(-e $nextfile) { #Non-hole so move/renumber
                        if(-e $firsthole || ! jrename($nextfile, $firsthole)) {
                                warn "Error: Couldn't rename pool file: 
$nextfile --> $firsthole\n";
                                return -2;
                        }
#                       print "Renumbering: $nextfile --> $firsthole\n";
                        $firsthole = $base . (++$sufx); #Move hole up the chain
                }
                $nextfile = $base . (++$nextsufx);
        }
        return 1;
}

# Delete pool file (if it exists) and regardless renumber 
# chain to fill hole left by file. Fill up to $maxholes
# above $file (including the hole left by $file) where
# $maxholes defaults to 1.
# If $file doesn't exist, then it just fills the hole
# left by $file.
# Return 1 on success; Negative on failure
sub delete_pool_file
{
        my ($file, $maxholes) = @_;

        if(-e $file && !junlink($file)) { #Try to delete if exists
                warn "Error: Couldn't unlink pool file: $file\n";
                return -3;
        }
        return(renumber_pool_chain($file,$maxholes)); #Returns -1/-2 on fail
}
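
# Illustrative usage (a sketch, not part of the original code; the path is
# hypothetical): remove one file from the middle of a pool chain and close the
# hole it leaves behind:
#
#   delete_pool_file("$topdir/cpool/1/2/3/0123456789abcdef0123456789abcdef_2") > 0
#       or warn "Could not delete/renumber pool chain entry\n";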

# Run BackupPC_nightly
sub run_nightly
{
        my ($bpc) = @_;
    my $err = $bpc->ServerConnect($Conf{ServerHost}, $Conf{ServerPort});
    if ($err) {
        printerr "BackupPC_nightly: can't connect to server ($err)...\n";
        return($err);
    }
    if ((my $reply = $bpc->ServerMesg("BackupPC_nightly run")) eq "ok\n" ) {
        $bpc->ServerMesg("log $0: called for BackupPC_nightly run...");
        print "BackupPC_nightly scheduled to run...\n";
        return 0;
    }
    else {
        printerr "BackupPC_nightly ($reply)...\n";
        return $reply;
    }
}


# Rewrite of the compare function (in File::Compare) since it messes up on
# weird filenames with spaces and things (since it combines the "<"
# and the filename in a single string). Also streamlines the code by
# getting rid of extra fluff and removing the code for comparing text
# files while we are at it. The basic algorithm remains the classic one
# for comparing 2 raw files.
# Returns 0 if equal, 1 if not equal. And negative if error.
sub jcompare {
    my ($file1,$file2,$size) = @_;
        my ($fh1open, $fh2open, $fh1size);
        my $ret=0;

    local (*FH1, *FH2);
        unless (($fh1open = open(FH1, "<", $file1)) &&
                        ($fh2open = open(FH2, "<", $file2))) {
                $ret = -1;
                goto compare_return;
        }
        binmode FH1;
        binmode FH2;
        if (($fh1size = -s FH1) != (-s FH2)) {
                $ret=1;
                goto compare_return;
        }

        unless (defined($size) && $size > 0) {
            $size = $fh1size;
            $size = LINUX_BLOCK_SIZE if $size < LINUX_BLOCK_SIZE;
            $size = TOO_BIG if $size > TOO_BIG;
        }

        my $data1 = my $data2 = '';
        my ($r1,$r2);
        while(defined($r1 = read(FH1,$data1,$size)) && $r1 > 0) {
            unless (defined($r2 = read(FH2,$data2,$r1)) && $r2 == $r1
                                && $data2 eq $data1) {
                        $ret=1;
                        goto compare_return;
            }
        }
        $ret=1  if defined($r2 = read(FH2,$data2,LINUX_BLOCK_SIZE)) && $r2 > 0;
        $ret =-2 if $r1 < 0 || $r2 < 0; # Error on read

  compare_return:
        close(FH1) if $fh1open;
        close(FH2) if $fh2open;
        return $ret;
}

# Version of compare for BackupPC compressed files. Compares inflated
# (uncompressed) versions of files. The BackupPC::FileZIO subroutine
# is used to read in the files instead of just raw open & read.  Note
# this version of compare is relatively slow since you must use zlib
# to decompress the streams that are being compared. Also, there is no
# shortcut to just first compare the filesize since the size is not
# known until you finish reading the file.
sub zcompare {
        my ($file1, $file2, $compress)=@_;
        my ($fh1, $fh2);
        my $ret=0;

        $compress =1 unless defined $compress;
        unless ((defined ($fh1 = BackupPC::FileZIO->open($file1, 0, $compress))) &&
                (defined ($fh2 = BackupPC::FileZIO->open($file2, 0, $compress)))) {
                $ret = -1;
                goto zcompare_return;
        }
        my $data1 = my $data2 = '';
        my ($r1, $r2);
        while ( ($r1 = $fh1->read(\$data1, 65536)) > 0 ) {
                unless ((($r2 = $fh2->read(\$data2, $r1)) == $r1)
                                && $data1 eq $data2) {
                        $ret=1;
                        goto zcompare_return;
                }
        }
        $ret = 1 if ($r2 = $fh2->read(\$data2, LINUX_BLOCK_SIZE)) > 0; #See if anything is left...
        $ret =-1 if $r1 < 0 || $r2 < 0; # Error on read

  zcompare_return:
        $fh1->close() if defined $fh1;
        $fh2->close() if defined $fh2;
        return $ret;
}

# Second alternative version that combines the raw speed of jcompare
# with the ability to compare compressed files in zcompare.  This
# routine effectively should compare two compressed files just as fast
# as the raw jcompare. The compare algorithm strips off the first byte
# header and the appended rsync checksum digest info and then does a
# raw compare on the intervening raw compressed data. Since no zlib
# inflation (decompression) is done it should be faster than the
# zcompare algorithm which requires inflation. Also since zlib
# compression is well-defined and lossless for any given compression
# level and block size, the inflated versions are identical if and
# only if the deflated (compressed) versions are identical, once the
# header (first byte) and trailers (rsync digest) are stripped
# off. Note that only the data between the header and trailers have
# this unique mapping. Indeed, since BackupPC only adds the checksum
# the second time a file is needed, it is possible that the compressed
# value will change with time (even though the underlying data is
# unchanged) due to differences in the envelope. Note, to some extent
# this approach assumes that the appended digest info is correct (at
# least to the extent of properly indicating the block count and hence
# the compressed data size that will be compared).
sub zcompare2 {
    my ($file1,$file2,$size) = @_;
        my ($fh1, $fh2, $fh1open, $fh2open);
        my $Too_Big = 1024 * 1024 * 2;
        my $ret=0;
        
        unless (($fh1open = open($fh1, "<", $file1)) &&
                        ($fh2open = open($fh2, "<", $file2))) {
                $ret = -1;
                goto zcompare2_return;
        }
        binmode $fh1;
        binmode $fh2;
        
        my $fh1size = -s $fh1;
        my $fh2size = -s $fh2;

        my $data1 = my $data2 = '';
        unless (read($fh1, $data1, 1) == 1 &&  # Read first byte
                        read($fh2, $data2, 1) == 1) {
                $ret = -1;
                goto zcompare2_return;
        }
        if (vec($data1, 0, 8) == 0xd6 || vec($data1, 0, 8) == 0xd7) {
                return -2 unless ( defined(seek($fh1, -8, 2)) ); 
                return -3 unless ( read($fh1, $data1, 4) == 4 );
                $fh1size -= 20 * unpack("V", $data1) + 49;
                # [0xb3 separator byte] + 20 * NUM_BLOCKS + DIGEST_FILE(32) + DIGEST_INFO(16)
                # where each of the NUM_BLOCKS entries is a 20 byte block digest consisting of
                # 4 bytes of Adler32 and the full 16 byte (128 bit) MD4 file digest (checksum).
                # DIGEST_FILE is 2 copies of the full 16 byte (128 bit) MD4 digest
                # (one for protocol <=26 and the second for protocol >=27).
                # DIGEST_INFO is metadata consisting of a pack("VVVV") encoding of the
                # block size (should be 2048), the checksum seed, the length of the
                # block digest (NUM_BLOCKS), and the magic number (0x5fe3c289).
                # Each is a 4 byte integer.
        }

        if (vec($data2, 0, 8) == 0xd6 || vec($data2, 0, 8) == 0xd7) {
                return -2 unless ( defined(seek($fh2, -8, 2)) ); 
                return -3 unless ( read($fh2, $data2, 4) == 4 );
                $fh2size -= 20 * unpack("V", $data2) + 49;
        }

        if ($fh1size != $fh2size) {
                $ret=1;
                goto zcompare2_return;
        }

        seek($fh1,1,0) ; #skip first byte
        seek($fh2,1,0) ; #skip first byte


        my $bytesleft=--$fh1size; #Reduce by one since skipping first byte
        $size = $fh1size unless defined($size) && $size > 0 && $size < $fh1size;
        $size = $Too_Big if $size > $Too_Big;

        my ($r1,$r2);
        while(defined($r1 = read($fh1,$data1,$size))
                  && $r1 > 0) {
            unless (defined($r2 = read($fh2,$data2,$r1))
                    && $data2 eq $data1) { #No need to test if $r1==$r2 since they should be the same size;
                        $ret=1;            #plus if they aren't, it will be picked up by the $data2 eq $data1 test
                        goto zcompare2_return;
            }
                $bytesleft -= $r1;
                $size = $bytesleft if $bytesleft < $size;
        } 
        $ret=1  if defined($r2 = read($fh2,$data2,$size)) && $r2 > 0;
    #Note: The above line should rarely be executed since both files same size
        $ret =-2 if $r1 < 0 || $r2 < 0; # Error on read

  zcompare2_return:
        close($fh1) if $fh1open;
        close($fh2) if $fh2open;
        return $ret;
}
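
# Worked example (illustrative, not part of the original code): for a
# compressed file whose rsync digest trailer reports NUM_BLOCKS = 100,
# zcompare2 ignores the last 20*100 + 49 = 2049 bytes (separator byte +
# 100 block digests + DIGEST_FILE + DIGEST_INFO) and compares only the
# compressed data between the 1 byte header and that trailer.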

#Routine to remove files (unlink) or directories (rmtree)
# $fullfilename is the full path name to the file or directory
# Returns the number of files deleted; if $dirfileref (a scalar ref) is
# defined, the names of the deleted entries are captured into it
sub delete_files
{
        my ($fullfilename, $dirfileref) = @_;
        my $listdeleted = defined($dirfileref);
        my $ret = -1;
        my $output;
        die "Error: '$fullfilename' doesn't exist or not writeable\n" 
                unless  -w $fullfilename;
        if(-f $fullfilename) {
                $ret = junlink($fullfilename);
        }
        elsif(-d $fullfilename) {
                $$dirfileref = "" if $listdeleted;
                open(local *STDOUT, '>', $dirfileref) #Redirect stdout to capture the output of rmtree
                        if $listdeleted;
                $ret = jrmtree($fullfilename, $listdeleted, 1);
        }
        return $ret;
}

#Touch command (modified version of the Unix notion)
#First argument is the (full) filepath. Second argument, if defined, means the
#file is only created if non-existent, but access and modification
#times are not changed if it already exists.
#Returns 1 on success, -1 on error;
sub touch
{
        unless (-e $_[0]) { #Create if non-existent
                return -1 unless defined(sysopen(my $fh, $_[0], O_CREAT));
                close $fh;
        }
        my $time = time();
        utime($time, $time, $_[0]) unless defined $_[1];
        return 1;
}

#Simple wrappers to protect when just doing dry runs
sub jcopy
{
        return 1 if $dryrun;
        copy @_;
}

sub jlink
{
        return 1 if $dryrun;
        link $_[0], $_[1];
}

sub junlink
{
        return 1 if $dryrun;
        unlink @_;
}

sub jmkdir
{
        return 1 if $dryrun;
        mkdir @_;
}

sub jmkpath
{
    return 1 if $dryrun;
    mkpath @_;
}

sub jmake_path
{
    return 1 if $dryrun;
    mkpath @_;
}

sub jrename
{
        return 1 if $dryrun;
        rename $_[0], $_[1];
}

sub jrmtree
{
        return 1 if $dryrun;
        rmtree @_;
}
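
# Illustrative note (not part of the original code): these wrappers consult the
# package-level $dryrun flag, so the calling program can exercise the full code
# path without touching the filesystem, e.g.:
#
#   $dryrun = 1;
#   junlink($somefile);  # reports success but leaves $somefile in place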

1;

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
