[BackupPC-users] Updated version of BackupPc

Here is the latest (updated) version of BackupPC_copyPcPool along with
the jLib.pm library which it uses.

It is much, much better than my initial version(s) released in 2011 as
pointed out in my email from yesterday. However, it does not yet
include the "incremental" backup/transfer option that I referred to
yesterday. Also, it currently requires ~190 bytes/pool entry (for
o(million) pool elements) to cache the pool inodes. If the -P (pack)
flage, this is reduced to about ~140 bytes/pool entry. I intend to implement
a tighter packing of about 30 bytes/pool entry at the expense of
lookup speed in a future version.

If anyone has trouble with line wrapping errors caused by my emailing
the plain text, I can send it as an attached file..

I am very interested in hearing people's feedback and experiences with
this program...

------------------------------------------------------------------------
BackupPC_copyPcPool.pl

#!/usr/bin/perl
#============================================================= -*-perl-*-
#
# BackupPC_copyPcPool.pl: Reliably & efficiently copy pool and pc tree
#
# DESCRIPTION
#   See below for detailed description of what it does and how it works
#   
# AUTHOR
#   Jeff Kosowsky
#
# COPYRIGHT
#   Copyright (C) 2011-2013  Jeff Kosowsky
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
#========================================================================
#
# Version 0.2.1, February 2013
#
# CHANGELOG:
# 0.1.1 (Feb 2011)
# 0.1.2 (May 2011)
# 0.1.3 (Sep 2011)
# 0.1.4 (Jan 2013) Added options & error checking to set backuppc uid & guid
#                  when not available from passwd & group files, respectively
# 0.2.0 (Feb 2013) Initial release of new version using full in-memory I-node
#                  (hash) caching rather than I-node pooling via file system
#                  tree
# 0.2.1 (Feb 2013) Added packed option for I-node cache hash index & values
#                  Added md5sum output option
#========================================================================
#
# DESCRIPTION
#
# 1. The program first recurses through the pool/cpool to create a
# perl hash cache (%InodeCache) indexed by inode of all the pool
# entries. This subsequently allow for rapid lookup and association of
# each linked pc file with the corresponding stored pool
# file. Optionally, the cache can be written out to a file as a list
# of "<inode> <pool mdsum>" pairs. This inode chache file can then be
# read in rapidly to populate the chache on subsequent runs, assuming
# the pool hasn't changed in the interim.
#
# 2. The program then runs through the pc tree (or a designated
# subtree thereof) to create a single long list of all of the
# directories, hard links, and zero-length files
# For hard links, the entry is:
#     <path-to-pool-file> <path-to-pc-file>
# For directories, the entry is:
#     D <owner> <group> <mode> <mtime> <path-to-directory>
# For zero-length files, the entry is:
#     Z <owner> <group> <mode> <mtime> <path-to-file>
# For 'errors' (e.g., non-linked, non-zero length files), the entry is:
#     X <owner> <group> <mode> <mtime> <path-to-file>
# Comments can be indicated by:
#     #

# Note: all paths are relative to TopDir

# Any (non-zero length) pc files that are not linked (i.e. only one
# hard link) or that are not found in the InodeCache may optionally be
# corrected and linked to the pool and if this results in a new pool
# entry then that entry is added to the InoceCache too. Files that are
# not corrected are listed for separate examination and manual
# correction or backup.

# The output consists of 3 files:

# <outfile>.links   is the file consisting of the list of links,
#                   directories and zero-length files described above

# <outfile>.nopool  is a listing of the normal top-level log and info
#                   files that are not linked and need to be copied
#                   over directly

# <outfile>.nolinks is a listing of the non-linked, non-zero length
#                   files that either couldn't be linked or that you
#                   decided not to link
#

# If the --Writecache|-W option is set, a 4th output file is generated:

# <outfile>.icache  is a 2 column listing consisting of
#                   <inode number> <pool file name> pairs. 
#                    If the --writecache|-W option is repeated, then a
#                    3rd column is added consisting of the md5sum of
#                    the (uncompressed) contents of the pool file.
#                    One can then check for pool dups by running:
#                        sort -k 3 <outfile>.icache  | uniq -f 2 -D

# NOTE: Backuppc on the source machine should be disabled (or a
# static snapshot used) during the entire backup process

# 3. Restoring then consists of the following steps:

#    A. Copy over the pool/cpool any way you like, including
#       rsync but without having to worry about hard links

#    B. Copy over the non-pooled files (<outfile>.nopool) any way you want,
#       e.g., rsync, tar, etc.

#    C. Run this program again using the -r|--restore flag and the
#       <outfile>.nolinks as the input

#    D. Optionally copy over the non-linked files in <outfile>.nolinks

# NOTE: Backuppc on the target machine should be disabled (or a
# snapshot used) during the entire restore process

# Selected features:
#   --gzip|-g      flag creates compressed output/input
#   --stdio        writes/reads to stdio allowing you to pipe the backup
#                  stage directly into the restore stage
#   --fixlinks|-f  fixes any missing/broken pc links
#   --topdir|-t    allows setting of alternative topdir
#   --dryrun|-d    doesn't make any changes to pool or pc tree
#   --verbose|-v   verbosity (repeat for higher verbosity)
#
#========================================================================

use strict;
use warnings;
use File::Path;
use File::Glob ':glob';
use Getopt::Long qw(:config no_ignore_case bundling);
use Fcntl;  #Required for RW I/O masks
use Switch;

use lib "/usr/share/BackupPC/lib";
use BackupPC::FileZIO;
use BackupPC::Lib;
use BackupPC::jLib v0.4.2;  # Requires version >= 0.4.2
use BackupPC::Attrib qw(:all);

no  utf8;

#use Data::Dumper; #JJK-DEBUG
#use Devel::Size qw(size total_size); #JJK-DEBUG

select(STDERR); #Auto-flush (i.e., unbuffer) STDERR
$| = 1;
select(STDOUT);

#Variables
my $bpc = BackupPC::Lib->new or die "BackupPC::Lib->new failed\n";      
my $md5 = Digest::MD5->new;
my $attr; #Re-bless for each backup since may have different compress level
my $MaxLinks = $bpc->{Conf}{HardLinkMax};
my $CREATMASK = O_WRONLY|O_CREAT|O_TRUNC;
my %InodeCache=(); #Cache for saving pool/cpool inodes
my @nonpooled = ();
my @backups = ();
my @compresslvls=(); #Pool to use for each backup;
my ($compresslvl, $ptype);

my $backuppcuser = $bpc->{Conf}{BackupPCUser}; #User name
my $backuppcgroup = $bpc->{Conf}{BackupPCUser}; #Assumed to be same as user

my $directories = 0;
my $totfiles = 0; #Total of next 6 variables
my $zerolengthfiles = 0;
my $existinglink_pcfiles = 0;
my $fixedlink_pcfiles = 0;
my $unlinkable_pcfiles = 0;
my $unlinked_pcfiles = 0;
my $unlinked_nonpcfiles = 0;
my $unlinked_files = 0; #Total of above three
my $missing_attribs = 0;

#GetOptions defaults:
my $cleanpool;
$dryrun=0;  #Global variable defined in jLib.pm (do not use 'my') #JJK-DEBUG
#$dryrun=1; #Set to 1 to make dry run only
my $fixlinks=0;
my $Force;
my $gzip;
my $giddef = my $giddef_def = getgrnam($backuppcuser); #GID
$giddef_def = "<unavailable>" unless defined $giddef_def;
my $outfile;
my $Overwrite;
my $Pack;
my $pool = my $pool_def = 'both';
my $restore;
my $Readcache;
my $Skip;
my $stdio;
my $TopDir = my $TopDir_def = $bpc->{TopDir};
my $uiddef = my $uiddef_def = getpwnam($backuppcgroup); #UID
$uiddef_def = "<unavailable>" unless defined $uiddef_def;
my $verbose = my $verbose_def = 2;
my $noverbose;
my $Writecache;

usage() unless( 
        GetOptions( 
                "cleanpool|c"   => \$cleanpool, #Remove orphan pool entries
                "dryrun|d!"     => \$dryrun,    #1=dry run
                "fixlinks|f"    => \$fixlinks,  #Fix unlinked/broken pc files
                "Force|F"       => \$Force,     #Override stuff...
                "gid|G=i"       => \$giddef,    #default gid if not in group 
file
                "gzip|g"        => \$gzip,      #Compress files
                "outfile|o=s"   => \$outfile,   #Output file (required)
                "Overwrite|O"   => \$Overwrite, #OVERWRITES existing files/dirs 
(restore)
                "pool|p=s"      => \$pool,      #Pool (pool|cpool|both)
                "Pack|P"        => \$Pack,      #Pack the InodeCache hash
                "restore|r"     => \$restore,   #Restore rather than backup
                "Readcache|R=s" => \$Readcache, #Read InodeCache from file
                "Skip|S"        => \$Skip,      #Skip existing backups (restore)
                "stdio|s"       => \$stdio,     #Print/read to/from stdout/stdin
                "topdir|t=s"    => \$TopDir,    #Location of TopDir
                "uid|U=i"       => \$uiddef,    #default gid if not in group 
file
                "verbose|v+"    => \$verbose,   #Verbosity (repeats allowed)
                "noverbose"     => \$noverbose, #Shut off all verbosity
                "Writecache|W+" => \$Writecache,#Write InodeCache out to file
                "help|h"        => \&usage,
        ));

if($restore) {
        usage() if $cleanpool || $fixlinks || $outfile ||
                $Readcache || $Writecache ||
                (!$stdio && @ARGV < 1);
} else {
        usage() if ! defined($outfile) ||  ($pool !~ /^(pool|cpool|both)$/) ||
                ($Readcache && $Writecache) || $Overwrite;
}

$verbose = 0 if $noverbose;
my $DRYRUN = $dryrun ? "**DRY-RUN** " : "";

die "Error: 'backuppc' not found in password file, so must enter default UID on 
commandline (--uid|-U option)...\n" unless defined $uiddef;
die "Error: 'backuppc' not found in group file, so must enter default GID on 
commandline (--gid|-G option)...\n" unless defined $giddef;
my (%USERcache, %GROUPcache, %UIDcache, %GIDcache);
$UIDcache{$uiddef} = $backuppcuser;
$GIDcache{$uiddef} = $backuppcgroup;
$USERcache{$backuppcuser} = $uiddef;
$GROUPcache{$backuppcgroup} = $giddef;

############################################################################
if($TopDir ne $TopDir_def) {
        #NOTE: if we are not using the TopDir in the config file, then we
        # need to manually override the settings of BackupPC::Lib->new
        # which *doesn't* allow you to set TopDir (even though it seems so
        # from the function definition, it gets overwritten later when the
        # config file is read)
        $TopDir =~ s|//*|/|g; #Remove any lurking double slashes
        $TopDir =~ s|/*$||g; #Remove trailing slash
        $bpc->{TopDir} = $TopDir;
        $bpc->{Conf}{TopDir} = $TopDir;

        $bpc->{storage}->setPaths({TopDir => $TopDir});
        $bpc->{PoolDir}  = "$bpc->{TopDir}/pool";
        $bpc->{CPoolDir} = "$bpc->{TopDir}/cpool";
}

%Conf   = $bpc->Conf(); #Global variable defined in jLib.pm (do not use 'my')
#############################################################################
#By convention for the rest of this program, we will assume that all
#directory variables have a trailing "/". This should save us some
#efficiency in terms of not having to always add back the slash.
$TopDir .= "/"; $TopDir =~ s|//*|/|g;
my $pcdir = 'pc/';

die "TopDir = '$TopDir' doesn't exist!\n" unless -d $TopDir;
die "TopDir = '$TopDir' doesn't look like a valid TopDir!\n" 
        unless -d "$TopDir/pool" && -d "$TopDir/cpool" && -d "$TopDir/${pcdir}";

system("$bpc->{InstallDir}/bin/BackupPC_serverMesg status jobs >/dev/null 
2>&1");
unless(($? >>8) == 1) {
        die "Dangerous to run when BackupPC is running!!! (use '--Force' to 
override)\n"
                unless $Force ||
                (stat("$TopDir/"))[0] . "." . (stat(_))[1] !=
                (stat("$TopDir_def/"))[0] . "." . (stat(_))[1]; #different fs & 
inode
        warn "WARNING: May be dangerous to run when BackupPC is running!!!\n"; 
    #Warn but don't die if *appear* to be in different TopDir
}

############################################################################
if(defined $restore) {
        do_restore(@ARGV);
        exit; #DONE
}
############################################################################
#Open files for reading/writing 
#NOTE: do before chdir to TopDir
my $sfx = $gzip ? ".gz" : "";
my $linksfile = "${outfile}.links${sfx}";
my $nopoolfile = "${outfile}.nopool${sfx}";
my $nolinksfile = "${outfile}.nolinks${sfx}";

die "ERROR: '$linksfile' already exists!\n" if !$stdio && -e $linksfile;
die "ERROR: '$nopoolfile' already exists!\n" if -e $nopoolfile;
die "ERROR: '$nolinksfile' already exists!\n" if -e $nolinksfile;

my $outpipe = $gzip ? "| gzip > " : "> ";

if($stdio) {
        open(LINKS, $gzip ? "| gzip -f" : ">&STDOUT"); #Redirect LINKS to STDOUT
} else {
        open(LINKS,  $outpipe . $linksfile)
                or die "ERROR: Can't open '$linksfile' for writing!($!)\n";
}

open(NOPOOL, $outpipe . $nopoolfile)
        or die "ERROR: Can't open '$nopoolfile' for writing!($!)\n";
open(NOLINKS, $outpipe . $nolinksfile)
        or die "ERROR: Can't open '$nolinksfile' for writing!($!)\n";

my ($read_Icache, $write_Icache);
if($Pack) { #Pack InodeCache entries
        $read_Icache  = \&read_Icache_pack;
        $write_Icache = \&write_Icache_pack;
} else { #Don't pack InodeCache entries
        $read_Icache  = \&read_Icache_nopack;
        $write_Icache = \&write_Icache_nopack;
}

if($Readcache) { #Read in existing InodeCache from file
        open(RCACHE, $gzip ? "/bin/zcat $Readcache |" : "< $Readcache") or
                die "ERROR: Can't open '$Readcache' for reading!($!)\n";
} elsif($Writecache) {
        my $wcachefile = "${outfile}.icache${sfx}";
        die "ERROR: '$wcachefile' already exists!\n" if -e $wcachefile;
        open(WCACHE, $outpipe . $wcachefile)
                or die "ERROR: Can't open '$wcachefile' for writing!($!)\n";
}

############################################################################
chdir($TopDir); #Do this so we don't need to worry about distinguishing
#between absolute and relative (to TopDir) pathnames
#Everything following opening the files occurs in TopDir
############################################################################
initialize_backups();

warn "**Populating Icache...\n" if $verbose>=1;
my $toticache=0;
if($Readcache) { #Read in existing InodeCache from file
        $ptype = $Conf{CompressLevel} ? "cpool" : "pool";
        while(<RCACHE>) {
                if(/^(\d+)\s+(\S+)/) {
                        &$write_Icache($ptype, $1, $2);
                        $toticache++;
                } elsif(/^#(c?pool)\s*$/) {
                        $ptype = $1;
                } else {
                        warn "'$_' is not a valid cache entry\n" unless /^#/;
                }
        }
        close RCACHE;
} else { # Create InodeCache
        if($pool eq "both") {
                $toticache += populate_icache("pool");
                $toticache += populate_icache("cpool");
        } else {
                $toticache += populate_icache($pool);
        }
        close WCACHE if $Writecache;
}
warn "*Icache entries created=$toticache\n" if $verbose >=2;
if(defined(&total_size)){ #JJK-DEBUG: Print out sizes used by InodeCache
        warn sprintf("InodeCache total size: %d\n",total_size(\%InodeCache));
        warn sprintf("Cpool cache entries: %d\tSize: %d\tTotal_size: %d\n", 
                                 (scalar keys $InodeCache{"pool"}), 
size($InodeCache{"pool"}),
                                 total_size($InodeCache{"pool"})) 
                if defined $InodeCache{"pool"};
        warn sprintf("Cpool cache entries: %d\tSize: %d\tTotal_size: %d\n",
                                 (scalar keys 
$InodeCache{"cpool"}),size($InodeCache{"cpool"}),
                                 total_size($InodeCache{"cpool"})) 
                if defined $InodeCache{"cpool"};
}

exit unless @backups;

warn "**Recording nonpooled top level files...\n" if $verbose>=1;
foreach (@nonpooled) {  #Print out the nonpooled files
        printf NOPOOL "%s\n", $_;
}
close(NOPOOL);

warn "**Recording linked & non-linked pc files...\n" if $verbose>=1;
my $lastmachine = '';
foreach (@backups) {
        m|$pcdir([^/]*)|;
        $lastmachine = $1 if $1 ne $lastmachine;

        warn "* Recursing through backup: $_\n" if $verbose>=1;
        $compresslvl = shift(@compresslvls);
        $attr = BackupPC::Attrib->new({ compress => $compresslvl });
    #Reinitialize this jLib global variable in case new compress level
        $ptype = ($compresslvl > 0 ? "cpool" : "pool");
        m|(.*/)(.*)|;
        find_pool_links($1, $2);
}
close(LINKS);
close(NOLINKS);

##############################################################################
#Print summary & concluding message:
printf STDERR "\nDirectories=%d\tTotal Files=%d\n",
        $directories, ($totfiles + $#nonpooled);
printf STDERR "Link files=%d\t [Zero-length=%d,  Hardlinks=%d (Fixed=%d)]\n",
        ($zerolengthfiles+$existinglink_pcfiles+$fixedlink_pcfiles),
        $zerolengthfiles, ($existinglink_pcfiles+$fixedlink_pcfiles),
        $fixedlink_pcfiles;
printf STDERR "Non-pooled Toplevel=%d\n", $#nonpooled;
printf STDERR "Non-pooled Other=%d [Valid-pc=%d (Failed-fixes=%d),  
Invalid-pc=%d]\n",
        $unlinked_files, ($unlinked_pcfiles+$unlinkable_pcfiles),
        $unlinkable_pcfiles, $unlinked_nonpcfiles;
printf STDERR "PC files missing attrib entry=%d\n", $missing_attribs;

my ($rsyncnopool, $rsyncnolinks);
if($gzip) {
        $gzip = "-g ";
        $rsyncnopool = "zcat $nopoolfile | rsync -aOH --files-from=- $TopDir 
<newTopDir>";
        $rsyncnolinks = "zcat $nolinksfile | rsync -aOH --files-from=- $TopDir 
<newTopDir>";
} else {
        $gzip = "";
        $rsyncnopool = "rsync -aOH --files-from=$nopoolfile $TopDir 
<newTopDir>";
        $rsyncnolinks = "rsync -aOH --files-from=$nolinksfile $TopDir 
<newTopDir>";
}

print STDERR <<EOF;
------------------------------------------------------------------------------
To complete copy, do the following as user 'root' or 'backuppc':
  1. Copy/rsync over the pool & cpool directories to the new TopDir
     (note this must be done *before* restoring '$linksfile')
     For rsync:
        rsync -aO $TopDir\{pool,cpool\} <newTopDir>

  2. Copy/rsync over the non-pooled top level files ($nopoolfile)
     For rsync:
        $rsyncnopool

  3. Restore the pc directory tree, hard links and zero-sized files:
        $0 -r ${gzip}[-t <newTopDir>] $linksfile

  4. Optionally, copy/rsync over the non-linked pc files ($nolinksfile)
     For rsync:
        $rsyncnolinks

------------------------------------------------------------------------------

EOF
exit;

##############################################################################
##############################################################################
#SUBROUTINES:

sub usage
{
    print STDERR <<EOF;

usage: $0 [options] -o <outfile> [<relpath-to-pcdir> <relpath-to-pcdir>...]
       $0 [options] -r <restorefile> [<relpath-to-pcdir> <relpath-to-pcdir>...]

  where the optional <relpath-to-pcdir> arguments are paths relative to the pc
  tree of form: 'host' or 'host/share' or 'host/share/n' or 'host/share/n/.../'
  If no optional <relpath-to-pcdir> arguments are given then the entire pc tree
  is backed up or restored.
  
  Options: [Common to copy & restore]
   --dryrun|-d            Dry-run - doesn\'t change pool or pc trees
                          Negate with: --nodryrun
   --Force|-F             Overrides various checks (e.g., if BackupPC running
                          or if directories present)
   --gid|-G <gid>         Set group id for backuppc user
                          [Default = $giddef_def]
   --gzip|-g              Pipe files to/from gzip compression
   --topdir|-t            Location of TopDir.
                          [Default = $TopDir_def]
                          Note you may want to change from default for example
                          if you are working on a shadow copy.
   --uid|-U <uid>         Set user id for backuppc user
                          [Default = $uiddef_def]
   --verbose|-v           Verbose (repeat for more verbosity)
                          [Default level = $verbose_def]
                          Use --noverbose to turn off all verbosity
   --help|-h              This help message

  Options: [Copy only]
   --cleanpool|-c         If orphaned files (nlinks=1) found when populating
                          Icache, remove them (and renumber chain as needed)
                          NOTE: Orphans shouldn\'t exist if you have just run
                          BackupPC_nightly
   --fixlinks|-f          Attempt to link valid pc files back to the pool
                          if not already hard-linked
                          NOTE: this changes files in the pc and\/or pool
                          of your source too!
   --outfile|-o [outfile] Required stem name for the 3 output files
                             <outfile>.nopools
                             <outfile>.links
                             <outfile>.nolinks
   --pool|-p  [pool|cpool|both]  Pools to include in Icache tree
                          [Default = $pool_def]
   --Pack|-P              Pack both the indices and values of the InodeCache.
                          This saves approximately 52 bytes per pool entry.
                          For o(1 million) entries, this reduces usage from
                          ~190 bytes/entry to ~140 bytes/entry
   --Readcache|-R [file]  Read in previously written inode pool cache from
                          file. This allows resuming previously started backups
                          Only use this option if the pool has not changed.
                          Note: can\'t be used with --Writecache|-W
   --stdio|-s             Print the directory tree, links and zero-sized files
                          to stdout so it can be piped directly to another copy
                          of the program running --restore
                          NOTE: Status, Warnings, and Errors are printed to 
                          stdout.
   --WriteCache|-W        Write inode pool cache to file <outfile>.icache
                          as a 2 column list consisting of:
                             <inode number> <pool entry>
                          If repeated, include md5sums of the (uncompressed)
                          pool file contents as a 3rd column (this is slow)
                          If argument path is '-', then quit after writing
                          cache.
                          Note: can\'t be used with --Readcache|-R
  Options: [Restore only]
   --Overwrite|-O         OVERWRITE existing files & directories (Dangerous)
   --Skip|-S              SKIP existing backups
   --stdio                Read the directory tree, links and zero-sized files
                          from stdin so it can be piped directly from another
                          copy of the program running in the create mode.
                          Note <restorefile> should not be used in this case
                          For example, the following pipe works:
                          $0 -t <source TopDir> [-g] --stdio | $0 -t <dest 
TopDir> [-g] -r --stdio

OVERVIEW:
  First create an inode-indexed cache (hash) of all pool entries.
  Note the cache may optionaly be written out to a file for later re-use
  with the --Readcache|-R flag.

  Then, recurse through the paths specified relative to the pc tree or if no
  paths specified then the entire pc tree. 
  - If the file is a (non-pooled) top-level log file, then write its path
    relative to pcdir out to <outfile>.nopool
    Note this includes all non-directories above the share level plus the
    backInfo files that are covered by the input paths
  - If the file is a directory, zero length file, or an existing hard link to
    the tree, then write it out to <outfile>.links
  - If the file is not hard-linked but is a valid non-zero length pc file 
    (f-mangled and present in the attrib file) and --fixlinks is selected,
    then try to link it properly to the appropriate pool.
  - Otherwise, add it to <outfile>.nolinks

  NOTE: TO ENSURE INTEGRITY OF RESULTS IT IS IMPORTANT THAT BACKUPPC IS NOT
  RUNNING (use --Force to override)

  NOTE: you should run BackupPC_nightly before running this program so that
  no unnecessary links are backed up; alternatively, set the --cleanpool
  option which will remove orphan pool entries.

EOF
exit(1);
}

#Glob to determine what is being backed up. 
#Make sure each backup seems valid and determine it's compress level
#Collect top-level non-pooled files
sub initialize_backups
{
        if (!@ARGV) { # All backups at backup number level
                # TopDir/pc/<host>/<nn>
                @backups = bsd_glob("${pcdir}*/*"); #2 levels down;
                @nonpooled = grep(! -d , bsd_glob("${pcdir}*")); #All 
non-directories
        } else { # Subset of backups
                return if($Writecache && $ARGV[0] eq '-'); #Don't copy pc tree
                foreach(@ARGV) {
                        my $backupdir = $pcdir . $_ . '/';
                        $backupdir  =~ s|//*|/|g; #Remove redundant slashes
                        $backupdir =~ s|/\.(?=/)||g; #Replace /./ with /
                        die "ERROR: '$backupdir' is not a valid pc tree 
subdirectory\n" if $pcdir eq $backupdir || !-d $backupdir;
                        if($backupdir =~ m|^\Q${pcdir}\E[^/]+/$|) {  #Hostname 
only
                                push(@backups, bsd_glob("${backupdir}*"));
                        } else { # At share level or below #Backup number or 
below
                                push(@backups, ${backupdir});
                        }
                }
                @backups = keys %{{map {$_ => 1} @backups}}; #Eliminate dups
                @nonpooled = keys %{{map {$_ => 1} @nonpooled}}; #Eliminate dups
        }

        push(@nonpooled, grep(! -d, @backups)); #Non-directories
        @backups = grep(-d , @backups); #Directories
        push(@nonpooled, grep(m|/backupInfo$|, @backups)); # backupInfo not 
pooled
    #Note everything else *should* be pooled - if not, we will catch it later 
as an error
        @backups = grep(!m|/backupInfo$|, @backups);

        foreach(@backups) {
                s|/*$||; #Remove trailing slash
                m|^(\Q${pcdir}\E[^/]+/[^/]+)(/)?|;
                die "Backup '$1' does not contain a 'backupInfo' file\n"
                        unless -f "$1/backupInfo";
                push(@nonpooled, "$1/backupInfo") unless $2; #Don't include if 
share level or below
                my $compress = get_bakinfo("$1","compress");
                push(@compresslvls, $compress);
                my $thepool = $compress > 0 ? 'cpool' : 'pool';
                die "Backup '$1' links to non-included '$thepool'\n"
                        if $pool ne 'both' && $pool ne $thepool;
        }
}

#Iterate through the 'pooldir' tree and populate the InodeCache hash
sub populate_icache
{
        my ($fpool) = @_;
        my (@fstat, $dh, @dirlist);
        my $icachesize = 0;

        return 0 unless bsd_glob("$fpool/[0-9a-f]"); #No entries in pool
        print WCACHE "#$fpool\n" if $Writecache;
        my @hexlist = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 
'b', 'c', 'd', 'e', 'f');
        my ($idir,$jdir,$kdir);
        print STDERR "\n* Populating Icache with '$fpool' branch:[0-f] " if 
$verbose >=2;
        foreach my $i (@hexlist) {
                print STDERR "$i " if $verbose >=2;
                $idir = $fpool . "/" . $i . "/";
                foreach my $j (@hexlist) {
                        $jdir = $idir . $j . "/";
                        foreach my $k (@hexlist) {
                                $kdir = $jdir . $k . "/";
                                unless(opendir($dh, $kdir)) {
                                        warn "Can't open pool directory: 
$kdir\n" if $verbose>=4;
                                        next;
                                }
                                my @entries = grep(!/^\.\.?$/, readdir ($dh)); 
#Exclude dot files
                                closedir($dh);
                                warn "POOLDIR: $kdir (" . ($#entries+1) ." 
files)\n"
                                        if $verbose >=3;

                                if($cleanpool) { #Remove orphans & renumber 
first and then
                                #create the Icache after orphan deletion & 
chain renumbering
                                #NOTE: we need to do this before populating the 
InodeCache
                                #since pool deletions & renumbering would 
change the cache
                                        my @poolorphans = 
                                                grep(-f $kdir . $_ && 
(stat(_))[3] < 2, @entries);
                                        foreach (sort {poolname2number($b) cmp 
poolname2number($a)}                                                      
@poolorphans) { #Reverse sort to minimize moves
                                                my $pfile = $kdir . $_;
                                                my $result 
=delete_pool_file($pfile);
                                                if($verbose >=1) {
                                                        $result == 1 ?
                                                                warn "WARN: 
Deleted orphan pool entry: $pfile\n": 
                                                                warn "ERROR: 
Couldn't properly delete orphan pool entry: $pfile\n"
                                                }
                                        }
                                        @entries = grep(!/^\.\.?$/, readdir 
($dh))
                                                if(@poolorphans); #Reload if 
orphans renumbered/removed
                                }
                                foreach (@entries) {
                                        # Go through all entries in terminal 
cpool branch
                                        @fstat = stat($kdir . $_);
                                        #Note: 0=dev, 1=ino, 2=mode, 3=nlink, 
4=uid, 5=gid, 6=rdev
                                        #7=size, 8=atime, 9=mtime, 10=ctime, 
11=blksize, 12=blocks
                                        if($fstat[3] >= 2) { # Valid 
(non-orphan) pool entry
                        #Note: No sense in creating icache entries for orphans
                                                &$write_Icache($fpool, 
$fstat[1], $_);
                                                if($Writecache >= 2) {
                                                        printf WCACHE "%d %s 
%s\n", $fstat[1], $_,
                                                        zFile2FullMD5($bpc, 
$md5, $kdir . $_);
                                                }elsif($Writecache) {
                                                        printf WCACHE "%d 
%s\n", $fstat[1], $_;
                                                }
                                                $icachesize++;
                                        } else { #Orphan pool entry
                                                warn "WARN: Orphan pool entry: 
$_\n" 
                                                        if $verbose>=1;
                                        }
                                        warn "WARN: Zero-size pool entry: $_\n"
                                                if $fstat[7] == 0 && 
$verbose>=1;
                                }
                        } # kdir
                } # jdir
        } # idir
        print STDERR "\n" if $verbose >=2;
        return $icachesize;
}

#Recursively go through pc tree
sub find_pool_links
{
        my ($dir, $filename) = @_;

        my $dh;
        my $file = $dir . $filename;
        if(-d $file) {
                my @fstat = stat(_); #Note last stat was -d $file
                #Print info to signal & re-create directory:
                #D <user> <group> <mod> <atime> <mtime> <file name>
                print_file_reference("D", $file, \@fstat);
                $directories++;
                opendir($dh,$file);
                my @contents = readdir($dh);
                closedir($dh);
                foreach (@contents) {
                        next if /^\.\.?$/;     # skip dot files (. and ..)
                        find_pool_links($file . '/', $_); #Recurse
                }
        }
        else { #Not a directory
                my @fstat = stat(_); #Note last stat was -d $file
                $totfiles++;
                if($fstat[7] == 0) { #Zero size file
                        print_file_reference("Z", $file, \@fstat);
                        $zerolengthfiles++;
                        return;
                } elsif($fstat[3] > 1) { #More than one link
                        if(defined(my $pooltarget = &$read_Icache($ptype, 
$fstat[1]))) { 
            #Valid hit: hard link found in InodeCache
                                $existinglink_pcfiles++;
                                $pooltarget =~ /(.)(.)(.)/;
                                
print_hardlink_reference("$ptype/$1/$2/$3/$pooltarget", $file);
                                return;
                        }
                }

                if($file =~ m|^\Q${pcdir}\E[^/]+/[^/]+/backupInfo$|) {
                        $totfiles--;
                        return;         #BackupInfo already taken care of in 
'nonpooled'
                }

                #Non-zero sized file not in InodeCache that is not 'backupInfo'
                if($filename =~ /^f|^attrib$/ && -f $file) {
                        #VALID pc file if f-mangled or attrib file
                        if( $filename =~ /^f/ && 
                                !( -f "$dir/attrib" &&
                                   $attr->read($dir,"attrib") == 1 &&
                                   
defined($attr->get($bpc->fileNameUnmangle($filename))))) {
                                #Attrib file error if attrib file missing or
                                #unmangled name not an element of the attrib 
file
                                warn "ERROR: $file (inode=$fstat[1], 
nlinks=$fstat[3]) VALID pc file with MISSING attrib entry\n";
                                $missing_attribs++;
                        }
                        if($fixlinks) {
                                if(fix_link($file, \@fstat) == 1) {
                                        $fixedlink_pcfiles++;
                                        return;
                                } else {$unlinkable_pcfiles++;}
                        } else{
                                warn "ERROR: $file (inode=$fstat[1], 
nlinks=$fstat[3]) VALID pc file NOT LINKED to pool\n";
                                $unlinked_pcfiles++;
                        }
                } else {
                        warn "ERROR: $file (inode=$fstat[1], nlinks=$fstat[3]) 
INVALID pc file and UNLINKED to pool\n";
                        $unlinked_nonpcfiles++
                }
                #ERROR: Not linked/linkable to pool (and not zero length file 
or directory
                print_file_reference("X", $file, \@fstat);
                printf NOLINKS "%s\n", $file;
                $unlinked_files++
        }
}

#Try to fix link by finding/creating new pool-link
sub fix_link
{
        my ($filename, $fstatptr) = @_;

        my ($plink, @pstat);
        my ($md5sum, $result) = zFile2MD5($bpc, $md5, $filename, 0, 
$compresslvl);
        $result = jMakeFileLink($bpc, $filename, $md5sum, 2, $compresslvl, 
\$plink)
                if $result > 0;
        #Note we choose NewFile=2 since if we are fixing, we always want to 
make the link
        if($result > 0) { #(true if both above calls succeed)
                $plink =~ m|.*(\Q${ptype}\E.*/(.*))|;
                $plink = $1; #Path relative to TopDir
                if($dryrun) {
                        @pstat = @$fstatptr; 
                } else {
                        @pstat  =  stat($plink);
                        &$write_Icache($ptype, $pstat[1], $2);
                }
                print_hardlink_reference($plink, $filename);
                if($verbose >=2) {
                        warn "${DRYRUN}NOTICE: pool entry '$plink' 
(inode=$fstatptr->[1]) missing from Icache and added back\n" if $result == 3;
                        warn sprintf("${DRYRUN}NOTICE: %s (inode=%d, nlinks=%d) 
was fixed by linking to *%s* pool entry: %s (inode=%d, nlinks=%d)\n",
                                                 $filename, $fstatptr->[1], 
$fstatptr->[3],
                                                 ($result == 1 ? "existing" : 
"new"), 
                                                 $plink, $pstat[1], $pstat[3])
                                if $result !=3;
                }
                return 1;
        }
        warn "ERROR: $filename (inode $fstatptr->[1]); fstatptr->[3] links; 
md5sum=$md5sum) VALID pc file FAILED FIXING by linking to pool\n";
        return 0;
}

sub print_hardlink_reference
{
        my ($poolname, $filename) = @_;
        
        #<poolname> <filename>
        printf LINKS "%s %s\n", $poolname, $filename; #Print hard link
}

sub print_file_reference
{
        my ($firstcol, $filename, $fstatptr) = @_;

        #<firstcol>  <user> <group> <mod> <mtime> <filename>
        printf(LINKS "%s %s %s %04o %u %s\n",
                   $firstcol, UID($fstatptr->[4]), GID($fstatptr->[5]), 
                   $fstatptr->[2] & 07777, $fstatptr->[9], $filename);
}

###############################################################################
##Read/write InodeCache

#Note that without packing, InodeCache hash requireas about 186
#bytes/entry ('total_size') to store each entry with about 74 bytes
#required to store the hash structure ('size') alone and an additional
#exactly 112 byte to store the 32-char hex string (this assumes chains
#are sparse enough not to affect the average storage). The 74 bytes
#for the hash structure consists of about 8 bytes for the inode data
#plus 48 bytes for the overhead of the scalar variable storing the
#index plus ~22 bytes for the magic of hash storage and the pointer to
#the hash value. The additional 112 bytes consists of 64 bytes to
#store the unpacked 32-char hex string plus 48 bytes of overhead for
#the scalar variable storing the hash value.

#If we pack the value, we can reduce the hex string storage from 64
#bytes to 16 bytes by storing 2 hex chars per byte. This reduces the
#total storage by 48 bytes/entry from 186 to 138 bytes/entry.

#If we pack the index as an unsigned 32-bit long (V), then we can
#save an additional 4 bytes, reducing the average storage to about 134
#bytes/entry. But, we need to make sure that the highest inode number
#is less than 2^32 (~4.3 billion). If we pack the index as an unsigned
#64-bit quad (Q), then we use 8 bytes for the index data and we are
#back to a total 138 bytes/entry.

#Note these averages are based on a hash size (i.e. pool size) of 1
#million entries. The 74 bytes/entry for the hash structure alone will
#increase (slowly) as the size of the hash increases but the storage
#size for the hash vaue (112 bytes unpacked, 64 packed) does not
#increase.

sub read_Icache_nopack
{
        $InodeCache{$_[0]}{$_[1]}; #No packing of values & indices
}

sub read_Icache_pack
{
#       join("_", unpack("H32V", $InodeCache{$_[0]}{$_[1]})); #Pack values only

        #Also pack indices:
        join("_", unpack("H32V", $InodeCache{$_[0]}{pack("V",$_[1])}));#32-bit 
indices
#       join("_", unpack("H32V", $InodeCache{$_[0]}{pack("Q",$_[1])}));#64-bit 
indices
        #Note: shouldn't need 64 bits for inodes since they are ino_t
        #which are typdef'd as longs
}

sub write_Icache_nopack
{
        $InodeCache{$_[0]}{$_[1]} = $_[2]; #No packing of values & indices
}

sub write_Icache_pack
{
        my ($pool, $index, $value) = @_;

        $value =~ m|(.{32})(.(.*))?|; 
        #Pack values only
#       $InodeCache{$pool}{$index}= defined($3) ? pack("H32V", $1, $3) : 
pack("H32", $1);

        #Also pack indices:
        $InodeCache{$pool}{pack("V",$index)} = defined($3) ? pack("H32V", $1, 
$3) : pack("H32", $1); #32-bit indices

#       $InodeCache{$pool}{pack("Q",$index)} = defined($3) ? pack("H32V", $1, 
$3) : pack("H32", $1); #64-bit indices
        #Note shouldn't need 64 bits for inodes since they are ino_t which
        #are typdef'd as longs
}

###############################################################################
# Return user name corresponding to numerical UID with caching
sub UID
{
        unless(exists($UIDcache{$_[0]})) {
                unless(defined($UIDcache{$_[0]} = getpwuid($_[0]))) {
                        printf STDERR "Warning: '%d' not found in password 
file, writing numerical UID instead...\n", $_[0];
                        $UIDcache{$_[0]} = $_[0];
                }
        }
    return $UIDcache{$_[0]};
}

# Return group name corresponding to numerical GID with caching
sub GID
{
        unless(exists($GIDcache{$_[0]})) {
                unless(defined($GIDcache{$_[0]} = getgrgid($_[0]))) {
                        printf STDERR "Warning: '%d' not found in group file, 
writing numerical GID instead...\n", $_[0];
                        $GIDcache{$_[0]} = $_[0];
                }
        }
    return $GIDcache{$_[0]};
}

# Return numerical UID corresponding to user name with caching
sub USER
{
        unless(exists($USERcache{$_[0]})) {
                unless(defined($USERcache{$_[0]} = getpwnam($_[0]))) {
                        printf STDERR "Warning: '%s' not found in password 
file, writing default backuppc UID (%d) instead...\n", $_[0], $uiddef;
                        $USERcache{$_[0]} = $uiddef;
                }
        }
    return $USERcache{$_[0]};
}

# Return numerical GUID coresponding to group name with caching
sub GROUP
{
        unless(exists($GROUPcache{$_[0]})) {
                unless(defined($GROUPcache{$_[0]} = getgrnam($_[0]))) {
                        printf STDERR "Warning: '%s' not found in group file, 
writing default backuppc GID (%d) instead...\n", $_[0], $giddef;
                        $GROUPcache{$_[0]} = $giddef;
                }
        }
    return $GROUPcache{$_[0]};
}
###############################################################################
###############################################################################
sub do_restore
{
        my $currbackup = "";
        my $formaterr = 0;
        my $ownererr = 0;
        my $permserr = 0;
        my $mkdirerr = 0;
        my $mkzeroerr = 0;
        my $mklinkerr = 0;
        my $filexsterr = 0;
        my $utimerr = 0;
        my $newdir = 0;
        my $newzero = 0;
        my $newlink = 0;
        my $skipped = 0;

###############################################################################
        if($stdio) {
                open(LINKS, $gzip ? "/bin/zcat - |" : "<& STDIN");
        } else {
                my $restorefile = shift @_;
                open(LINKS, $gzip ? "/bin/zcat $restorefile |" : "< 
$restorefile") or
                        die "ERROR: Can't open '$restorefile' for 
reading!($!)\n";
        }
        my @backuplist = @_;
###############################################################################
        chdir($TopDir); #Do this so we don't need to worry about distinguishing
                    #between absolute and relative (to TopDir) pathnames
###############################################################################
        die "ERROR: pool directories empty!\n"
                unless $Force || bsd_glob("{cpool,pool}/*");

        my(@skiplist, $skiplist); #Backups to skip...
        if($Skip) {
                @skiplist = grep(@{[bsd_glob("$_/f*")]}, 
bsd_glob("${pcdir}*/[0-9]*"));
                #NOTE: glob must be evaulated in list context because 
stateful...
        } elsif(!$Overwrite && grep(-d, bsd_glob("${pcdir}*/[0-9]*/f*"))) {
                die "ERROR: pc directory contains existing backups!\n(use 
--Skip to skip or --Overwrite to OVERWRITE)\n"
        }
        $skiplist = "\Q" . join("\E|\Q", @skiplist) . "\E" if @skiplist;

        my $backuplist; #Backups to selectively backup...
        $backuplist = "\Q" . join("\E|\Q", @backuplist) . "\E" if @backuplist;

        umask 0000; #No permission bits disabled
        my $time = time; #We will use this for setting atimes.
        my @dirmtimes =(); #Stack consisting of paired "dir, mtime" entries
        my $matchback = 1;
        my ($line);

LINE:   while($line = <LINKS>) {
                chomp $line;

                unless($line =~ m|^[a-f0-9DZX#]|) { #First character test
                        print STDOUT "ERR_CHAR1: $line\n";
                        warn sprintf("ERROR: Illegal first line character: 
%s\n", 
                                                 substr($line,0,1))
                                if $verbose >=1;
                        next LINE; #NOTE: next without line would go to next 
switch case
                }

                switch ($&) { #Go through cases...
                        case 'D' {
                                unless($line =~ m|^D +([^ ]+) +([^ ]+) +([^ ]+) 
+([^ ]+) +(\Q${pcdir}\E.*)|) {
                                        print STDOUT "ERR_DFRMT $line\n";
                                        $formaterr++;
                                        next LINE;
                                }
                                #NOTE: 1=uname 2=group 3=mode 4=mtime 5=dirpath
                                print STDERR "$1|$2|$3|$4|$5|\n" if $verbose 
>=4;
                                my $user = USER($1);
                                my $group = GROUP($2);
                                my $mode = oct($3);
                                my $mtime = $4;
                                my $dir = $5;  $dir  =~ s|/*$||; #Remove 
trailing slash
                                $dir =~ m|(.*)/|;
                                my $pdir = $1; #parent dir

                                #Don't restore if on skiplist or if not on 
backuplist
                                if(($skiplist && $dir =~ m|^($skiplist)|) ||
                                   ($backuplist && $dir !~ 
m|^pc/($backuplist)|)) {
                                        $matchback = 0;
                                        next LINE;
                                } else {
                                        $matchback = 1;
                                }

                                if($verbose >= 1) {
                                        $dir =~ m|pc/([^/]*/[^/]*)|;
                                        if($1 ne $currbackup) {
                                                $currbackup = $1;
                                                warn "RESTORING: $currbackup\n";
                                        }
                                }

                                #Look at @dirmtimes stack to see if we have 
backed out of
                                #top directory(ies) on stack and can now set 
mtime
                                my $lastdir; #If in new part of tree, set 
mtimes for past dirs
                                while(defined($lastdir = shift(@dirmtimes)) && 
                                          $dir !~ m|^\Q$lastdir\E|) {
                                        jutime($time, shift(@dirmtimes), 
$lastdir);
                                }
                                unshift(@dirmtimes, $lastdir) if $lastdir; #Put 
back last one

                                if(! -e $dir) { #Make directory (nothing in the 
way)
                                        unless( -d $pdir ||  jmake_path $pdir){ 
                                                #Make parent directory if 
doesn't exist
                                                #NOTE: typically shouldn't be 
necesary since list is 
                                                #created with directories along 
the way
                                                print STDOUT "ERR_MKDIR 
$line\n";
                                                $mkdirerr++;
                                                next LINE;
                                        }
                                        #Make directory & set perms
                                        unless(mkdir $dir, $mode) {
                                                print STDOUT "ERR_MKDIR 
$line\n";
                                                $mkdirerr++;
                                                next LINE;
                                        }
                                } elsif(-d $dir) { #Directory already exists
                                        #Set perms
                                        unless(jchmod $mode, $dir) {
                                                print STDOUT "ERR_PERMS 
$line\n";
                                                $permserr++;
                                                next LINE;
                                        }
                                } else { #Non-directory in the way, abort line
                                        print STDOUT "ERR_DEXST $line\n";
                                        $filexsterr++;
                                        next LINE;
                                }

                                #Set ownership
                                unless(jchown $user, $group, $dir){
                                                print STDOUT "ERR_OWNER 
$line\n";
                                                $ownererr++;
                                                next LINE;
                                }
                                #Put dir mtime on stack
                                unshift(@dirmtimes, $mtime); #We need to set 
dir mtime
                                unshift(@dirmtimes, $dir);   #when done adding 
files to dir

                                $newdir++;
                        }

                        case 'Z' {
                                next LINE unless $matchback;
                                unless($line =~ m|^Z +([^ ]+) +([^ ]+) +([^ ]+) 
+([^ ]+) +((\Q${pcdir}\E.*/)(.*))|) {
                                        print STDOUT "ERR_ZFRMT $line\n";
                                        $formaterr++;
                                        next LINE;
                                }
                                #NOTE: 1=uname 2=group 3=mode 4=mtime 
5=fullpath 6=dir 7=file
                                print STDERR "$1|$2|$3|$4|$5|$6|$7\n" if 
$verbose >=4;
                                my $user = USER($1);
                                my $group = GROUP($2);
                                my $mode = oct($3);
                                my $mtime = $4;
                                my $file = $5;
                                my $dir = $6;
                                my $name = $7;

                                #Check if file exists and not overwritable
                                if(-e $file && !($Overwrite && -f $file && 
unlink($file))) {
                                        print STDOUT "ERR_ZEXST $line\n";
                                        $filexsterr++;
                                        next LINE;
                                }

                                unless( -d $dir || jmake_path $dir){ #Make dir 
if doesn't exist
                                        #NOTE: typically shouldn't be necesary 
since list is
                                        #created with directories along the way
                                        print STDOUT "ERR_MKDIR $line\n";
                                        $mkdirerr++;
                                        next LINE;
                                }
                                #Create zero-length file with desired perms
                                unless(sysopen(ZERO, $file, $CREATMASK, $mode) 
&& close(ZERO)){
                                        print STDOUT "ERR_MKZER $line\n";
                                        $mkzeroerr++;
                                        next LINE;
                                }
                                unless(jchown $user, $group, $file){ #Set 
ownership
                                        print STDOUT "ERR_OWNER $line\n";
                                        $ownererr++;
                                        next LINE;
                                }
                                unless(jutime $time, $mtime, $file) { #Set 
timestamps
                                        $utimerr++;
                                        next LINE;
                                }
                                $newzero++;
                        }

                        case 'X' {$skipped++; next LINE; }  #Error line

                        case '#' {next LINE; }  #Comment line

                        else { #Hard link is default switch case
                                next LINE unless $matchback;
                                unless($line =~ 
m|(c?pool/[0-9a-f]/[0-9a-f]/[0-9a-f]/[^/]+) +((\Q${pcdir}\E.*/).*)|) {
                                        print STDOUT "ERR_LFRMT $line\n";
                                        $formaterr++;
                                        next LINE;
                                }
                                print STDERR "$1|$2|$3\n" if $verbose >=4; # 
1=source 2=target 3=targetdir
                                my $source = $1;
                                my $target = $2;
                                my $targetdir = $3;

                                if(-e $target && 
                                   !($Overwrite && -f $target && 
unlink($target))){
                                        print STDOUT "ERR_LEXST $line\n"; 
                                        $filexsterr++; #Target exists and not 
overwritable
                                        next LINE;
                                }

                                unless(-d $targetdir || jmake_path $targetdir){
                                        #Make targetdir if doesn't exist
                                        #NOTE: typically shouldn't be necesary 
since list is
                                        #created with directories along the way
                                        print STDOUT "ERR_MKDIR $line\n";
                                        $mkdirerr++;
                                        next LINE;
                                }

                                unless(jlink $source, $target) { #Make link
                                        print STDOUT "ERR_MKLNK $line\n";
                                        $mklinkerr++;
                                        next LINE;
                                }
                                $newlink++
                        }
                } #SWITCH end
        } # WHILE reading lines

        #Set mtimes for any remaining directories in @dirmtimes stack
        while(my $dir = shift(@dirmtimes)){
                jutime($time, shift(@dirmtimes), $dir);
        }

###############################################################################
        #Print results:
        

        printf("\nSUCCESSES: Restores=%d [Dirs=%d Zeros=%d Links=%d] 
Skipped=%d\n", 
                   ($newdir+$newzero+$newlink), $newdir, $newzero, $newlink, 
$skipped);
        printf("ERRORS: TOTAL=%d Format=%d Mkdir=%d Mkzero=%d Mklink=%d\n",
                   ($formaterr + $mkdirerr + $mkzeroerr + $mklinkerr +
                        $filexsterr + $ownererr + $permserr + $utimerr),
                   $formaterr, $mkdirerr, $mkzeroerr, $mklinkerr);
        printf("        File_exists=%d Owner=%d Perms=%d Mtime=%d\n\n",
                   $filexsterr, $ownererr, $permserr, $utimerr);

        exit; #RESTORE done
}
###############################################################################

------------------------------------------------------------------------
jLib.pm

#============================================================= -*-perl-*-
#
# BackupPC::jLib package
#
# DESCRIPTION
#
#   This library includes various supporting subroutines for use with BackupPC
#   functions used by BackupPC.
#   Some of the routines are variants/extensions of routines originally written
#   by Craig Barratt as part of the main BackupPC release.
#
# AUTHOR
#   Jeff Kosowsky
#
# COPYRIGHT
#   Copyright (C) 2008-2013  Jeff Kosowsky
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; either version 2 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
#========================================================================
#
# Version 0.4.1, released January 2013

# CHANGELOG
# 0.4.0 (Jan 2011)
# 0.4.1 (Jan 2013) Added MD52RPath, is_poollink
#                  Added jchmod, jchown
# 0.4.2 (Feb 2013) Made write_attrib more robust
#
#========================================================================

package BackupPC::jLib;

use strict;
use vars qw($VERSION);
$VERSION = '0.4.1';

use warnings;
use File::Copy;
use File::Glob ':glob';
use File::Path;
use File::Temp;
use Fcntl;  #Required for RW I/O masks

use BackupPC::Lib;
use BackupPC::Attrib;
use BackupPC::FileZIO;
use Data::Dumper;  #Just used for debugging...

no utf8;

use constant _128KB               => 131072;
use constant _1MB                 => 1048576;
use constant LINUX_BLOCK_SIZE     => 4096;
use constant TOO_BIG              => 2097152; # 1024*1024*2 (2MB)

require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(
       %Conf $dryrun $errorcount
       LINUX_BLOCK_SIZE TOO_BIG
       printerr warnerr firstbyte
       zFile2MD5 zFile2FullMD5
       link_recursively_topool
       jfileNameUnmangle
       getattr read_attrib count_file_attribs
       get_bakinfo get_file_attrib get_attrib_value get_attrib_type
       write_attrib write_file_attrib
       attrib
       MD52RPath
       is_poollink GetPoolLink jMakeFileLink
       poolname2number renumber_pool_chain delete_pool_file
       run_nightly
       jcompare zcompare zcompare2
       delete_files jtouch
       jchmod jchown jcopy jlink junlink jmkdir jmkpath jmake_path 
       jrename jrmtree jutime
);

#Global variables
our %Conf;
our $errorcount=0;
our $dryrun=1;  #global variable - set to 1 to be safe -- should be set to
                                #0 in actual program files if you don't want a 
dry-run
                #None of the routines below will change/delete/write actual
                #file data if this flag is set. The goal is to do everything but
                #the final write to assist with debugging or cold feet :)

sub printerr
{
        print "ERROR: " . $_[0];
        $errorcount++;
}

sub warnerr
{
        $|++; # flush printbuffer first
        warn "ERROR: " . $_[0];
        $errorcount++;
}

# Returns the firstbyte of a file.
# If coding $coding undefined or 0, return as unpacked 2 char hexadecimal
# string. Otherwise, return as binary byte.
# Return -1 on error.
# Useful for checking the type of compressed file/checksum coding.
sub firstbyte {
        my ($file, $coding) = @_;
        my $fbyte='';
        sysopen(my $fh, $file, O_RDONLY) || return -1;
        $fbyte = -1 unless sysread($fh, $fbyte, 1) == 1;
        close($fh);
        if (! defined($coding) || $coding == 0) {
                $fbyte = unpack('H*',$fbyte); # Unpack as 2 char hexadecimal 
string
        }
        else {
                $fbyte = vec($fbyte, 0, 8);  # Return as binary byte
        }
        return $fbyte;
}

# Compute the MD5 digest of a compressed file. This is the compressed
# file version of the Lib.pm function File2MD5.
# For efficiency we don't use the whole file for big files
#   - for files <= 256K we use the file size and the whole file.
#   - for files <= 1M we use the file size, the first 128K and
#     the last 128K.
#   - for files > 1M, we use the file size, the first 128K and
#     the 8th 128K (ie: the 128K up to 1MB).
# See the documentation for a discussion of the tradeoffs in
# how much data we use and how many collisions we get.
#
# Returns the MD5 digest (a hex string) and the file size if suceeeds.
# (or "" and error code if fails).
# Note return for a zero size file is ("", 0).
#
# If $size < 0 then calculate size of file by fully decompressing
# If $size = 0 then first try to read corresponding attrib file
#    (if it exists), if doesn't work then calculate by fully decompressing
# IF $size >0 then use that as the size of the file
#
# If compreslvl is undefined then use the default compress level from 
# the config file

sub zFile2MD5
{
    my($bpc, $md5, $name, $size, $compresslvl) = @_;
        
        my ($fh, $rsize, $filesize, $md5size);

        return ("", -1) unless -f $name;
        return ("", 0) if (stat(_))[7] == 0;  #Zero size file
        $compresslvl = $Conf{CompressLevel} unless defined $compresslvl;
        unless (defined ($fh = BackupPC::FileZIO->open($name, 0, 
$compresslvl))) {
                printerr "Can't open $name\n";
                return ("", -1);
        }

        my ($datafirst, $datalast);
        my @data;
        #First try to read up to the first 128K (131072 bytes)
        if ( ($md5size = $fh->read(\$datafirst, _128KB)) < 0 ) { #Fist 128K
                printerr "Can't read & decompress $name\n";
                return ("", -1);
        }

        if ($md5size == _128KB) { # If full read, continue reading up to 1st MB
                my $i=0;
                #Read in up to 1MB (_1MB), 128K at a time and alternate between 
2 data buffers
                while ( ($rsize = $fh->read(\$data[(++$i)%2], _128KB)) == _128KB
                                &&  ($md5size += $rsize) < _1MB ) {}
                $md5size +=$rsize if $rsize < _128KB; # Add back in partial read
            $datalast = ($i > 1 ? 
                                         substr($data[($i-1)%2], $rsize, 
_128KB-$rsize) : '')
                        . substr($data[$i%2], 0 ,$rsize); #Last 128KB (up to 
1MB)
    }

        $size = 0 unless defined $size;
        if($size > 0) {  #Use given size
                $filesize = $size;
        }elsif($md5size < _1MB) { #Already know the size because read it all 
(note don't do <=)
                $filesize = $md5size;
        } elsif($compresslvl == 0) { #Not compressed, so: size = actual size
                $filesize = -s $name;
        }elsif($size == 0) { # Try to find size from attrib file
                $filesize = get_attrib_value($name, "size");
                if(!defined($filesize)) {
                        warn "Can't read size of $name from attrib file so 
calculating manually\n";
                }
        }
        if(!defined($filesize)) { #No choice but continue reading to find size
                $filesize = $md5size;
                while (($rsize = $fh->read(\($data[0]), _128KB)) > 0) {
                        $filesize +=$rsize;
        }
   }
   $fh->close();

   $md5->reset();
   $md5->add($filesize);
   $md5->add($datafirst);
   $md5->add($datalast) if defined($datalast);
   return ($md5->hexdigest, $filesize);
}

#Compute md5sum of the full data contents of a file.
#If the file is compressed, calculate the md5sum of the inflated
#version (using the zlib routines to uncompress the stream). Note this
#gives the md5sum of the FULL file -- not just the partial file md5sum
#above.
sub zFile2FullMD5
{
    my($bpc, $md5, $name, $compresslvl) = @_;

        my $fh;
        my $data;

        $compresslvl = $Conf{CompressLevel} unless defined $compresslvl;
        unless (defined ($fh = BackupPC::FileZIO->open($name, 0, 
$compresslvl))) {
                printerr "Can't open $name\n";
                return -1;
        }

        $md5->reset();  
        while ($fh->read(\$data, 65536) > 0) {
                $md5->add($data);
        }

    return $md5->hexdigest;
}

# Like MakeFileLink but for existing files where we don't have the
# digest available. So compute digest and call MakeFileLink after
# For each file, check if the file exists in $bpc->{TopDir}/pool.
# If so, remove the file and make a hardlink to the file in
# the pool.  Otherwise, if the newFile flag is set, make a
# hardlink in the pool to the new file.
#
# Returns 0 if a link should be made to a new file (ie: when the file
#    is a new file but the newFile flag is 0).
# Returns 1 if a link to an existing file is made,
# Returns 2 if a link to a new file is made (only if $newFile is set)
# Returns negative on error.
sub zMakeFileLink
{
    my($bpc, $md5, $name, $newFile, $compress) = @_;

        $compress = $Conf{CompressLevel} unless defined $compress;

        my ($d,$ret) = zFile2MD5($bpc, $md5, $name, 0, $compress);
        return -5 if $ret < 0;
        $bpc->MakeFileLink($name, $d, $newFile, $compress);
}


# Use the following to create a new file link ($copy) that links to
# the same pool file as the original ($orig) file does (or
# should). i.e. this doesn't assume that $orig is properly linked to
# the pool. This is computationally more costly than just making the
# link, but will avoid further corruption whereby you get archive
# files with multiple links to each other but no link to pool.

# First, check if $orig links to the pool and if not create a link
# via MakeFileLink.  Don't create a new pool entry if newFile is zero.
# If a link either already exists from the original to the pool or if
# one was successfully created, then simply link $copy to the same
# pool entry.  Otherwise, just copy (don't link) $orig to $copy
# and leave it unlinked to the pool.  

# Note that checking whether $orig is linked to the pool is
# cheaper than running MakeFileLink since we only need the md5sum
# checksum.
# Note we assume that the attrib entry for $orig (if it exists) is
# correct, since we use that as a shortcut to determine the filesize 
# (via zFile2MD5)
# Returns 1 if a link to an existing file is made,
# Returns 2 if a link to a new file is made (only if $newFile is set)
# Returns 0 if file was copied (either because MakeFileLink failed or 
#   because newFile=0 and no existing pool match
# Returns negative on error.
sub zCopyFileLink 
{
        my ($bpc, $orig, $copy, $newFile, $compress) = @_;
        my $ret=1;
        $compress = $Conf{CompressLevel} unless defined $compress;
        my $md5 = Digest::MD5->new;
        my ($md5sum, $md5ret) = zFile2MD5($bpc, $md5, $orig, 0, $compress);

    #If $orig is already properly linked to the pool (or linkable to pool after 
running 
        #MakeFileLink on $orig) and HardLinkMax is not exceeded, then just link 
to $orig.
        if($md5ret > 0) { #If valid md5sum and non-zero length file (so should 
be linked to pool)...
                if((GetPoolLinkMD5($bpc, $orig, $md5sum, $compress, 0) == 1 || 
#If pool link already exists
                        ($ret = $bpc->MakeFileLink($orig, $md5sum, $newFile, 
$compress))> 0) #or creatable by MakeFileLink
                   && (stat($orig))[3] < $bpc->{Conf}{HardLinkMax}) {     # AND 
(still) less than max links
                        return $ret if link($orig, $copy); # Then link from 
copy to orig
                }
        }
        if(copy($orig, $copy) == 1) { #Otherwise first copy file then try to 
link the copy to pool
                if($md5ret > 0 && ($ret = $bpc->MakeFileLink($copy, $md5sum, 
$newFile, $compress))> 0) {
                        return 2; #Link is to a new copy
                }
                printerr "Failed to link copied file to pool: $copy\n"; 
                return 0; #Copy but not linked
        }
        die "Failed to link or copy file: $copy\n";     
        return -1;
}

# Copy $source to $target, recursing down directory trees as
# needed. The 4th argument if non-zero, means (for files) use
# zCopyFileLink to make sure that everything is linked properly to the
# pool; otherwise, just do a simple link. The 5th argument $force,
# will erase a target directory if already present, otherwise an error
# is signalled. The final 6th argument is the CompressionLevel which
# can be left out and will the be calculated from
# bpc->Conf{CompressLevel}

# Note the recursion is set up so that the return is negative if
# error, positive if consistent with a valid pool link (i.e. zero
# length file or directories are consistent too), and zero if
# successful but not consistent with a valid pool.  The overall result
# is -1 if there is any error and otherwise the AND'ed result of the
# operations. That means if the overall result is positive then the
# whole tree is successfully linked to the pool, so the next time
# around we can use a simple linking (i.e $linkcheck=0).  Note $source
# and $target must be full paths from root.  Note: if files are not
# compressed properly then you won't be able to link them to pool.
sub link_recursively_topool
{
    my ($bpc, $source, $target, $linkcheck, $force, $compress) = @_;
        my $ret=1;
        die "Error: '$source' doesn't exist" unless -e $source;
        if (-e $target) {
                die "Error can't overwrite: $target (unless 'force' set)\n" 
unless $force;
                die "Error can't remove: $target ($!)\n"  unless 
rmtree($target, 0, 0);
        }
        if (-d $source) {
                die "Error: mkdir failed to create new directory: $target 
($!)\n"
                        unless jmkdir($target);
                opendir( my $dh, $source) || die "Error: Could not open dir 
$source ($!)\n";
                foreach my $elem (readdir($dh)) {
            next if /^\.\.?$/;     # skip dot files (. and ..)
                        my $newsource = "$source/$elem";
                        my $newtarget = "$target/$elem";
            my $newret = link_recursively_topool($bpc, $newsource, $newtarget, 
$linkcheck, $force, $compress);
                        if ($newret < 0) { #Error so break out of loop & return
                                closedir $dh;
                                return -1 
                        }
                        $ret = 0 if $newret == 0; #note $ret stays at 1 only if 
no elements return -1 or 0
        }
                closedir $dh;
                return $ret;
        }
        elsif ($dryrun) {return 1} # Return before making changes to filesystem
        elsif ( ! -s $source) { # zero size
                copy($source, $target); #No link since zero size
        }
        elsif ($linkcheck) { #Makes sure first that source properly linked to 
pool
                return(zCopyFileLink($bpc, $source, $target, 1, $compress));
        }
        else {#Otherwise, OK to perform simple link to source
                return (link($source, $target) == 1 ? 1 : -1)
        }
}

sub get_bakinfo
{
        my ($bakinfofile, $entry) = @_;
        our %backupInfo = ();

        $bakinfofile .= "/backupInfo";
        warn "Can't read $bakinfofile\n" unless -f $bakinfofile;

        unless (my $ret = do $bakinfofile) { # Load  the backupInfo file
                if ($@) {
                        warn "couldn't parse $bakinfofile: $@\n";
                }
                elsif (!defined $ret) {
                        warn "couldn't do $bakinfofile: $!\n";
                }
                elsif (! $ret) {
                        warn "couldn't run $bakinfofile\n";
                }
        }
        my $value = $backupInfo{$entry};
        warn "$bakinfofile is empty or missing entry for '$entry'\n"  
                unless defined $value;
        return $value;
}

# Note: getattr($attr) =$attr->{files} 
#       getattr($attr, $file) =$attr->{files}{$file}
#       getattr($attr, $file, $attribute)  =$attr->{files}{$file}{$attribute}

sub getattr
{
    my($attr, $fileName, $Attribute) = @_;
    return $attr->{files}{$fileName}{$Attribute} if ( defined($Attribute) );
    return $attr->{files}{$fileName} if ( defined($fileName) );
    return $attr->{files};
}


#Reads in the attrib file for directory $_[1] and (optional alternative 
#attrib file name $_[2]) and #stores it in the hashref $_[0] passed to 
#the function
#Returns -1 and a blank $_[0] hash ref if attrib file doesn't exist 
#already (not necessarily an error)
#Dies if attrib file exists but can't be read in.
sub read_attrib
{ 
#Note: $_[0] = hash reference to attrib object
#SO DO NOT USE LOCAL VARIABLE FOR IT (i.e. don't do my $attr=$_[0]
        $_[0] = BackupPC::Attrib->new({ compress => $Conf{CompressLevel} });

#       unless (defined $_[1]) { #JJK: DEBUGGING
#               print "read_attrib: \$_[1] undefined\n";
#               print Dumper @_;
#       }
        return -1 unless -f attrib($_[1], $_[2]);
    #This is not necessarily an error because dir may be empty

        $_[0]->read($_[1],$_[2]) or
                die "Error: Cannot read attrib file: " . attrib($_[1],$_[2]) . 
"\n";

        return 1;
}

#Same as Lib.pm fileNameUnmangle but doesn't require
#unneccessary '$bpc'
sub jfileNameUnmangle {
    my($name) = @_;

    $name =~ s{/f}{/}g;
    $name =~ s{^f}{};
    $name =~ s{%(..)}{chr(hex($1))}eg;
    return $name;
}
        
sub count_file_attribs
{
        my ($attrib) = @_;
        return( scalar (keys (%{$attrib->get()})));
}

# Get attrib entry for $fullfilname. The corresponding hash is both returned and
# also fills the hash reference (optionally) passed via $fileattrib.
# If attrib file not present, return -1 (which may not be an error)
# Returns -2 if not a mangled file
# Dies if error
sub get_file_attrib
{
        my ($fullfilename, $fileattrib) = @_;
        $fullfilename =~ m{(.+)/(.+)};  #1=dir; $2=file
        return -2 unless defined $2;

        return -1 if read_attrib(my $attr, $1) < 0;

        %{$fileattrib} =  %{$attr->{files}{jfileNameUnmangle($2)}};
        #Note unmangling removes initial 'f' AND undoes char mapping
}

# Returns value of attrib $key for $fullfilename (full path)
# If not a mangled file or attrib file not present or there is not an
# entry for the specificed key for the given file, then return 'undef'
sub get_attrib_value
{
        my ($fullfilename, $key) = @_;
        $fullfilename =~ m{(.+)/(.+)};  #1=dir; $2=file

        return undef unless defined $2;
        return undef if read_attrib(my $attr, $1) < 0;
        return $attr->{files}{jfileNameUnmangle($2)}{$key}; 
    #Note this returns undefined if key not present
        #Note unmangling removes initial 'f' AND undoes char mapping
}

# Returns value of attrib type key for $fullfilename (full path)
# If attrib file present but filename not an entry, return -1 [not an error if 
file nonexistent]
# If no attrib file (but directory exists), return -2 [not an error if 
directory empty]
# If directory non-existent, return -3
# If attrib file present but not readble, return -4 [true error]
# Note there may an entry even if file non-existent (e.g. type 10 = delete)
sub get_attrib_type
{
        my ($fullfilename) = @_;
        $fullfilename =~ m{(.+)/(.+)};  #1=dir; $2=file

#       unless (defined $1) { #JJK: DEBUGGING
#               print "get_attrib_type: \$1 undefined\n";
#               print Dumper @_;
#       }

        return -3 unless -d $1;
        return -2 unless -f attrib($1);
        return -4 unless read_attrib(my $attr, $1) >= 0;
        my $type = $attr->{files}{jfileNameUnmangle($2)}{type};
        #Note unmangling removes initial 'f' AND undoes char mapping
        return (defined($type) ? $type : -1);
}

# Write out $attrib hash to $dir/attrib (or $dir/$attfilename) and
# link appropriately to pool.
# Non-zero 4th argument $poollink creates link to pool (using MakeFileLink)
# after writing the attrib file.
# 5th argument tells what to do if no files in $attrib 
# (0=return error, 1=delete attrib file and return success)
# 6th argument is an optional alternative name for the attrib file itself
# Note does an unlink first since if there are hard links, we don't want
# to modify them
# Returns positive if successful, 0 if not
# Specifically, 1 if linked to existing, 2 if linked to new, 
# 3 if written without linking, 4 if (empty) attrib file deleted

sub write_attrib
{
        my ($bpc, $attrib, $dir, $poollink, $delempty, $attfilename) = @_; 
        die "Error: Cannot write to directory: $dir\n" unless -w $dir;

#       unless (defined $dir) { #JJK: DEBUGGING
#               print "write_attrib: \$dir undefined";
#               print Dumper @_;
#       }

        my $attfilepath = attrib($dir, $attfilename);

        if(count_file_attribs($attrib) == 0 && -e $attfilepath && $delempty) {
                #Delete existing empty attrib file
                die "Error: could not unlink empty attrib file: $attfilepath\n" 
 
                        unless(junlink($attfilepath));
                return 4;
        }

        #First (safely) write out new attrib file to directory
        my $tempname = mktemp("attrib.XXXXXXXXXXXXXXXX");
        unless($attrib->write($dir, $tempname) == 1 &&
                   (! -e $attfilepath || junlink($attfilepath)) &&
                   jrename("$dir/$tempname", $attfilepath)) {
                unlink("$dir/$tempname") if -e "$dir/$tempname";
                die "Error: could not write attrib file: $attfilepath\n";
        }
        unlink("$dir/$tempname") if $dryrun;

        my $ret=3;
        if ($poollink) {
                my $data = $attrib->writeData;
                my $md5 = Digest::MD5->new;
                my $digest;
                if(($digest = $bpc->Buffer2MD5($md5, length($data), \$data)) eq 
-1
                   || ($ret = jMakeFileLink($bpc, $attfilepath, $digest, 2, 
                                                                        
$Conf{CompressLevel})) <= 0) {
                        printerr "Can't link attrib file to pool: $attfilepath 
($ret)\n";
                }
        }
        return $ret;
}

# Write out $fileattrib for $file (without the mangle) to $dir/$attfilename (or
# to the default attribute file for $dir if $attfilename is undef)
# Reads in existing attrib file if pre-existing
# 4th argument $poollink says whether to write to file (0) or link to
# pool (using MakeFileLink).
# Returns positive if successful, 0 if not
# Specifically, 1 if linked to existing, 2 if linked to new,
# 3 if written without linking
sub write_file_attrib
{
    my ($bpc, $dir, $file, $fileattrib, $poollink, $attfilename) = @_; #Note 
$fileattrib is a hashref
        my $ret=0;

        read_attrib(my $attr, $dir, $attfilename); #Read in existing attrib 
file if it exists
        $ret = write_attrib($bpc, $attr, $dir, $poollink, 0, $attfilename) 
                if $attr->set($file, $fileattrib) > 0;

#       unless (defined $dir) { #JJK: DEBUGGING
#               print "write_file_attrib: \$dir undefined\n";
#               print Dumper @_;
#       }

        die "Error writing to '$file' entry to attrib file: " . attrib($dir, 
$attfilename) . "\n" unless $ret > 0;
        return $ret;
}

sub attrib
{
        return (defined($_[1]) ? "$_[0]/$_[1]" : "$_[0]/attrib");
}

#
# Given an MD5 digest $d and a compress flag, return the relative
# path (to TopDir)
#
sub MD52RPath
{
    my($bpc, $d, $compress, $poolDir) = @_;

    return if ( $d !~ m{(.)(.)(.)(.*)} );
    $poolDir = ($compress ? "cpool" : "pool")
                    if ( !defined($poolDir) );
    return "$poolDir/$1/$2/$3/$1$2$3$4";
}

# Modified version of MakeFileLink including:
# 1. Efficiency/clarity improvements
# 2. Calls GetPoolLink to find candidate link targets.
# 2. For non-compressed files, uses my improved jcompare comparison algorithm
# 3. For compressed files, uses zcompare2 which compares only the compressed
#    data sans first-byte header & potential rsync digest trailer. This allows
#    for matches even if one file has rsync digest and other does not
# 4. Moves file before unlinking in case subsequent link fails and needs to be 
#    undone
# 5. Added 6th input parameter to return pointer to the pool link name
# 6. Extended meaning of newFile flag
#      0 = Don't creat new pool file (as before)
#      1 = Create new pool file IF no other links to source file
#          (this was the previous behavior for whenever newFile was set)
#      >2 = Create new pool file EVEN if source file has more than one link
#          (this will by extension link other things linked to the source
#           also to the pool -- which means that the pool might not clean
#           if it links to things outside of the pc directory -- so 
#           use carefully
#  7. Includes 'j' versions of file routines to allow dryrun
#  8. Added check to see if already in pool and if so returns 3

# For each file, check if the file exists in $bpc->{TopDir}/pool.
# If so, remove the file and make a hardlink to the file in
# the pool.  Otherwise, if the newFile flag is set, make a
# hardlink in the pool to the new file.
#
# Returns 0 if a link should be made to a new file (ie: when the file
#    is a new file but the newFile flag is 0).
#    JJK: of if newFlag =1 and $name has nlinks >1
# Returns 1 if a link to an existing file is made,
# Returns 2 if a link to a new file is made (only if $newFile is set)
# Return 3 if first finds that already linked to the appropriate pool chain
# Returns negative on error.

sub jMakeFileLink
{
        my($bpc, $name, $d, $newFile, $compress, $linkptr) = @_;

        my $poollink;
        my $result=GetPoolLinkMD5($bpc, $name, $d, $compress, 1, \$poollink);
        $$linkptr = $poollink if defined($linkptr) && $result > 0;

        if($result == 1){ #Already linked to the $d pool chain
                return 3;
        }elsif($result == 2) { #Matches existing, linkable pool file
                my $tempname = mktemp("$name.XXXXXXXXXXXXXXXX");
                return -5 unless jrename($name, $tempname); #Temorarily save
                if(!jlink($poollink, $name)) { #Link pool to source
                        jrename($tempname, $name); #Restore if can't link
                        return -3;
                }
                junlink($tempname); #Safe to remove the original
                return 1;
        }elsif($result == 3) { #No link or match
                if(defined($newFile) && 
                   ($newFile > 1 || ($newFile == 1 && (stat($name))[3] == 1 ))) 
{
                        $poollink =~ m|(.*)/|;
                        jmkpath($1, 0, 0777) unless -d $1 ;
                        return -4 unless jlink($name, $poollink);
                        return 2;
                } else { #New link should have been made but told not to
                        my $dog = (stat($name))[3];
                        print "$dog\n";
                        return 0;
                }
        }else {
                return -6; #Error from GetPoolLink call
        }
}

# GetPoolLink
# GetPoolLinkMD5
#Find the pool/cpool file corresponding to file $name.
#1. First iterates entire chain to see if *same inode* is present. I.e. if 
#   already linked to the chain. If so, it returns the first instance. 
#   Return = 1 and $Poolpathptr = name of the hard-linked match
#2. If $compareflg is set, then iterate through again this time looking for
#   file *content* matches (this is much slower). 
#   If so, it returns the first instance with Nlinks < HardLinkMax
#   Return = 2 and $Poolpathptr = name of the content match
#3. Finally, if not linked (and also not matched if $compareflg set)
#   Return=3 and $$poolpathptr = first empty chain
#Note: Return zero if zero size file
#      Return negative if error.
#Note: if chain has multiple copies of the file, then it returns the first 
linked
#match if present and if none and $compareflag set then the first content match
sub GetPoolLink
{
        my($bpc, $md5, $name, $compress, $compareflg, $poolpathptr) = @_;

        $compress = $bpc->{Conf}{CompressLevel} unless defined $compress;

        my ($md5sum , $ret) = defined($compress) && $compress > 0 ?
                zFile2MD5($bpc, $md5, $name, 0, $compress) : 
                $bpc->File2MD5($md5, $name);

        return 0 if $ret == 0; #Zero-sized file
        return -3 unless $ret >0;

        GetPoolLinkMD5($bpc, $name, $md5sum, $compress, $compareflg, 
$poolpathptr);
}

sub GetPoolLinkMD5
{
        my($bpc, $name, $md5sum, $compress, $compareflg, $poolpathptr) = @_;
        my($poolbase, $i);

        return -1 unless -f $name;
        my $inode = (stat(_))[1];  #Uses stat from -f
        return 0 if (stat(_))[7] == 0; #Zero-size (though shouldn't really 
happen since
                                   #md5sum input not defined for zero sized 
files

        $compress = $bpc->{Conf}{CompressLevel} unless defined $compress;

        return -2 unless 
                defined($poolbase = $bpc->MD52Path($md5sum, $compress));

        #1st iteration looking for matching inode
        $$poolpathptr = $poolbase;
        for($i=0; -f $$poolpathptr; $i++) { #Look for linked copy (inode match)
                return 1 if ((stat(_))[1] == $inode);
                $$poolpathptr = $poolbase . '_' . $i; #Iterate
        }

        return 3 unless $compareflg; #No inode match

        #Optional 2nd iteration looking for matching content
        my $compare = defined($compress) && $compress > 0 ? \&zcompare2 : 
\&jcompare;
        $$poolpathptr = $poolbase;
        for(my $j=0; $j<$i; $j++ ) { #Look for content match
                return 2 if (stat($$poolpathptr))[3] < 
$bpc->{Conf}{HardLinkMax} &&
                        !$compare->($name, $$poolpathptr);
                $$poolpathptr = $poolbase . '_' . $j; #Iterate
        }
        # No matching existing pool entry - $$poolpathptr is first empty chain 
element
        return 3;
}

#Test if $file is a valid link to the pool tree determined by $compresslvl.
#Returns the name of the pool file link if it is valid, 0 otherwise.
#Size is optional and is as described for zFile2MD5
#Note this is similar to GetPoolLink but faster though less powerful
sub is_poollink
{
        my ($bpc, $md5, $file, $size, $compresslvl)= @_;

        return 0 unless -f $file;

    $compresslvl = $Conf{CompressLevel} unless defined $compresslvl;
        my ($filemd5, $ret) = zFile2MD5($bpc, $md5, $file, $size, $compresslvl);
        return 0 unless $ret > 0;

        my $finode = (stat($file))[1];
        my $poolpath = $bpc->MD52Path($filemd5, $compresslvl);

        my %poolhash = map {(stat($_))[1] => $_} bsd_glob($poolpath . "_*");
        $poolhash{(stat(_))[1]} = $poolpath if -f $poolpath; #Add poolpath
        #Note we can't add $poolpath to the glob in case it doesn't exist
        #since glob will return a non-existing non-pattern match entry as-is

        foreach (keys %poolhash) {
                if($_ == $finode) {
                        $poolhash{$_} =~ m|.*/(.*)|;
                        return $1;
                }
        }
        return 0;
}


#Convert pool name to constant length string consisting
#of 32 hex digits for the base plus 6 (left) zero padded digits for
#the chain suffix (suffixes are incremented by 1 so that no suffix 
#records as 0). Note this accomodates chains up to 10^6 long.
#Use a number bigger than 6 if you have longer chains
#Useful if you want to order (e.g., sort) pool file names numerically
sub poolname2number
{
        $_[0] =~ m|(.*/)?([^_]*)(_(.*))?|;
        my $suffix = defined($4) ? $4+1 : 0;
        return sprintf("%s%06s", $2, $suffix)
}

# Renumber pool chain holes starting at $firsthole and continue up the chain 
# to fill up # to $maxgap holes (which need not be contiguous).
# If $maxgap is not specified it defaults to 1 (sufficient to cover one 
# deletion - i.e. hole -- which may be $file itself)
# If $firsthole exists, it is an error. Use delete_pool_file instead,
# if you want to delete first before renumbering.
# Return 1 on success; Negative on failure
sub renumber_pool_chain
{
        my ($firsthole, $maxholes) = @_;
        
        $maxholes = 1 unless defined $maxholes;
        
        my ($base, $sufx);
        if($firsthole =~ m|(.*_)([0-9]+)|) {
                $base = $1;
                $sufx = $2;
        }else{
                $base = $firsthole . "_";
                $sufx = -1; #Corresponds to no suffix
        }

        my $nextsufx = $sufx+1;
        my $nextfile = $base .  $nextsufx;
        while($nextsufx - $sufx <= $maxholes) {
                if(-e $nextfile) { #Non-hole so move/renumber
                        if(-e $firsthole || ! jrename($nextfile, $firsthole)) {
                                warn "Error: Couldn't renumber pool file: 
$nextfile --> $firsthole\n";
                                return -2;
                        }
#                       print "Renumbering: $nextfile --> $firsthole\n";
                        $firsthole = $base . (++$sufx); #Move hole up the chain
                }
                $nextfile = $base . (++$nextsufx);
        }
        return 1;
}

# Delete pool file (if it exists) and regardless renumber 
# chain to fill hole left by file. Fill up to $maxholes
# above $file (including the hole left by $file) where
# $maxholes defaults to 1.
# If $file doesn't exist, then it just fills the hole
# left by $file.
# Return 1 on success; Negative on failure
sub delete_pool_file
{
        my ($file, $maxholes) = @_;

        if(-e $file && !junlink($file)) { #Try to delete if exists
                warn "Error: Couldn't unlink pool file: $file\n";
                return -3;
        }
        return(renumber_pool_chain($file,$maxholes)); #Returns -1/-2 on fail
}

# Run BackupPC_nightly
sub run_nightly
{
        my ($bpc) = @_;
    my $err = $bpc->ServerConnect($Conf{ServerHost}, $Conf{ServerPort});
    if ($err) {
        printerr "BackupPC_nightly: can't connect to server ($err)...\n";
        return($err);
    }
    if ((my $reply = $bpc->ServerMesg("BackupPC_nightly run")) eq "ok\n" ) {
        $bpc->ServerMesg("log $0: called for BackupPC_nightly run...");
        print "BackupPC_nightly scheduled to run...\n";
        return 0;
    }
    else {
        printerr "BackupPC_nightly ($reply)...\n";
        return $reply;
    }
}


# Rewrite of compare function (in File::Compare) since it messes up on
# weird filenames with spaces and things (since it combines the "<"
# and the filename in a single string). Also streamline the code by
# getting rid of of extra fluff and removing code for comparing text
# files while we are at. The basic algorithm remains the classic one
# for comparing 2 raw files.
# Returns 0 if equal, 1 if not equal. And negative if error.
sub jcompare {
    my ($file1,$file2,$size) = @_;
        my ($fh1open, $fh2open, $fh1size);
        my $ret=0;

    local (*FH1, *FH2);
        unless (($fh1open = open(FH1, "<", $file1)) &&
                        ($fh2open = open(FH2, "<", $file2))) {
                $ret = -1;
                goto compare_return;
        }
        binmode FH1;
        binmode FH2;
        if (($fh1size = -s FH1) != (-s FH2)) {
                $ret=1;
                goto compare_return;
        }

        unless (defined($size) && $size > 0) {
            $size = $fh1size;
            $size = LINUX_BLOCK_SIZE if $size < LINUX_BLOCK_SIZE;
            $size = TOO_BIG if $size > TOO_BIG;
        }

        my $data1 = my $data2 = '';
        my ($r1,$r2);
        while(defined($r1 = read(FH1,$data1,$size)) && $r1 > 0) {
            unless (defined($r2 = read(FH2,$data2,$r1)) && $r2 == $r1
                                && $data2 eq $data1) {
                        $ret=1;
                        goto compare_return;
            }
        }
        $ret=1  if defined($r2 = read(FH2,$data2,LINUX_BLOCK_SIZE)) && $r2 > 0;
        $ret =-2 if $r1 < 0 || $r2 < 0; # Error on read

  compare_return:
        close(FH1) if $fh1open;
        close(FH2) if $fh2open;
        return $ret;
}

# Version of compare for BackupPC compressed files. Compares inflated
# (uncompressed) versions of files. The BackupPC::FileZIO subroutine
# is used to read in the files instead of just raw open & read.  Note
# this version of compare is relatively slow since you must use zlib
# to decompress the streams that are being compared. Also, there is no
# shortcut to just first compare the filesize since the size is not
# known until you finish reading the file.
sub zcompare {
        my ($file1, $file2, $compress)=@_;
        my ($fh1, $fh2);
        my $ret=0;

        $compress =1 unless defined $compress;
        unless ((defined ($fh1 = BackupPC::FileZIO->open($file1, 0, 
$compress))) &&
                        (defined ($fh2 = BackupPC::FileZIO->open($file2, 0, 
$compress)))) {
                $ret = -1;
                goto zcompare_return;
        }
        my $data1 = my $data2 = '';
        my ($r1, $r2);
        while ( ($r1 = $fh1->read(\$data1, 65536)) > 0 ) {
                unless ((($r2 = $fh2->read(\$data2, $r1)) == $r1)
                                && $data1 eq $data2) {
                        $ret=1;
                        goto zcompare_return;
                }
        }
        $ret =1 if ($r2 = $fh2->read(\$data2, LINUX_BLOCK_SIZE)) > 0; #see if 
anything left...
        $ret =-1 if $r1 < 0 || $r2 < 0; # Error on read

  zcompare_return:
        $fh1->close() if defined $fh1;
        $fh2->close() if defined $fh2;
        return $ret;
}

# Second alternative version that combines the raw speed of jcompare
# with the ability to compare compressed files in zcompare.  This
# routine effectively should compare two compressed files just as fast
# as the raw jcompare. The compare algorithm strips off the first byte
# header and the appended rsync checksum digest info and then does a
# raw compare on the intervening raw compressed data. Since no zlib
# inflation (decompression) is done it should be faster than the
# zcompare algorithm which requires inflation. Also since zlib
# compression is well-defined and lossless for any given compression
# level and block size, the inflated versions are identical if and
# only if the deflated (compressed) versions are identical, once the
# header (first byte) and trailers (rsync digest) are stripped
# off. Note that only the data between the header and trailers have
# this uniqe mapping. Indeed, since BackupPC only adds the checksum
# the second time a file is needed, it is possible that the compressed
# value will change with time (even though the underlying data is
# unchanged) due to differences in the envelope. Note, to some extent
# this approach assumes that the appended digest info is correct (at
# least to the extent of properly indicating the block count and hence
# compressed data size that will be compared)
sub zcompare2 {
    my ($file1,$file2,$size) = @_;
        my ($fh1, $fh2, $fh1open, $fh2open);
        my $Too_Big = 1024 * 1024 * 2;
        my $ret=0;
        
        unless (($fh1open = open($fh1, "<", $file1)) &&
                        ($fh2open = open($fh2, "<", $file2))) {
                $ret = -1;
                goto zcompare2_return;
        }
        binmode $fh1;
        binmode $fh2;
        
        my $fh1size = -s $fh1;
        my $fh2size = -s $fh2;

        my $data1 = my $data2 = '';
        unless (read($fh1, $data1, 1) == 1 &&  # Read first byte
                        read($fh2, $data2, 1) == 1) {
                $ret = -1;
                goto zcompare2_return;
        }
        if (vec($data1, 0, 8) == 0xd6 || vec($data1, 0, 8) == 0xd7) {
                return -2 unless ( defined(seek($fh1, -8, 2)) ); 
                return -3 unless ( read($fh1, $data1, 4) == 4 );
                $fh1size -= 20 * unpack("V", $data1) + 49;
                # [0xb3 separator byte] + 20 * NUM_BLOCKS + DIGEST_FILE(32) + 
DIGEST_INFO (16)
                # where each of NUM_BLOCKS is a 20 byte Block digest consisting 
of 4 bytes of
                # Adler32 and the full 16 byte (128bit) MD4 file digest 
(checksum).
                # DIGEST_FILE is 2 copies of the full 16 byte (128bit) MD4 
digest 
                # (one for protocol <=26 and the second for protocol >=27)
                # DIGEST_INFO is metadata consisting of a pack("VVVV") encoding 
of the
                # block size (should be 2048), the checksum seed, length of the
                # block digest (NUM_BLOCKS), and the magic number (0x5fe3c289).
                # Each is a 4 byte integer.
        }

        if (vec($data2, 0, 8) == 0xd6 || vec($data2, 0, 8) == 0xd7) {
                return -2 unless ( defined(seek($fh2, -8, 2)) ); 
                return -3 unless ( read($fh2, $data2, 4) == 4 );
                $fh2size -= 20 * unpack("V", $data2) + 49;
        }

        if ($fh1size != $fh2size) {
                $ret=1;
                goto zcompare2_return;
        }

        seek($fh1,1,0) ; #skip first byte
        seek($fh2,1,0) ; #skip first byte


        my $bytesleft=--$fh1size; #Reduce by one since skipping first byte
        $size = $fh1size unless defined($size) && $size > 0 && $size < $fh1size;
        $size = $Too_Big if $size > $Too_Big;

        my ($r1,$r2);
        while(defined($r1 = read($fh1,$data1,$size))
                  && $r1 > 0) {
            unless (defined($r2 = read($fh2,$data2,$r1))
                                && $data2 eq $data1) { #No need to test if 
$r1==$r2 since should be same size
                        $ret=1;        #Plus if it doesn't, it will be picked 
up in the $data2 eq $data1 line
                        goto zcompare2_return;
            }
                $bytesleft -=$r1;
                $size = $bytesleft if $bytesleft < $size;
        } 
        $ret=1  if defined($r2 = read($fh2,$data2,$size)) && $r2 > 0;
    #Note: The above line should rarely be executed since both files same size
        $ret =-2 if $r1 < 0 || $r2 < 0; # Error on read

  zcompare2_return:
        close($fh1) if $fh1open;
        close($fh2) if $fh2open;
        return $ret;
}

#Routine to remove files (unlink) or directories (rmtree)
# $fullfilename is full path name to file or directory
# Returns number of files deleted in the listref $dirfileref if defined;
sub delete_files
{
        my ($fullfilename, $dirfileref) = @_;
        my $listdeleted = defined($dirfileref);
        my $ret = -1;
        my $output;
        die "Error: '$fullfilename' doesn't exist or not writeable\n" 
                unless  -w $fullfilename;
        if(-f $fullfilename) {
                $ret = junlink($fullfilename);
        }
        elsif(-d $fullfilename) {
                $$dirfileref = "" if $listdeleted;
                open(local *STDOUT, '>', $dirfileref) #redirect standard out to 
capture output of rmtree
                        if $listdeleted;
                $ret = jrmtree($fullfilename, $listdeleted, 1);
        }
        return $ret;
}

#Touch command (modified version of Unix notion)
#First argument is the (full)filepath.
#Zero-length file is created if non-existent.
#If existent and second argument zero or  not defined then access 
# and modification times are updated.
#Returns 1 on success, -1 on error;
sub jtouch
{
        return 1 if $dryrun;
        if(! -e $_[0]) { #Create if non-existent
                return -1 unless defined(sysopen(my $fh, $_[0], O_CREAT));
                close $fh;
        }elsif(!$_[1]) {
                my $time = time();
                utime($time, $time, $_[0]);
        }
        return 1;
}

#Simple wrappers to protect filesystem when just doing dry runs
sub jchmod
{
        return 1 if $dryrun;
        chmod @_;
}

sub jchown
{
        return 1 if $dryrun;
        chown @_;
}

sub jcopy
{
        return 1 if $dryrun;
        copy @_;
}

sub jlink
{
        return 1 if $dryrun;
        link $_[0], $_[1];
}

sub junlink
{
        return 1 if $dryrun;
        unlink @_;
}

sub jmkdir
{
        return 1 if $dryrun;
        mkdir @_;
}

sub jmkpath
{
    return 1 if $dryrun;
    mkpath @_;
}

sub jmake_path
{
    return 1 if $dryrun;
    mkpath @_;
}

sub jrename
{
        return 1 if $dryrun;
        rename $_[0], $_[1];
}

sub jrmtree
{
        return 1 if $dryrun;
        rmtree @_;
}

sub jutime
{
        return 1 if $dryrun;
        utime @_;
}

1;


------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_feb
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
[BackupPC-users] Updated version of BackupPc_copyPcPool