BackupPC-users

[BackupPC-users] Perl script for deleting files & directories from backups...

2008-11-04 14:31:02
Subject: [BackupPC-users] Perl script for deleting files & directories from backups...
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: General list for user discussion <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 04 Nov 2008 14:26:40 -0500
I have seen several people asking about how to delete files from their
backups.
I am contributing this *beta* perl script for input as a potential
solution.

It allows you to remove files and or directories cleanly from one or
more backups (including adjusting the attrib file entry properly - I
hope, see below).

Options include:
        - Specify: Host (or wildcard)
        - Specify:      Backup number (or wildcard)
    - Specify: Share (or wildcard/blank)
        - Allow removal of directories (like -r flag for rm)
        - Optionally use non-mangled path names
        - Force removal of entries in subsequent incrementals, even if
          type changes from file to directory or vica-versa (see below) 
        - Quiet (don't show deletions)
        - Trial run (print output but don't actually delete
          files/directories and don't write to attrib files
        - Debug mode (extra printing and for now implies trial run)
                
I have mostly been running this in debug mode but would like to get
input from others on both the approach (see below) and the code (see
below). Note I have only a casual knowledge of Perl so my code doesn't
have the beauty of a true monk. It also uses one external shell
command (find) since with Perl find, it is a minor PITA to implement
maxdepth (and I was lazy).

Also, I would love for people to try it (keep it on debug mode or use
it on scratch backups until you are comfortable that it doesn't have
"bad" side effects). [Note I have it hard-coded to dry run for
now. Just search for the line with "JJK" in it to comment out]

Here is the "logic" flow for the approach.
         - If file/directory exists in selected backup, then delete the
           file from this backup.

         - Next we need to figure out what to do with the attrib entry for
       the deleted file/directory. If it is a directory, then just
       delete the entry from the attrib file of the parent directory.

           If we deleted a file, then we need to look back at its
           antecedent backups to determine what to do.  If the file exists
           in a previous backup without an intervening "type=10" (delete)
           entry in the attrib database or without an intervening
           directory with the same name, then we need to change the
           current file's attrib entry to type=10 (deleted file) to signal
           this is the first time file should be treated as
           deleted. Otherwise just delete the file attribute entry from
           the attrib file

         - Finally, delete all versions of the file/directory in incremental
           backups that are its children unless the type changes from file
           to directory (or vica-versa). This is done primarily for
           caution to prevent unwittingly causing a single file removal to
           trigger a large directory delete (and it can be overriden).

           In this case, we are safe to just delete the attrib entry for
           any files/directories that we delete.

------------------------------------------------------------------------------------------

#!/usr/bin/perl

use strict;
use File::Path;

use lib "/usr/share/BackupPC/lib";
use BackupPC::Lib;
use BackupPC::Attrib qw(:all);
use BackupPC::FileZIO;
use Getopt::Std;

use Data::Dumper;  #Just used for debugging...

my $DEBUG=''; #Overrides debug option if set to non-empty ''

die("BackupPC::Lib->new failed\n") if ( !(my $bpc = BackupPC::Lib->new) );
my $TopDir = $bpc->TopDir();
(my $root = "$TopDir/pc") =~ s|//*|/|g;
my %Conf   = $bpc->Conf();
my $attrib = BackupPC::Attrib->new({ compress => $Conf{CompressLevel} });

my %opts;
if ( !getopts("h:n:s:rmfqtd:", \%opts) || @ARGV < 1 ) {
    print STDERR <<EOF;
usage: $0 [options] files/directories...
  Required options:
    -h host         Host (or - for all) from which path is offset
    -n backupNum    Full backup number (or - for all) from which path is offset
                    WARNING: this will also delete the file/directory from all
                    subsequent incrementals

  Optional options:
    -s shareName    Share name (or - for all) from which path is offset 
    -r              Allow directories to be removed too
    -m              Paths are unmangled (i.e. apply mangle to paths)
        -f              Force children to be removed even if changes from file 
to directory (or vica-versa)
    -q              Don\'t show deletions
        -t              Trial run -- do everything but deletions
        -d level        Turn on debug level (if undefined in code)
EOF
exit(1);
}

if ( $opts{h} !~ /^([\w\.\s-]+)$/
        || $opts{h} =~ m{(^|/)\.\.(/|$)} ) {
    print(STDERR "$0: bad host name '$opts{h}'\n");
    exit(1);
}
my $host = my $hostregex = $opts{h};

if ( $opts{n} !~ /^(-|\d+)$/ ) {
    print(STDERR "$0: bad dump number '$opts{n}'\n");
    exit(1);
}
my $num = my $numregex = $opts{n};
my $share = my $shareregex = $opts{s};
my $mangle = $opts{m};
my $force = $opts{f};
my $rmdir = $opts{r};
my $quiet = $opts{q};
my $dryrun = $opts{t};
($DEBUG eq '') && ($DEBUG=$opts{d});
#$DEBUG && ($dryrun=1);  #Uncomment if you want DEBUG to imply dry run
$dryrun=1; #JJK: Temporarily set to always dry-run (paranoia)

($hostregex eq '-' ) && ($hostregex = "[^/]+");
($numregex eq '-' ) && ($numregex = "[0-9]+");
if ($shareregex eq '-' || $shareregex eq '') {$shareregex = "f[^/]+";}
(my $findpath = "$hostregex/$numregex/$shareregex") =~ s|//*|/|g;
my @branches = `find  $root  -maxdepth 3 -regex \"$root/$findpath\"`;
chop(@branches);   #Note you could do this in native perl using preprocess but 
it is messy

my @Ante = my @Post =();
if ($num ne '-') {
        if ($host eq '-') {
                die "Error: wildcard for host requires wilcard for backupNum\n";
        }
        else {
                (-d "$root/$host/$num") || die "Error: Backup $num on host 
'$host' doesn't exist\n";
                        
                find_backup_heirarchy("$root/$host", $num, \@Ante, \@Post);
                #These are the backups with lower and higher incremental 
numbers (starting at 0).
                #@Ante is in descending order from $backupNum
                #@Post is in ascending order from $backupNum
$DEBUG > 1 && (print "ANTE[$num]: " . join(" ", @Ante) ."\n");
$DEBUG > 1 && (print "POST[$num]: " . join(" ", @Post) ."\n\n");
        }
}

my @filelist=();
my $warnflg=0;
while (@branches) {
        my @files = @ARGV;
        while (@files) { #Start to delete each file/directory (and adjust 
antecedants & descendants)
                my $delflag=0;
                $mangle && ($files[0] = $bpc->fileNameMangle($files[0]));
$DEBUG > 2 && $mangle && print "Mangle: $files[0]\n";
                (my $file ="$branches[0]/$files[0]") =~ m|(.+)/(.+)|;
                my $dir = $1;
                shift @files;
$DEBUG > 2 && print "LOOKING AT: $file\n";
                ( -d $dir ) || next; #Nothing to do since parent directory 
doesn't exist

                if (-d $file) {
                        unless ($rmdir) {
                                $warnflg || warn "Can't delete directories 
unless set -r flag\n";
                                $warnflg=1;
                                next;
                        }
                }
                else { # If not a dir, we first need look at earlier backups to 
see if a file (not dir) exists
                           #with same name without an intervening same name 
directory or deleted file attrib. 
                           #If so instead of deleting the attrib entry for the 
current file, we will set the type 
                           #to 10 (delete) and blank the other (no longer 
meaningful) entries.
                        my @ante =@Ante;
                        while(@ante) {  #Note not executed if we set the 
wildcard for numBackups since then @ante=()
                                (my $prevback = $file) =~ 
s|^($root/$hostregex)/([^/]*)/|$1/$ante[0]/|;
$DEBUG >1 && print "ANTE[$ante[0]] $prevback [" . (-f $prevback ? "F" : (-d 
$prevback ? "D":"-")) . "]\n";

                                if (-f $prevback ) {  #File exists in previous 
backup so better set delete attrib
                                        $delflag = 1;                           
        
                                        last;
                                }
                                (-d $prevback) && last;  #Directory exists with 
same name in prev backup so no need to adjust attrib 
                                ( get_attrib($file,"type") == 10) && last; 
#Already marked as deleted
                                shift @ante; # Continue looping at look at 
earlier antecdents...
                        }
                }

                # Next, delete and/or adjust delete attrib for current file we 
are looking at..
$DEBUG >1 && print "CURRENT[$num] $file [" . (-f $file ? "F" : (-d $file ? 
"D":"-")) . "][" . ($delflag ? "DELATRIB":"-") ."]\n";
                if (-r $file ) { #If file/dir exists, delete and then remove 
attrib or mark delete attrib
$DEBUG && print"   delete_file_and_attrib($file, $rmdir, $delflag, !$quiet)\n";
                        delete_file_and_attrib($file, $rmdir, $delflag, 
!$quiet);
                }
                elsif ($delflag ==1) { #If file doesn't exist, mark removal by 
setting delete attrib if $delflag
$DEBUG && print"   set_delete_attrib($file, !$quiet)\n";
                        set_delete_attrib($file, !$quiet);
                }

                #Finally, delete file/directory from later incremental backups 
(applies to both files & dirs)
                my @post = @Post;
                while (@post) { #Again note if wildcard set for numBackups then 
 @post=() so this is skipped
                                    #and files will be deleted through main loop
                        (my $nextback = $file)  =~ 
s|^($root/$hostregex)/([^/]*)/|$1/$post[0]/|;
$DEBUG >1 && print "POST[$post[0]]  $nextback [" . (-f $nextback ? "F" : (-d 
$nextback ? "D":"-")) . "]\n";
#$DEBUG >1 && print "POST:$post[0]:  $nextback [" . (-r $nextback ? "R":"-") . 
"]\n";
                        if (-r $nextback ) { #Exists in successor branch
                                unless ($force || (-f $file) == -f ($nextback)) 
{ 
                   #File subsequently replaced by directory or vica versa - be 
cautious
                                        warn "Warning: Not removing '$nextback' 
(and its children) since file type changes\n";
                                        last;
                                }
                                else {
$DEBUG && print"   delete_file_and_attrib($nextback, $rmdir, 0, !$quiet)\n";
                                        delete_file_and_attrib($nextback, 
$rmdir, 0, !$quiet);
                                }
                        }
                        shift @post; #Continue looping and looking at further 
descedants...
                }
        }
        shift @branches;
}
exit;

sub get_attrib
{
        my ($fullfilename, $attribname) = @_;
        $fullfilename =~ m{(.+)/(.+)};
        my $dir=$1;
        my $file=$2;
        (-f "$dir/attrib" ) || return 0;  #This is not necessarily an error 
because file may be empty
        ($attrib->read($dir, "attrib")) || die "Error: Cannot read attrib file 
for '$dir'\n";
        $attrib->{files}{$file}{type};
}

# Routine to delete file and adjust corresponding attribute file
# $fullfilename is full path name to file or directory
# $rmdirflg if 1, allows for removal of directories too
# $delttrib if 1 then changes attrib to deleted file type rather than
#   deleting attribute (only applies to files)
# $listdeletes if 1 then lists deleted files and/or modified attrib entries
sub delete_file_and_attrib
{
        my ($fullfilename, $rmdirflg, $delattrib, $listdeletes) = @_;
        $fullfilename  =~ m|(.*)/f(.*)|;
        my $isdir = (-d $fullfilename) ? 1 : 0;
        my $dir = $1;
        my $file = $2;
        
        ($isdir && $rmdirflg ne "1") && die "Error: can't delete directories 
unless rmdirflg set\n";
        ( -w $fullfilename) || die "Error: '$fullfilename' doesn't exist or not 
writeable\n";
        ( -w "$dir/attrib") || die "Error: Cannot write to attrib file for 
'$dir'\n";
        ($attrib->read($dir, "attrib")) || die "Error: Cannot read attrib file 
for '$dir'\n";
         (defined ($attrib->get($file))) || die "Error: '$file' not found in 
attrib file\n";

        if (remove($fullfilename, $listdeletes) > 0 ) { # Remove file/directory
                if ($delattrib == 1 && !$isdir) {
                        $attrib->set($file, {  # Set file to deleted type
                                type  => 10,
                                mode  => 0,
                                uid   => 0,
                                gid   => 0,
                                size  => 0,
                                mtime => 0,
                                                 });
$DEBUG >1 && print "Setting type=10 (deleted file) attrib for $fullfilename\n";
                }
                else { # Remove attrib entry for file
                        $attrib->delete($file);
$DEBUG >1 && print "Removing attrib entry for: f$file\n";
                }
                $dryrun || ($attrib->write($dir, "attrib")) || die "Error 
writing to attribute file for $dir\n";
                return 0;
        }
        return 1;
}

sub remove 
{
        $_ = $_[0];
        my $ret=0;
        if (-f ) {
                ($dryrun ? $ret = 1 : ($ret = unlink));
                ($ret == 1) && $_[1] && print "unlink $_\n";
        }
        elsif (-d ) {
                if($dryrun) {
                        print "deleting directory tree starting at: $_ \n";
                        $ret=1;
                }
                else {
                        $ret = rmtree($_, $_[1], 1);
                }
        }
        return $ret;
}

sub set_delete_attrib
{
        my ($fullfilename, $listdeletes) = @_;
        $fullfilename  =~ m|(.*)/f(.*)|;
        my $dir = $1;
        my $file = $2;
        
        if (-e "$dir/attrib") {
                ( -w "$dir/attrib") || die "Error: Cannot write to attrib file 
for '$dir'\n";
                ($attrib->read($dir, "attrib")) || die "Error: Cannot read 
attrib file for '$dir'\n";
        }
        else { #Possible it doesn't exist if directory is empty (which occurs 
frequently with incrementals)
                $attrib = BackupPC::Attrib->new({ compress => 
$Conf{CompressLevel} });
        }
        $attrib->set($file, {  # Set file to deleted type
                type  => 10,
                mode  => 0,
                uid   => 0,
                gid   => 0,
                size  => 0,
                mtime => 0,
                                 });
$DEBUG > 3 && print Dumper($attrib->get($file));
        $dryrun || ($attrib->write($dir, "attrib")) || die "Error writing to 
attribute file for $dir\n";
        $listdeletes && print "Setting type=10 (deleted file) attrib for 
$fullfilename\n";
        return 0;
}

sub find_backup_heirarchy
{
        my ($path, $baknum, $antecedants, $descendants) = @_;
        my %backupInfo;

        $path =~ s|//*|/|g;
        my @baks = `find $path -maxdepth 1  -regex \"$path/[0-9]+\" -printf 
"%f\n"`;
        chop(@baks);
        my @before = my @after = ();
        while (@baks) {
                if ($baks[0] <= $baknum) {
                        push (@before, $baks[0]);
                }
                else {
                        push (@after, $baks[0]);
                }
                shift @baks;
        }
        @before = reverse(sort numerically @before);
        shift(@before); # remove $baknum from beginning;
        @after = sort numerically @after;
        

        my $oldlevel = my $baklevel = get_baklevel("$path/$baknum");
        while(@before) {
                my $newlevel = get_baklevel("$path/$before[0]");
                if ($newlevel < $oldlevel) {
                        push(@$antecedants, $before[0]);
                        ($newlevel == 0 ) && last;
                        $oldlevel = $newlevel;
                }
                shift @before;
        }

        while(@after) {
                my $newlevel = get_baklevel("$path/$after[0]");
                ($newlevel == $baklevel ) && last;
                if ($newlevel > $baklevel) {
                        push(@$descendants, $after[0]);
                }
                shift @after;
        }
}

sub numerically {$a <=> $b}

sub get_baklevel {
        our %backupInfo = ();
        my $bakinfo = "$_[0]/backupInfo";

        (-f $bakinfo ) || warn "Can't read $bakinfo\n";

        unless (my $ret = do $bakinfo) {
                warn "couldn't parse $bakinfo: $@" if $@;
                warn "couldn't do $bakinfo: $!" unless defined $ret;
                print "couldn't run $bakinfo" unless $ret;
        }
        if ( !keys(%backupInfo) || !defined($backupInfo{num}) ) {
                warn "$bakinfo is empty\n";
        }
        return $backupInfo{level};
}

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

<Prev in Thread] Current Thread [Next in Thread>