[BackupPC-users] (Improved) Routine to delete individual files from selected backups...

2008-11-18 00:57:53
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: backuppc-users AT lists.sourceforge DOT net
Date: Tue, 18 Nov 2008 00:56:07 -0500
I have vastly improved and completely rewritten my program
BackupPC_deleteFiles.pl. Also many bugs were fixed ;)

The routine now allows you to delete arbitrary files and directories
(or lists or globs thereof) across multiple hosts and shares, and
arbitrary (contiguous) backup ranges.

Specifically, you can now delete files from either a single backup or
from a range of backups. The program then appropriately deletes and/or
moves files and attributes and correspondingly adds or removes type=10 delete
attributes so as to make sure that the files show as fully deleted from the
backup range while not affecting the files visible from subsequent
backups that were not deleted.

The only thing it can't do (and refuses to do) is to delete files that
are hard links or directories that contain hard links since I couldn't
find any easy way to find and keep track of hard links.

The program provides lots of (optional) verbosity and debugging levels
so you can be sure you are deleting what you want to (and from a
debugging perspective that the appropriate visibility and inheritance
rules are being faithfully applied).


Since the program is now 1000+ lines long, I won't post it, but I will
be happy to email it to anyone interested or post it if there is
enough demand. Instead I will just copy over the logic so people can
check it if they are so inclined (note it took me multiple attempts
before I truly understood the topology of the backup chains and how to
efficiently and accurately encode it).

I will also include a copy of the usage message and options:
--------------------------------------------------------------------

usage: $0 [options] files/directories...
  NOTE: if -s option not set, then file/directory names include the share name

  Required options:
    -h host         Host (or - for all) from which path is offset
    -n backRange    Range of successive backup numbers to delete.
                    N   delete files from backup N (only)
                    M-N delete files from backups M-N (inclusive)
                    -M  delete files from all backups up to M (inclusive)
                    M-  delete files from all backups from M onward (inclusive)
                    -   delete files from ALL backups

  Optional options:
    -s shareName    Share name (or - for all) from which path is offset
                    (don't include the 'f' mangle)
    -l              Just list backups by host (with level noted in parentheses)
    -r              Allow directories to be removed too
    -H              Skip over hard links (otherwise exits without deletions if
                    hard links found)
    -m              Paths are unmangled (i.e. apply mangle to paths)
    -q              Don't show deletions
    -t              Trial run -- do everything but deletions
    -c              Clean up pool - schedule BackupPC_nightly to run (requires
                    server running)
                    Only runs if files were deleted
    -d level        Turn on debug level

----------------------------------------------------------------------------
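
The -n range forms in the usage message above can be parsed along these
lines. This is only a minimal sketch in Python, not the Perl code from the
actual program; the function name parse_backup_range and its arguments are
hypothetical and chosen purely for illustration.

```python
import re

def parse_backup_range(spec, existing):
    """Return the sorted backup numbers from `existing` selected by `spec`.

    Accepted forms (taken from the usage message): "N", "M-N", "-M", "M-", "-".
    """
    m = re.fullmatch(r"(\d*)(-?)(\d*)", spec)
    if not spec or m is None:
        raise ValueError(f"bad backup range: {spec!r}")
    first, dash, last = m.groups()
    if dash:
        # A missing bound means "open ended" on that side.
        lo = int(first) if first else 0
        hi = int(last) if last else float("inf")
    else:
        # A bare number selects exactly that backup.
        lo = hi = int(first)
    if lo > hi:
        raise ValueError(f"empty backup range: {spec!r}")
    # Only backups that actually exist can be selected.
    return sorted(n for n in existing if lo <= n <= hi)
```

Intersecting with the list of existing backups up front means a range such
as "3-7" quietly skips numbers that have already expired.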

 Program logic is as follows:

 1. First construct a hash of hashes of 3 arrays and 2 hashes that
    encapsulates the structure of the full and incremental backups
    for each host. This hash is called:
    %backupsHoHA{<hostname>}{<key>} 
    where the keys are: "ante", "post", "baks", "level", "vislvl"
    with the first 3 keys having arrays as values and the final 2
    keys having hashes as values. This pre-step is done since this
    same structure can be re-used when deleting multiple files and
    dirs (with potential wildcards) across multiple shares, backups,
    and hosts. The component arrays and hashes are constructed as
    follows:
     
    - Start by constructing the simple hash %LevelH whose keys map
      backup numbers to incremental backup levels based on the
      information in the corresponding backupInfo file.

    - Then, for each host selected, determine the list (@Baks) of
      individual backups from which files are to be deleted based on
      bakRange and the actual existing backups.
  
    - Based on this list determine the list of direct antecedent
      backups (@Ante) that have strictly increasing backup levels
      starting with the previous level 0 backup. This list thus
      begins with the previous level zero backup and ends with the
      last backup before @Baks that has a lower incremental level.
      Note: this list may be empty if @Baks starts with a full (level
      0) backup. Note: there is at most one (and should in general be
      exactly one) incremental backup per level in this list starting
      with level 0.

    - Similarly, construct the list of direct descendants (@Post) of
      the elements of @Baks that have strictly decreasing backup
      levels starting with the first incremental backup after @Baks
      and continuing until we reach a backup whose level is less than
      or equal to the level of the lowest incremental backup in @Baks
      (which may or may not be a level 0 backup). Again, this list may
      be empty if the level of the first backup after @Baks is lower
      than the level of all backups in @Baks. Also, again, there is at
      most one backup per level.

    - Note that by construction, @Ante is stored in ascending order
      and furthermore each backup number has a strictly ascending
      incremental level. Similarly, @Post is stored in strictly
      ascending order but its successive elements have monotonically
      non-increasing incremental levels. Also, the last element of
      @Ante has an incremental level lower than the first element of
      @Baks and the last element of @Post has an incremental
      level higher than the lowest level of @Baks. This is all
      because anything else neither affects nor is affected by
      deletions in @Baks. In contrast, note that @Baks can have
      any pattern of increasing, decreasing, or repeated incremental
      levels.
   
    - Finally, create the second hash (%VislvlH) which has keys equal
      to levels and values equal to the last backup with that level
      that could potentially be visible from @Post (note we will
      use this to determine which files need to be copied to @Post
      from @Ante or @Baks after we delete the file entries in @Baks).
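
The construction of @Ante described in step 1 can be sketched as follows.
This is a rough illustration in Python, not the Perl code from the actual
program; the function name antecedents and the plain dict standing in for
%LevelH are assumptions made for the example.

```python
def antecedents(levels, first_bak):
    """Return the @Ante-style chain for the first backup in the delete range.

    levels    -- dict mapping backup number -> incremental level (like %LevelH)
    first_bak -- number of the first backup in the range to be deleted
    """
    ante = []
    want_below = levels[first_bak]
    # Walk backwards in time, keeping only backups whose level is strictly
    # lower than that of the last one kept; stop once a level 0 is reached.
    for num in sorted((n for n in levels if n < first_bak), reverse=True):
        if want_below == 0:
            break
        if levels[num] < want_below:
            ante.append(num)
            want_below = levels[num]
    ante.reverse()  # store in ascending backup order
    return ante
```

For example, with levels {0: 0, 1: 1, 2: 2, 3: 1, 4: 2, 5: 3} and a delete
range starting at backup 5, the chain is [0, 3, 4], i.e. levels 0, 1, 2.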

 2. Second, for each host, combine the share regex and list of files
    (and/or file shell regexes) with the backup ranges @Ante and @Baks
    to glob for all files that need either to be deleted from @Baks
    or blocked from view by setting a type=10 (delete) attribute.
    If a directory is on the list and the remove directory flag (-r)
    is not set, then signal an error. If any of these files (or dirs)
    are hard links (either type hard link or a hard link "target")
    then signal an error (or if the -H flag is set, warn and skip
    them) since hard links cannot easily be deleted/copied/moved
    (since the other links will be affected). Duplicate entries and
    entries that are a subtree of another entry are rationalized and
    combined.
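
The "rationalizing" of duplicate and nested entries at the end of step 2
could look roughly like this. It is a hypothetical Python sketch, not the
program's own Perl; the name prune_nested is made up for illustration and
the paths are assumed to be already-expanded POSIX-style paths.

```python
import posixpath

def prune_nested(paths):
    """Drop duplicates and any path that lies inside another listed path."""
    unique = {posixpath.normpath(p) for p in paths}
    kept = []
    # Sorting by path components (not by raw string) guarantees that every
    # descendant comes directly after its ancestor, e.g. "/a/b" before "/a-b".
    for p in sorted(unique, key=lambda s: s.split("/")):
        if kept and p.startswith(kept[-1].rstrip("/") + "/"):
            continue  # already covered by the directory kept just before
        kept.append(p)
    return kept
```

Deleting only the topmost entry of each subtree avoids attempting to remove
the same file twice when both a directory and one of its children are named.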

 3. Third, for each host and for each relevant file presence, start
    going successively through the @Ante, @Baks, and @Post chains to
    determine which files and attributes need to be deleted, cleared,
    or copied/linked to @Post.

    - Start by going through @Ante in ascending order to construct
      two visibility hashes. The first hash, %VisibleAnte, is used to
      mark whether or not a file may be visible from @Baks from a
      higher incremental level. The presence of a file sets the value
      of the hash, while an intervening type=10 delete resets the value
      to invisible (-1). The second hash, %VisibleAnteBaks (whose
      construction continues when we iterate through @Baks),
      determines whether or not a file from @Ante or @Baks was
      originally visible from @Post. And if a file was visible, then
      the backup number of that file is stored in the value of the
      hash.  Note that at each level, there is at *most* one backup
      from @Ante that is visible from @Baks and similarly there is at
      *most* one backup from @Ante and @Baks combined that is visible
      from @Post.

   - Next, go through @Baks to mark for deletion any instances of the
     file that are present. Then set the attrib type to type=10
     (delete) if %VisibleAnte indicates that a file from @Ante would
     otherwise be visible at that level. Otherwise, clear the attrib
     and mark it for deletion. Similarly, once the type=10 attribute
     has been set, all higher-level elements of @Baks can have their
     file attribs cleared, whether they originally indicated a file
     type or a delete type.

   - Finally, go through the list of @Post in ascending order. If
     there is no file and no delete flag present, then use the
     information coded in %VisibleAnteBaks to determine whether we
     need to link/copy over a version of the file previously stored
     in @Ante and/or @Baks (along with the corresponding file attrib
     entry) or whether we need to set a type=10 delete
     attribute. Conversely, if originally, there was a type=10 delete
     attribute, then by construction of @Post, the delete type is no
     longer needed since the deletion will now occur in one of its
     antecedents in @Baks, so we need to clear the delete type from
     the attrib entry.
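
The @Ante visibility pass and the resulting choice for a backup in @Baks
can be illustrated with the simplified Python sketch below. It tracks a
single path only, and it assumes that a backup at a given level can see
only antecedents of strictly lower level; the names visible_from_ante and
attrib_action and the three state constants are hypothetical and do not
come from the actual Perl program.

```python
FILE, DELETED, ABSENT = "file", "type10", "absent"

def visible_from_ante(ante, levels, state, bak_level):
    """Return the @Ante backup whose copy of the path shows through, or -1.

    ante      -- antecedent backup numbers in ascending order
    levels    -- dict mapping backup number -> incremental level
    state     -- dict mapping backup number -> FILE / DELETED / ABSENT
    bak_level -- level of the backup in @Baks being processed
    """
    visible = -1
    for num in ante:
        if levels[num] >= bak_level:
            break  # assumed: only lower-level antecedents are merged in
        entry = state.get(num, ABSENT)
        if entry == FILE:
            visible = num      # a file at this level becomes the visible one
        elif entry == DELETED:
            visible = -1       # a type=10 entry hides anything below it
    return visible

def attrib_action(visible):
    """Choose the attrib change for a backup in @Baks after file removal."""
    return "set type=10" if visible != -1 else "clear attrib"
```

A type=10 entry is only needed when something from @Ante would otherwise
show through; if nothing is visible, simply clearing the entry is enough.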

 4. Finally, after all the files for a given host have been marked
    for deletion, moving/copying or attribute changes, loop through
    and execute the changes. Files are either unlinked to delete or
    hard linked (or copied if zero size) if we need to place a new
    copy in @Post. Attributes are either cleared (deleted) or set to
    type=10 delete or copied over to @Post. If all the files for a
    given attrib file are deleted, then we delete the attrib file too.
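
The link-or-copy rule in step 4 can be sketched as below. This is a small
Python illustration, not the program's Perl code; the function name
place_copy is hypothetical, and both paths are assumed to be on the same
filesystem so that a hard link is possible.

```python
import os
import shutil

def place_copy(src, dst):
    """Make `dst` a new instance of `src` in a later backup tree.

    Non-empty files are hard linked so no extra space is used;
    zero-size files are copied instead, as described in step 4.
    """
    if os.path.getsize(src) == 0:
        shutil.copy2(src, dst)  # independent empty file, metadata preserved
    else:
        os.link(src, dst)       # second name for the same inode
```

Hard linking keeps the new entry tied to the same stored data as the
original, so removing the original name later does not lose the content.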

 5. As a last step, optionally BackupPC_nightly is called to clean up
    the pool, provided you set the -c flag and that the BackupPC
    daemon is running. Note that this routine itself does NOT touch
    the pool.
