I have vastly improved and completely rewritten my program
BackupPC_deleteFiles.pl. Also many bugs were fixed ;)
The routine now allows you to delete arbitrary files and directories
(or lists or globs thereof) across multiple hosts and shares, and
across arbitrary (contiguous) backup ranges.
Specifically, you can now delete files from either a single backup or
from a range of backups. The program then appropriately deletes and/or
moves files and attributes and correspondingly adds or removes type=10 delete
attributes so as to make sure that the files show as fully deleted from the
backup range while not affecting the files visible from subsequent
backups that were not deleted.
The only thing it can't do (and refuses to do) is to delete files that
are hard links or directories that contain hard links since I couldn't
find any easy way to find and keep track of hard links.
The program provides lots of (optional) verbosity and debugging levels
so you can be sure you are deleting what you want to (and from a
debugging perspective that the appropriate visibility and inheritance
rules are being faithfully applied).
Since the program is now 1000+ lines long, I won't post it, but I will
be happy to email it to anyone interested or post it if there is
enough demand. Instead I will just copy over the logic so people can
check it if they are so inclined (note it took me multiple attempts
before I truly understood the topology of the backup chains and how to
efficiently and accurately encode it).
I will also include a copy of the usage message and options:
--------------------------------------------------------------------
usage: $0 [options] files/directories...
  NOTE: if -s option not set, then file/directory names include the share name
  Required options:
    -h host        Host (or - for all) from which path is offset
    -n backRange   Range of successive backup numbers to delete.
                     N     delete files from backup N (only)
                     M-N   delete files from backups M-N (inclusive)
                     -M    delete files from all backups up to M (inclusive)
                     M-    delete files from all backups up from M (inclusive)
                     -     delete files from ALL backups
  Optional options:
    -s shareName   Share name (or - for all) from which path is offset
                   (don't include the 'f' mangle)
    -l             Just list backups by host (with level noted in parentheses)
    -r             Allow directories to be removed too
    -H             Skip over hard links (otherwise exits without deletions if
                   hard links found)
    -m             Paths are unmangled (i.e. apply mangle to paths)
    -q             Don't show deletions
    -t             Trial run -- do everything but deletions
    -c             Clean up pool - schedule BackupPC_nightly to run (requires
                   server running)
                   Only runs if files were deleted
    -d level       Turn on debug level
----------------------------------------------------------------------------
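To make the five backRange forms concrete, here is a minimal Python sketch (the program itself is Perl, and this is not its code). The function name parse_back_range and the idea of filtering against a list of existing backup numbers are assumptions made for illustration.

```python
def parse_back_range(spec, existing):
    """Return the sorted backup numbers from `existing` selected by `spec`.

    Hypothetical helper: accepts the forms N, M-N, -M, M- and - described
    in the usage message above.
    """
    existing = sorted(existing)
    if spec == "-":                      # all backups
        return existing
    if "-" not in spec:                  # single backup N
        lo = hi = int(spec)
    else:
        left, right = spec.split("-", 1)
        # An empty side means "open-ended" in that direction.
        lo = int(left) if left else min(existing, default=0)
        hi = int(right) if right else max(existing, default=0)
    # Only backups that actually exist are returned.
    return [n for n in existing if lo <= n <= hi]
```

For example, with existing backups 1, 3 and 4, the spec "2-3" selects only backup 3, since backup 2 does not exist.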
Program logic is as follows:
1. First construct a hash of hashes of 3 arrays and 2 hashes that
encapsulates the structure of the full and incremental backups
for each host. This hash is called:
%backupsHoHA{<hostname>}{<key>}
where the keys are: "ante", "post", "baks", "level", "vislvl"
with the first 3 keys having arrays as values and the final 2
keys having hashes as values. This pre-step is done since this
same structure can be re-used when deleting multiple files and
dirs (with potential wildcards) across multiple shares, backups,
and hosts. The component arrays and hashes are constructed as
follows:
- Start by constructing the simple hash %LevelH whose keys map
backup numbers to incremental backup levels based on the
information in the corresponding backupInfo file.
- Then, for each host selected, determine the list (@Baks) of
individual backups from which files are to be deleted based on
backRange (the -n option) and the actual existing backups.
- Based on this list determine the list of direct antecedent
backups (@Ante) that have strictly increasing backup levels
starting with the previous level 0 backup. This list thus
begins with the previous level zero backup and ends with the
last backup before @Baks that has a lower incremental level.
Note: this list may be empty if @Baks starts with a full (level
0) backup. Note: there is at most one (and should in general be
exactly one) incremental backup per level in this list starting
with level 0.
- Similarly, construct the list of direct descendants (@Post) of
the elements of @Baks that have strictly decreasing backup
levels starting with the first incremental backup after @Baks
and continuing until we reach a backup whose level is less than
or equal to the level of the lowest incremental backup in @Baks
(which may or may not be a level 0 backup). Again this list may
be empty if the first backup after @Baks has a level lower than
or equal to the lowest level in @Baks. Also, again, there is at
most one backup per level.
- Note that by construction, @Ante is stored in ascending order
and furthermore each backup number has a strictly ascending
incremental level. Similarly, @Post is stored in strictly
ascending order but its successive elements have monotonically
non-increasing incremental levels. Also, the last element of
@Ante has an incremental level lower than the first element of
@Baks and the last element of @Post has an incremental
level higher than the lowest level of @Baks. This is all
because anything else neither affects nor is affected by
deletions in @Baks. In contrast, note that @Baks can have
any pattern of increasing, decreasing, or repeated incremental
levels.
- Finally, create the second hash (%VislvlH) which has keys equal
to levels and values equal to the last backup with that level
that could potentially be visible from @Post (note we will
use this to determine which files need to be copied to @Post
from @Ante or @Baks after we delete the file entries in @Baks).
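The construction of @Ante and @Post in step 1 can be sketched in a few lines of Python (again, not the Perl program's code). The sketch assumes a dict mapping backup numbers to levels, standing in for %LevelH, and follows the "strictly decreasing, at most one backup per level" wording for @Post literally. The function name build_chains is hypothetical, and %VislvlH is left out.

```python
def build_chains(levels, baks):
    """Return (ante, post) for the contiguous backup list `baks`.

    levels: {backup number: incremental level} (stand-in for %LevelH)
    """
    nums = sorted(levels)
    baks = sorted(baks)
    first, last = baks[0], baks[-1]

    # @Ante: walk backwards from the first element of @Baks, keeping each
    # backup whose level is lower than everything kept so far, and stop
    # once a level 0 backup has been reached.
    ante = []
    limit = levels[first]
    for n in reversed([b for b in nums if b < first]):
        if limit == 0:
            break
        if levels[n] < limit:
            ante.insert(0, n)          # keep the list in ascending order
            limit = levels[n]

    # @Post: walk forwards from the end of @Baks, keeping backups with
    # strictly decreasing levels, and stop at the first backup whose level
    # is <= the lowest level in @Baks.
    post = []
    floor = min(levels[b] for b in baks)
    limit = None
    for n in (b for b in nums if b > last):
        if levels[n] <= floor:
            break
        if limit is None or levels[n] < limit:
            post.append(n)
            limit = levels[n]
    return ante, post
```

With levels {0:0, 1:1, 2:2, 3:1, 4:2, 5:3, 6:2, 7:1, 8:0} and @Baks = (3, 4), this gives @Ante = (0) and @Post = (5, 6): backup 7 has level 1, which equals the lowest level in @Baks, so the walk stops there.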
2. Second, for each host, combine the share regex and list of files
(and/or file shell globs) with the backup ranges @Ante and @Baks
to glob for all files that need either to be deleted from @Baks
or blocked from view by setting a type=10 delete attribute.
If a directory is on the list and the remove directory flag (-r)
is not set, then signal an error. If any of these files (or dirs)
are hard links (either type hard link or a hard link "target")
then signal an error (or if the -H flag is set, warn and skip
them) since hard links cannot easily be deleted/copied/moved
(since the other links will be affected). Duplicate entries and
entries that are a subtree of another entry are rationalized and
combined.
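The last sentence of step 2 (dropping duplicates and entries that sit inside another entry's subtree) is easy to show in isolation. This is a Python sketch under the assumption that entries are plain slash-separated paths; the function name rationalize is hypothetical and the Perl program may do this differently.

```python
import posixpath


def rationalize(paths):
    """Drop duplicate paths and paths that lie inside another listed path."""
    kept = []
    # Normalizing removes trailing slashes; sorting guarantees that a parent
    # directory is seen before anything underneath it.
    for p in sorted({posixpath.normpath(p) for p in paths}):
        covered = any(
            p == k or p.startswith(k.rstrip("/") + "/") for k in kept
        )
        if not covered:
            kept.append(p)
    return kept
```

Note that "/home/ab" is not treated as part of "/home/a": the prefix test includes the trailing slash, so only true subtree members are dropped.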
3. Third, for each host and for each relevant file presence, start
going successively through the @Ante, @Baks, and @Post chains to
determine which files and attributes need to be deleted, cleared,
or copied/linked to @Post.
- Start by going through @Ante in ascending order to construct
two visibility hashes. The first hash, %VisibleAnte, is used to
mark whether or not a file may be visible from @Baks from a
higher incremental level. The presence of a file sets the value
of the hash, while an intervening type=10 delete resets the value
to invisible (-1). The second hash, %VisibleAnteBaks, (whose
construction continues when we iterate through @Baks)
determines whether or not a file from @Ante or @Baks was
originally visible from @Post. And if a file was visible, then
the backup number of that file is stored in the value of the
hash. Note that at each level, there is at *most* one backup
from @Ante that is visible from @Baks and similarly there is at
*most* one backup from @Ante and @Baks combined that is visible
from @Post.
- Next, go through @Baks to mark for deletion any instances of the
file that are present. Then set the attrib type to type=10
(delete) if %VisibleAnte indicates that a file from @Ante would
otherwise be visible at that level. Otherwise, clear the attrib
and mark it for deletion. Similarly, once the type=10 type has
been set, all higher level elements of @Baks can have their file
attribs cleared whether they originally indicated a file type or
a delete type.
- Finally, go through the list of @Post in ascending order. If
there is no file and no delete flag present, then use the
information coded in %VisibleAnteBaks to determine whether we
need to link/copy over a version of the file previously stored
in @Ante and/or @Baks (along with the corresponding file attrib
entry) or whether we need to set a type=10 delete
attribute. Conversely, if originally, there was a type=10 delete
attribute, then by construction of @Post, the delete type is no
longer needed since the deletion will now occur in one of its
antecedents in @Baks, so we need to clear the delete type from
the attrib entry.
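The "presence sets, delete resets" rule used to build %VisibleAnte in the first bullet of step 3 can be sketched for a single file as follows. This is a Python illustration only. The per-backup state dict ("file", "delete", or absent) and the function name are hypothetical, and cutting the walk off at the level of the backup being examined is my reading of "visible at that level", not something the description spells out.

```python
def visible_from_ante(ante, levels, state, at_level):
    """Return the @Ante backup whose copy of the file is visible, or -1.

    ante:     backup numbers in ascending order (strictly increasing levels)
    levels:   {backup number: incremental level}
    state:    {backup number: "file" | "delete"} for one path (assumption)
    at_level: level of the @Baks element looking back at @Ante (assumption)
    """
    visible = -1
    for n in ante:
        if levels[n] >= at_level:
            break                      # not an antecedent at this level
        if state.get(n) == "file":
            visible = n                # a newer copy becomes the visible one
        elif state.get(n) == "delete":
            visible = -1               # a type=10 entry hides older copies
    return visible
```

In step 3's terms, a result other than -1 for a given element of @Baks is the case where a type=10 delete attribute has to be set there rather than simply clearing the attrib entry.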
4. Finally, after all the files for a given host have been marked
for deletion, moving/copying or attribute changes, loop through
and execute the changes. Files are either unlinked to delete or
hard linked (or copied if zero size) if we need to place a new
copy in @Post. Attributes are either cleared (deleted) or set to
type=10 delete or copied over to @Post. If all the files for a
given attrib file are deleted, then we delete the attrib file too.
5. As a last step, optionally BackupPC_nightly is called to clean up
the pool, provided you set the -c flag and that the BackupPC
daemon is running. Note that this routine itself does NOT touch
the pool.
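Step 4's "hard link, or copy if zero size" rule, combined with the -t trial-run behaviour, looks roughly like this in Python. It is a sketch, not the program's Perl code: place_copy and its dry_run parameter are hypothetical names.

```python
import os
import shutil


def place_copy(src, dst, dry_run=False):
    """Put a copy of `src` at `dst` and return the action chosen.

    Zero-size files are copied; everything else is hard linked.
    With dry_run=True nothing is changed on disk (like the -t option).
    """
    action = "copy" if os.path.getsize(src) == 0 else "link"
    if not dry_run:
        if action == "copy":
            shutil.copy2(src, dst)     # independent (empty) file
        else:
            os.link(src, dst)          # new name for the same inode
    return action
```

Returning the action instead of printing it keeps the decision testable and lets a caller decide how much to report at each verbosity level.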