Re: Script to show which files are NOT backed up? (gnutar excludes/includes)

On Thu, Oct 14, 2004 at 09:27:17AM -0700, Paul Schmidt wrote:
> Hello,
> 
> Since I have a large filesystem that is larger than my 40GB tapes, I
> use the gnutar exclude lists features to back this up.  Since the
> method is somewhat error-prone to forgetting things/excluding too
> much, I was wondering if anyone had a script to show me all of the
> files on my filesystem that are NOT covered with my disklist file
> entries.
> 
> The large data I am backing up is on a partition that also has smaller
> data directories on it too.  The large data is all under a common
> path, and is broken down into subdirectories which themselves DO fit
> on a tape, such as:
> 
> /somepath/
>         smalldir/
>         othersmalldir/
>         largedir/
>                 A/
>                      bunch of files that all fit on a tape
>                 B/
>                      bunch of files that all fit on a tape
>                 ...
>         anothersmalldir/
> 
> 
> The way I have this set up is /somepath/ has a disklist entry, with an
> exclude file that specifies ./largedir/*   This gathers all of the
> small data dirs all at once.
> 
> Then, each directory under largedir has its own disklist entry, such
> as /somepath/largedir/A for example.  This is the error prone part.
> 
> The A, B, etc. directories don't change TOO often, but they're not
> static. I am looking for a tool to make it easier to verify that all
> the necessary disklist entries have been made and that none of the
> important data (anywhere on my server) has been accidentally left out.
> 
> Any suggestions for how I can do my configuration better that might
> prevent some of these issues would be appreciated as well.
> 

YMMV, but I think this would work; I'm assuming you are indexing.

Compare the files listed in the index(es) with a find on the
parent (highest level) DLE directory.  For example, suppose
you are trying to back up /somepath and everything under it
in several DLEs.  Here is a plan for a shell script.

1) Create a temporary file of the entire tree

    cd /somepath
    find . -xdev > /tmp/some_current    # use -xdev to not cross FS

2) Create a temporary file of all the files in the indexes

    cd <your index directory>
    # Note the names of the most recent level 0 index of each
    # DLE of interest, probably _somepath*/*_0.gz, then uncompress
    # and combine them into a single temporary file.
    # This could be automated - I think - with something like

        for dir in _somepath*
        do
                ls $dir/*_0.gz | tail -1
        done |
        while read idxfile
        do
                # -c sends the data to stdout; a plain -d would
                # uncompress the index file in place instead
                gzip -dc $idxfile
        done > /tmp/some_index          # creates or empties the file

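3) Some editing, possibly with sed, will be needed to make the
   find output and the index data match: find prints "./dir/file"
   with no slash after directory names, while - I think - the index
   lists "/dir/file" and ends directory names with a slash.  This
   might work (changing the last 2 lines of the automation).

                gzip -dc $idxfile
        done |
        sed -e 's/^/./' -e 's,/$,,' > /tmp/some_index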
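
To see what the sed step is meant to do, here is a small sketch run on
made-up index lines.  I'm assuming the index entries start with "/",
that directories end in "/", and that each DLE's index is relative to
that DLE's own top directory, so entries from /somepath/largedir/A
would need "./largedir/A" put back in front.  normalize_index is a
name I invented for this example; it is not part of amanda.

```shell
# normalize_index PREFIX
# Read index-style lines on stdin and print them the way "find ." would.
# PREFIX is the DLE's directory relative to where find starts
# ("." for the top-level DLE).  Assumes PREFIX has no "|" or "&" in it.
normalize_index() {
    sed -e "s|^|$1|" -e 's|/$||'
}

# made-up lines as they might appear in the /somepath index
printf '%s\n' / /smalldir/ /smalldir/notes.txt | normalize_index .

# made-up lines as they might appear in the /somepath/largedir/A index
printf '%s\n' / /file1 | normalize_index ./largedir/A
```

If the per-DLE assumption holds, each index would be piped through
this with its own prefix before being appended to /tmp/some_index.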

4) Sort the two temporary files

        sort -o /tmp/some_current /tmp/some_current
        sort -o /tmp/some_index   /tmp/some_index

5) Use comm to determine what is missing or added

        # files in both lists
        comm -12 /tmp/some_current /tmp/some_index

        # files in find output only, i.e. NOT in any index
        comm -23 /tmp/some_current /tmp/some_index

        # files in indexes only, i.e. no longer on disk
        comm -13 /tmp/some_current /tmp/some_index
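
Here is a rough sketch of steps 4 and 5 on made-up data, so you can
see which comm output answers the original question.  The file names
and directory entries are invented for the example.  I pin LC_ALL=C
for both sort and comm on the assumption that your locale might
otherwise make the two disagree about ordering.

```shell
# made-up data: what find saw versus what the indexes list
tmp=$(mktemp -d)
printf '%s\n' . ./smalldir ./largedir ./largedir/C ./largedir/A > "$tmp/current"
printf '%s\n' . ./smalldir ./olddir ./largedir ./largedir/A > "$tmp/index"

# comm needs both inputs sorted in the same collating order it uses
LC_ALL=C sort -o "$tmp/current" "$tmp/current"
LC_ALL=C sort -o "$tmp/index" "$tmp/index"

# on disk but in no index, i.e. NOT backed up
not_backed_up=$(LC_ALL=C comm -23 "$tmp/current" "$tmp/index")

# in an index but no longer on disk
gone=$(LC_ALL=C comm -13 "$tmp/current" "$tmp/index")

echo "not backed up: $not_backed_up"
echo "no longer on disk: $gone"
rm -rf "$tmp"
```

On real data, expect the "not backed up" list to also include
anything created since the last level 0, since only the level 0
indexes are being compared.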


HTH
jon
-- 
Jon H. LaBadie                  jon AT jgcomp DOT com
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)