
[Veritas-bu] Checking to see if millions of files are backed up?

Subject: [Veritas-bu] Checking to see if millions of files are backed up?
From: ddunham at taos.com (Darren Dunham)
Date: Mon, 26 Mar 2007 13:35:38 -0800 (PST)
> If one is to create a script to ensure that the files on the
> filesystem are backed up before removing them, what is the best
> data-store model for doing so?
> 
> Obviously, if you have > 1,000,000 files in the catalog and you need
> to check each of those, you do not want to run bplist -B -C -R 999999
> /path/to/file/1.txt for each file.  However, you do not want to grep
> "1" one_gigabyte_catalog.txt either, as there is really too much
> overhead in either case.

A million is a lot, but with sufficiently large machines, you might be
able to fit all the names in memory (and, if you're really lucky, in a
Perl hash).

With a lot of memory, I'd build a name hash from the expected files,
then run through the bplist output once, crossing off every file it
reports.  Any name still left in the hash has not been backed up.
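A minimal sketch of the in-memory approach, written in Python rather than Perl (a Python set stands in for the Perl hash). It assumes the catalog listing has been captured from a single bplist run and contains one full path per line; check that against your own bplist output. The function name `find_unbacked` and the sample paths are hypothetical.

```python
from typing import Iterable, Set


def find_unbacked(expected: Iterable[str], catalog_listing: Iterable[str]) -> Set[str]:
    """Return the expected paths that never appear in the catalog listing.

    expected:        paths found on disk (for example, from a directory walk)
    catalog_listing: lines of backup-catalog output, ASSUMED to be one full
                     path per line (verify against your own bplist output)
    """
    # Build the in-memory hash (here a Python set) from the files we expect.
    # Only newlines are stripped, since real filenames may end in spaces.
    pending = {path.rstrip("\n") for path in expected}
    # Stream the catalog listing once, crossing off each path it reports.
    for line in catalog_listing:
        pending.discard(line.rstrip("\n"))
        if not pending:
            break  # every expected file has been seen; stop reading early
    # Anything still pending was never listed, so it is not safe to delete.
    return pending


# Small demo with in-memory lists; in practice both arguments could be open
# file objects (a saved file list and the captured output of one bplist run).
expected_files = ["/data/a.txt", "/data/b.txt", "/data/c.txt"]
listing = ["/data/a.txt\n", "/data/c.txt\n", "/data/old.txt\n"]
print(sorted(find_unbacked(expected_files, listing)))  # ['/data/b.txt']
```

Because the listing is only streamed, memory use is dominated by the set of expected names, which is the part that has to fit in RAM.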

When the memory needs of the hash cause this method to break down, you
can move to a disk-based database instead.  There are several Perl
modules that let you set up a quick database without installing MySQL
or Postgres (but you could use those if you had them).  The comparison
is then slower, but much less awful than running a million invocations
of bplist just to check one file at a time.
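For the disk-backed variant, the sketch below swaps in Python's built-in `sqlite3` module as the "quick database" (this is not one of the Perl modules the post refers to, just a comparable serverless option). Both path lists are loaded into indexed tables and compared with a single anti-join query. As before, it assumes one path per line in the catalog listing, and the function name `find_unbacked_sqlite` is hypothetical.

```python
import sqlite3
from typing import Iterable, List


def find_unbacked_sqlite(db_path: str, expected: Iterable[str],
                         catalog_listing: Iterable[str]) -> List[str]:
    """Disk-backed variant: load both path lists into SQLite, then anti-join.

    db_path: ":memory:" for small tests, or a scratch file path when the
             lists are too large to hold in RAM.
    catalog_listing is ASSUMED to contain one full path per line.
    """
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        # Start from empty tables so a reused scratch file holds no stale rows.
        cur.execute("DROP TABLE IF EXISTS expected")
        cur.execute("DROP TABLE IF EXISTS backed_up")
        # PRIMARY KEY gives each table an index, so lookups avoid full scans.
        cur.execute("CREATE TABLE expected (path TEXT PRIMARY KEY)")
        cur.execute("CREATE TABLE backed_up (path TEXT PRIMARY KEY)")
        # INSERT OR IGNORE tolerates duplicate lines in either input.
        cur.executemany("INSERT OR IGNORE INTO expected VALUES (?)",
                        ((p.rstrip("\n"),) for p in expected))
        cur.executemany("INSERT OR IGNORE INTO backed_up VALUES (?)",
                        ((p.rstrip("\n"),) for p in catalog_listing))
        conn.commit()
        # Anti-join: expected paths with no matching row in the catalog table.
        rows = cur.execute(
            "SELECT e.path FROM expected AS e "
            "WHERE NOT EXISTS (SELECT 1 FROM backed_up AS b WHERE b.path = e.path) "
            "ORDER BY e.path"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


print(find_unbacked_sqlite(":memory:",
                           ["/data/a.txt", "/data/b.txt", "/data/c.txt"],
                           ["/data/a.txt\n", "/data/c.txt\n", "/data/old.txt\n"]))
# ['/data/b.txt']
```

The inputs are consumed as generators, so neither list has to be held in memory at once; the trade-off is the slower disk-based lookups the post mentions, which are still far cheaper than one bplist invocation per file.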

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >