[Veritas-bu] Backups vs archives

On 12/7/06, Curtis Preston <cpreston at glasshouse.com> wrote:
>
> Based on some previous posts, I'd like to throw out the following
> thought, and see what you folks think about it.

Thanks for posting to the NetBackup list; this is a great article. My
thoughts on it follow.

NetBackup's archive feature

I've never cared for NetBackup's archive feature because it mechanizes
removal of data. Removal of data should only be done manually and
with great care. Bugs like this one (
http://entsupport.symantec.com/docs/284450 ) reinforce my dislike of
systems that perform automated removal. But I agree that the process
of doing backups should not be the same as performing archives. They
are two different things.

Critical metadata required for recovery

For both backups and archives, the single most vital piece of metadata
to me is automount history. Data inevitably gets moved from one
storage server to another or from one volume to another. Users don't
know or care where it lives; all they know is that they need
/proj/roswell/autopsy recovered as it was on 1/1/1999. Backup/archive
systems probably only know about its current true location,
blackhole:/vol/vol0/autopsy. Everyone should keep a running history
of all automount map changes. (I can share the scripts I've got, but
they're site-specific.)
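
To make the idea concrete, here is a minimal sketch in Python of a
running automount history. It is not the site-specific scripts
mentioned above. It assumes simple map lines of the form
"key [-options] host:/path" (no multi-mount entries or line
continuations) and assumes snapshots are recorded in date order. The
names parse_map and MountHistory are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def parse_map(text):
    """Parse simple automount map lines: 'key [-options] host:/path'."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        entries[fields[0]] = fields[-1]  # last field is the location
    return entries


class MountHistory:
    """Remembers where each automount key pointed over time."""

    def __init__(self):
        self._changes = {}  # key -> list of (date, location), oldest first

    def record_snapshot(self, when, map_text):
        """Record a dated copy of the map; only changes are stored."""
        for key, location in parse_map(map_text).items():
            changes = self._changes.setdefault(key, [])
            if not changes or changes[-1][1] != location:
                changes.append((when, location))

    def location_on(self, key, when):
        """Return the location in force for key on the given date."""
        changes = self._changes.get(key, [])
        idx = bisect_right([d for d, _ in changes], when)
        return changes[idx - 1][1] if idx else None
```

Feeding it a dated copy of the map each time the map changes (or from
a nightly cron job) is enough to answer "where did autopsy live on
1/1/1999?" with location_on("autopsy", date(1999, 1, 1)). A key that
is deleted from the map keeps its last known location in this sketch.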

Poor man's archive server

I work at a small company. Here's what we do. Clearly this is not a
great system, but it has some advantages. A request comes in to
archive a home directory or engineering-related project files. We
copy the requested data to inexpensive large-disk RAID 5 systems.
These systems get backed up periodically such that there are never
more than two copies on tape and they have little overlap. We
periodically run updatedb so that a single file can be found with
locate. When copying, we keep the entire path intact. Other metadata
retained: md5sum of each file, original server name, volume, archive
date. This has none of the other great advantages that a CAS or HSM
system might provide, and filename search doesn't scale well across
multiple archive servers. A search layer can be added above locate to
query each archive server's locate database. This is on my to-do
list. ;-) In many cases a request comes in to recover
monster-spreadsheet.xls, but the requester doesn't always remember
where it used to live. locate is extremely useful in cases like this.
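
The copy step described above could be sketched roughly as follows in
Python. This is an illustration, not what we actually run: the
function names, the record fields, and the one-JSON-object-per-line
manifest format are all assumptions. It handles regular files only
and assumes a Unix-style path separator.

```python
import hashlib
import json
import os
import shutil
from datetime import date


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_tree(source_dir, archive_root, server, volume, archive_date=None):
    """Copy source_dir under archive_root with its full path intact.

    Returns one metadata record (a dict) per archived file.
    """
    stamp = (archive_date or date.today()).isoformat()
    source_dir = os.path.abspath(source_dir)
    records = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            # Keep the entire path: /proj/x/f -> <archive_root>/proj/x/f
            dest = os.path.join(archive_root, src.lstrip(os.sep))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            checksum = md5_of(dest)
            if checksum != md5_of(src):
                raise IOError("checksum mismatch copying " + src)
            records.append({
                "path": src,
                "md5": checksum,
                "server": server,
                "volume": volume,
                "archive_date": stamp,
            })
    return records


def write_manifest(records, manifest_path):
    """Write the records as one JSON object per line."""
    with open(manifest_path, "w") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
```

Because the original path is preserved under the archive root, a
later updatedb run on the archive server makes every file findable
with locate, and the manifest gives a checksum to verify against
after a restore from tape.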

Final Thoughts

Modern CAS systems look very appealing. A CAS system would be too
expensive for all our general-purpose data, but I'm going to seriously
consider one for at least financial data. For small data sets, a
combination of rdiff-backup and rsync for replication might be
enough to do the trick. I'd be interested in hearing about experiences
with commercial CAS systems from anyone on the list. Email archiving
is another thing I really need to look into at our site.

Cheers.

[snip]

--
-Tim