ADSM-L

Re: Long term data retention for retired clients

2005-07-14 13:58:29
Subject: Re: Long term data retention for retired clients
From: "Prather, Wanda" <Wanda.Prather AT JHUAPL DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 14 Jul 2005 13:58:18 -0400
Biggest problem I have with using EXPORT or BACKUPSET, is that people
most likely are going to ask for a partial restore.  And they probably
aren't going to remember exactly what the directory structure &
filenames were.  Two years from now, NO ONE will have any idea what was
really on that EXPORT tape.  And you can't hunt for it effectively
unless all the data is still in the DB.

So what I've done as a compromise is use SQL SELECT to pull a list of
all the file names/backup dates for a retired client from the TSM DB
into a flat file, then do the EXPORT and delete the filespaces.  The
flat file remains around and can be searched using ordinary tools to
figure out what is on the EXPORT tape.
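
For illustration, here is a minimal Python sketch of the "search the flat file with ordinary tools" step. The layout of the flat file is an assumption, not something stated above: it is taken to be comma-delimited with four columns (filespace, directory path, file name, backup date). The function name `search_listing` and the sample rows are hypothetical.

```python
import csv
import fnmatch
import io


def search_listing(lines, pattern):
    """Yield (full_path, backup_date) for rows whose path matches a
    shell-style pattern, ignoring case.

    ASSUMPTION: each row is filespace, directory, file name, backup date.
    """
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        filespace, directory, name, backup_date = (f.strip() for f in row[:4])
        full_path = filespace + directory + name
        # Lower-case both sides so the match is case-insensitive everywhere.
        if fnmatch.fnmatchcase(full_path.lower(), pattern.lower()):
            yield full_path, backup_date


# Hypothetical sample of what the saved listing might look like.
SAMPLE = """\
/home,/alice/reports/,budget-2004.xls,2005-06-30 02:14:07
/home,/alice/,notes.txt,2005-07-01 02:15:11
/data,/projects/x/,Budget-final.doc,2005-05-12 23:40:00
"""

if __name__ == "__main__":
    for path, when in search_listing(io.StringIO(SAMPLE), "*budget*"):
        print(when, path)
```

With a real listing you would pass an open file instead of the sample text. The point is only that someone who half-remembers a file name can find it, and its backup date, without the data still being in the DB.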

Wanda Prather
"I/O, I/O, It's all about I/O"  -(me)



-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Allen S. Rout
Sent: Thursday, July 14, 2005 1:49 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Long term data retention for retired clients


==> On Thu, 14 Jul 2005 10:27:49 +0100, John Naylor
<John.Naylor AT SCOTTISH-SOUTHERN.CO DOT UK> said:

> I have considered various approaches
> 1) Export
> 2) Backup set
> 3) Create a new domain for retired clients which have the long term
> retention requirement

> I see export and backup sets as reducing database overhead, but being
> less easy to track and rather unfriendly if you just need to get a
> subset of the data back

Export yes, backupset no; [ see below ]


> The new domain would retain great visibility for the data and allow
> easy access to subsets of the data, but you would still have the
> database overhead.

Yes, but in the long term this overhead devolves to just space.  For
example, if everything in the node is ACTIVE, I don't believe that it
represents much of a hit on, e.g., expiration.  So there's an
incrementally (hah) larger full DB backup, and more space on disk for
the DB, but not a lot of day to day DB overhead.

> Does the choice just depend on how often in reality you will need to
> get the data back?

I'd say that, plus how frequently you want to do the retention thing,
and for how long.

Off the top of my head, if keeping the bits and pieces around would add
up to, say, a third of my [DB space / data / whatever] I'd consider
doing something nearline with it.


[below]

Keep in mind:  you can restore from a backupset stored on the server;
this need only be a little less convenient than restorations from
online data.
Gedankenexperiment:

Say you want to deal with TheNode:

rename node TheNode OLD-2005-07-12-13-TheNode

       [ in case you ever want to use TheNode name again ]

GEN BACKUPSET OLD-2005-07-12-13-TheNode Terminal devclass=foobar RET=NOLimit

which uses tapes FOO1 and FOO2.

Then you DEL FILESPACE OLD-2005-07-12-13-TheNode *

and

CHECKOUT LIBVOLUME FOOLIB FOO1
CHECKOUT LIBVOLUME FOOLIB FOO2

At this point, you've got a -permanent- record of the state of TheNode,
at the cost of a few records in the node and backupset tables. Of
course, you have an increased exposure to media failure: no copypools
for backupsets.  Anyway, to restore from it, all you have to do is check
the tapes back in and issue a

dsmc restore backupset Terminal -loc=server [sourcespec] [destspec]

This is going to be a much less efficient restore than the online one,
but only in wall clock time and tape use, not in human skull sweat.
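
If you retire nodes often enough to want this scripted, the server-side steps above could be generated along these lines. This is only a sketch: the helper `retirement_commands` and its parameters are hypothetical, it just builds the command strings from the walkthrough and does not talk to a TSM server, and the tape volumes have to be filled in by hand once you know which ones the backupset landed on.

```python
def retirement_commands(node, stamp, backupset_prefix, devclass,
                        library, volumes):
    """Return the admin commands from the walkthrough, in order.

    ASSUMPTION: 'volumes' is supplied by the operator after GENERATE
    BACKUPSET has finished; this helper cannot discover them itself.
    """
    retired = f"OLD-{stamp}-{node}"  # frees the original node name for reuse
    commands = [
        f"RENAME NODE {node} {retired}",
        f"GENERATE BACKUPSET {retired} {backupset_prefix} "
        f"DEVCLASS={devclass} RETENTION=NOLIMIT",
        f"DELETE FILESPACE {retired} *",
    ]
    # One checkout per tape that holds the backupset.
    commands += [f"CHECKOUT LIBVOLUME {library} {vol}" for vol in volumes]
    return commands


if __name__ == "__main__":
    for cmd in retirement_commands("TheNode", "2005-07-12-13", "Terminal",
                                   "foobar", "FOOLIB", ["FOO1", "FOO2"]):
        print(cmd)
```

Printing the commands rather than running them is deliberate: you want a human to confirm the backupset completed before the DELETE FILESPACE goes anywhere near the server.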


Plus, if someone gets crotchety about the archive, you can hand them the
checked out tapes and tell them to get their own LTO3. (heh)


- Allen S. Rout