Re: Strategies for DR recovery of large clients: More Ideas

Some more ideas:

We put the TSM server's own backups in a separate storage pool.  Each week
we force a move data of all of that pool's offsite tapes, which keeps the
data on essentially one tape, and the process is quite fast.  I have a perl
script that figures all this out and issues the move data commands with
reconstruct.  It is relatively easy to do.  The other piece is that we
restore our TSM server from a mksysb, not from the TSM server's backup of
itself.  That makes recovery quick, even for a server with a 100GB
database.  We do the mksysb restore, which carries all the scripts for the
rest of the DR restore (all clients and the server), then restore the
database with DSMSERV RESTORE DB, and we are set.  The TSM server's backup
of itself is used only if we need a file that is not on the mksysb.  I have
nearly automated the mksysb process: a TSM storage pool with a bunch of
empty private tapes assigned to it manages the tapes.  The perl script that
does the management is in testing.  Kicking it off will run the mksysb and
handle the tape checkout and checkin automatically.
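
For anyone who wants to try the weekly consolidation, here is a rough sketch
of the command-generation half of such a script.  It is in Python rather
than perl and is not the script described above.  The volume labels are made
up, and the exact MOVE DATA options (RECONSTRUCT=YES, WAIT=YES) are an
assumption that should be checked against your server level.  Getting the
list of offsite volumes out of the server is left out on purpose.

```python
def build_move_data_commands(volumes, reconstruct=True):
    """Return one MOVE DATA admin command per distinct volume label.

    Labels are trimmed, upper-cased and de-duplicated so that the same
    tape is never queued twice.
    """
    labels = sorted({v.strip().upper() for v in volumes if v.strip()})
    commands = []
    for label in labels:
        cmd = f"MOVE DATA {label}"
        if reconstruct:
            # Assumed option: rebuilds aggregates while the data is moved.
            cmd += " RECONSTRUCT=YES"
        # Assumed option: run in the foreground so the moves are serialized.
        cmd += " WAIT=YES"
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical labels; real ones would come from a query of the
    # offsite copy pool that holds the TSM server's own backups.
    for line in build_move_data_commands(["DR0012", "DR0007", "dr0007 "]):
        print(line)
```

The output is meant to be fed to an administrative command-line session or
saved as a macro, one command per line.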

Some things we are toying with.  We think that creating a balanced set of
storage pools on tape, doing a restore storage pool to disk, and then
restoring the clients from disk is a good way to go.  We have not tested it
yet.  The issue is that both active and inactive data are restored, but if
you plan your storage pools by priority and have enough disk at the DR site
to hold the largest one, then you can just delete each storage pool when you
have finished that particular restore (may require a dummy migration).  We
had considered asking for restore storage pool to restore only active files,
but given the way aggregates are managed, that is impossible to do with any
kind of performance.
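
The disk sizing argument above comes down to one check: since each pool is
deleted before the next one is restored, the DR disk only has to hold the
largest pool.  A minimal sketch of that check follows.  The pool names,
sizes and priorities are invented for illustration, and the sizes must
include inactive versions because the whole pool comes back.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    priority: int    # 1 = restore first
    size_gb: float   # total occupancy, active plus inactive versions


def plan_restores(pools, dr_disk_gb):
    """Return pool names in restore order, or raise if one cannot fit.

    Pools are restored one at a time and deleted afterwards, so each pool
    is compared against the whole DR disk, not against what is left over.
    """
    too_big = [p.name for p in pools if p.size_gb > dr_disk_gb]
    if too_big:
        raise ValueError("pools larger than DR disk: " + ", ".join(too_big))
    return [p.name for p in sorted(pools, key=lambda p: p.priority)]


if __name__ == "__main__":
    # Hypothetical pools and sizes.
    pools = [
        Pool("FILESRV", priority=2, size_gb=400),
        Pool("CRITICAL", priority=1, size_gb=250),
    ]
    print(plan_restores(pools, dr_disk_gb=500))
```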

So our proposed method is:
Restore the storage pool to the primary pool on disk.
Fire up the clients to restore their root/c: drives (many at once).
Then, use tape to restore the databases (they are in different storage
pools).

This may seem like a hokie way to accomplish something but I am from
Virginia and the Hokies are our team.  So we think we can win with this.

Anyone else have some ideas?

Paul D. Seay, Jr.
Technical Specialist
Naptheon Inc.
757-688-8180


-----Original Message-----
From: Robin Sharpe [mailto:Robin_Sharpe AT BERLEX DOT COM]
Sent: Tuesday, September 10, 2002 12:03 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Strategies for DR recovery of large clients


Werner,

I feel your pain...  ;)

You have hit most of the major issues of disaster recovery with TSM squarely
on the head.  We have had similar experience in our testing... 4-6 hours to
get the TSM server up, running through loads of tapes (even though storage
pool is collocated with only three servers), 48-hour window, etc.

We have proved that we can get our three critical clients back within 24
hours, but they are not nearly as big as yours.  We use DLT8000 drives.

Probably the best way for you to get better restore throughput is to add
more drives and do concurrent restores.  TSM should only mount the tapes
that actually contain the file versions you will restore.  The problem is
that, even with collocation, after many months of backups on a relatively
active system, these files will get scattered across many tapes.
Conventional wisdom suggests using collocation by filespace to reduce this
effect... and also guarantee that concurrent restores of different file
systems will not compete for the same tape volume.  But the cost is of
course using a lot more tape.  Another approach might be to occasionally
(every three months maybe) do a "full" backup (by changing mode to
"absolute" to force even unchanged files to get backed up)... this should
effectively "defragment" the tape pool and put all active versions on one
(or a couple of) tapes.  We did this once with an additional machine that we
DR'ed and it worked quite well.  Some people don't like this concept because
it defeats TSM's "progressive" backup methodology, but I think it's an
acceptable compromise.
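
To illustrate the concurrent-restore idea above, here is a small Python
sketch that starts one client restore per filespace but never runs more at
once than there are tape drives.  It is only a sketch: the dsmc options
shown are an assumption to be checked against your client level, and the
filespace names and drive count are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def restore_filespace(filespace, run=subprocess.call):
    """Run one client restore and return the command's exit code."""
    # Assumed invocation: restore everything under the filespace root.
    target = filespace.rstrip("/") + "/"
    cmd = ["dsmc", "restore", target, "-subdir=yes", "-replace=all"]
    return run(cmd)


def restore_all(filespaces, drives, run=subprocess.call):
    """Restore filespaces concurrently, at most one session per drive."""
    filespaces = list(filespaces)
    workers = max(1, drives)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda fs: restore_filespace(fs, run), filespaces))
    # Map each filespace to its exit code so failures are easy to spot.
    return dict(zip(filespaces, codes))


if __name__ == "__main__":
    # Hypothetical filespaces and drive count.
    print(restore_all(["/", "/home", "/data"], drives=2))
```

With collocation by filespace, each of these sessions should be reading its
own tapes, which is what keeps them from competing for the same volume.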

As you said, backup sets are not a good option for DR... for one thing,
creating the backupset will take as long as restoring the whole system, and
will read the same number of tapes.  You will suffer this on a regular
schedule since you'll have to make new backupsets probably every week or
two.  Secondly, restoring from backupsets effectively single-threads that
client because all of its data is on one or maybe two tapes.

Good luck, and please keep us posted on your results!

Robin Sharpe
Berlex Labs