ADSM-L

Re: Strategies for DR recovery of large clients

2002-09-11 11:33:21
Subject: Re: Strategies for DR recovery of large clients
From: "Kauffman, Tom" <KauffmanT AT NIBCO DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 11 Sep 2002 09:54:23 -0500
Werner ---

>
> The second option is to do full system ARCHIVES, but this
> would cause activity on both the NSM and the client, neither
> of which have available windows for this activity.
>
> Because either of these possibilities would, of necessity, be
> occasional (at best once a week), there is the additional
> issue of how easy it is to bring the system up to the most
> current backup after the restore. Would a multi-filespace
> simple restore be intelligent enough to pass only the last 7
> days of tape or would it pass all tapes with those filespaces on them?

My experience is that a point-in-time restore will use what already exists
as a starting point. We have several large AIX filesystems with a great deal
of daily activity (delete/create files). On our first D/R test the PIT
restore took six hours. As a result, I now run a weekly archive of the
directory. At the D/R site we retrieve the most recent archive (about 30
minutes) and then do the PIT restore (about 15 minutes).
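
To put those numbers side by side, here is a rough sketch in Python of the arithmetic. The durations are the ones quoted above; the function and constant names are made up for illustration and are not part of any TSM tooling.

```python
# Durations in minutes, taken from the figures quoted in this message.
FULL_PIT_RESTORE = 6 * 60      # PIT restore from scratch on the first D/R test
ARCHIVE_RETRIEVE = 30          # retrieve the most recent weekly archive
INCREMENTAL_PIT_RESTORE = 15   # PIT restore on top of the retrieved archive


def two_step_minutes(retrieve=ARCHIVE_RETRIEVE, pit=INCREMENTAL_PIT_RESTORE):
    """Total elapsed time for 'retrieve archive, then PIT restore'."""
    return retrieve + pit


def speedup(old_minutes, new_minutes):
    """How many times faster the new procedure is than the old one."""
    return old_minutes / new_minutes


if __name__ == "__main__":
    new_total = two_step_minutes()
    print(new_total)                              # 45
    print(speedup(FULL_PIT_RESTORE, new_total))   # 8.0
```

So the weekly archive costs one extra job per week and buys roughly an eightfold reduction in restore time for that filesystem, at least with our data.
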
>
> A third possibility I have thought of recently is to isolate
> these very large servers in their own COPY POOLs, effectively
> co-locating only these servers, but I am not convinced this
> would reduce the number of tapes passed by the DR restores,
> and it would certainly increase the total number of tapes in
> the DR set and increase off-site reclamation activity, which
> already takes the better part of the day shift most days.

We do this as well -- at the very least, split archive pools from backup
pools. Then don't bother with reclaims on the archive pools. Just let them
expire. This will require more tape, but tape is cheap in the great scheme
of things. In our case, most archive data has a 23-day retention so we need
23 X <number of daily tapes> plus a few for contingencies. In actual fact,
we have 4 off-site archive copy pools: one for MS-Exchange; one for the SAP
database; one for the SAP redo logs; and one for all other Oracle database
archives (and the second copy of the SAP redo logs). These all exist to
speed up recovery during D/R.
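
The tape-count rule of thumb above (retention days times daily tapes, plus a few spares) can be sketched in Python as follows. Only the 23-day retention comes from this message; the tapes-per-day and contingency values in the example are hypothetical placeholders, so substitute your own.

```python
def offsite_archive_tapes(retention_days, tapes_per_day, contingency=0):
    """Tapes needed off-site for an archive copy pool that is never reclaimed.

    Each day's tapes stay off-site until everything on them expires, so the
    steady-state count is retention_days * tapes_per_day, plus spares.
    """
    if retention_days < 0 or tapes_per_day < 0 or contingency < 0:
        raise ValueError("all inputs must be non-negative")
    return retention_days * tapes_per_day + contingency


if __name__ == "__main__":
    # 23-day retention is from the message; 2 tapes/day and 3 spares are
    # hypothetical values chosen only to show the calculation.
    print(offsite_archive_tapes(23, 2, contingency=3))  # 49
```

Run it once per copy pool and add up the results to estimate the whole off-site set.
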

At one point, when using DLT-7000 drives, we had 445 tapes in the off-site
storage that we had to take to the hot-site for our D/R testing. Thanks to
the move to LTO and a bit of creative combinations (we used to have a
separate copy pool for the second set of SAP redo logs, for example) we now
have about 115 LTO tapes off-site on any given day.

We can recover adsm, restore a 650-plus GB SAP (Oracle) database, restore
three additional Oracle databases (25 GB total), recover two MS-Exchange
servers, and start recovering other TSM storage pools, all in just under 24
hours. Our TSM server is also the SAP DB server, so it's an S7A -- but we
use five 3581 LTO autochangers, configured as stand-alone drives. Our SLA
doesn't require our big NT server to be up until day 3, so I recover the NT
backup tapes from the consolidated off-site backup copypool first.

HTH --

Tom Kauffman
NIBCO, Inc