Re: [ADSM-L] Insight into improving restores needed

2007-09-05 16:34:06
Subject: Re: [ADSM-L] Insight into improving restores needed
From: Ben Bullock <bbullock AT MICRON DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 5 Sep 2007 14:31:59 -0600
LOL, you crack me up Kelly. All very good advice. 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Kelly Lipp
Sent: Wednesday, September 05, 2007 2:28 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Insight into improving restores needed

Your data characteristic is the most important factor.

If the bulk of your data to restore is database data that is being
backed up fully every day, it's probably pretty easy.  The data is very
sequential on tape (as you get a new DR copy each day).  You may have
some contention for a particular volume (or set of volumes) depending on
how the data was written.  But since you have LTO1 and 2, I'm guessing
you use multiple streams to write your DR tapes each day so stuff from
clients should be reasonably collocated on tape. 

So database data is pretty easy.

Where you get in a bind is on the Wintel side (probably) as there are
file servers, DNS servers, etc., to restore.  Backup data from them is
probably stored on bunches of tapes.  For small things, like the DNS,
this probably won't hurt too bad either.  But on large file servers you
will die.  Two reasons: the number of objects is huge and Windows will
only create files so fast, and those files are going to be spread over a
large number of tapes.  There was a thread recently about what to do about
restoring file servers.  The essence is to use image backups
periodically with incrementals daily.  The image will be written
sequentially to a small number (relatively) of DR tapes.  The
incrementals will be scattered, but as long as there isn't a very large
gap between the images, you can minimize the number of tape mounts.  And
in a 3584 with enough tape drives, this might not be an issue.
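
A back-of-the-envelope sketch of why the image approach tends to win.
Every rate below (tape throughput, file-creation rate, mount time) and
the server sizes are made-up assumptions for illustration, not measured
values; plug in your own numbers from a test restore.

```python
def restore_hours(total_gb, n_objects, n_mounts,
                  tape_mb_per_s=30.0, files_per_s=200.0, mount_s=120.0):
    """Rough restore-time estimate in hours.

    The default rates are illustrative assumptions, not measurements.
    """
    stream_s = total_gb * 1024 / tape_mb_per_s  # time to read the data off tape
    create_s = n_objects / files_per_s          # time for the client to create the files
    # Reading and file creation overlap, so the slower of the two dominates;
    # mount overhead (load, position, rewind) is added on top.
    return (max(stream_s, create_s) + n_mounts * mount_s) / 3600


# Hypothetical 500 GB file server holding 5 million files, spread over 60 tapes.
file_level = restore_hours(500, 5_000_000, n_mounts=60)

# Same server: one image (a single large object) on 3 tapes,
# plus a week of incrementals scattered over 10 tapes.
image_based = (restore_hours(500, 1, n_mounts=3)
               + restore_hours(10, 50_000, n_mounts=10))

print(f"file-level: {file_level:.1f} h")
print(f"image + incrementals: {image_based:.1f} h")
```

With these assumed numbers the file-level restore is limited by file
creation and tape mounts rather than by tape speed, which is the point
of the image-plus-incrementals suggestion.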

How many copy storage pools do you have?  Do you have some notion of
criticality in these pools (if you have more than one)?  One can
probably assume that critical data is much less than total data.  To the
extent you can isolate this you can reduce the number of tapes you need
to touch to restore.  This is important if the library at the DR site
isn't large enough to hold all of your DR tapes.  This is likely!  And
it will never fail: the tape the restore needs is the tape in your hand,
not in the library.  You can help yourself a bit by doing some queries
ahead of the restore to determine which tapes will be required.  But
still.  For small sites contemplating this same scenario, I like
individual drives rather than libraries.  If you can't fit all of your
DR tapes in the library, and you are likely to need them all during the
restore, having them in and out of the library is a hassle.
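
A minimal sketch of that "query ahead of time" idea.  It assumes you
have already exported rows of (node, storage pool, volume) from the
server, for example from a query against its volume usage information;
the node names, pool names, volume labels, and slot count here are all
hypothetical.

```python
from collections import defaultdict


def tapes_by_node(rows, copy_pool):
    """Map each node to the set of copy-pool volumes holding its data."""
    needed = defaultdict(set)
    for node, pool, volume in rows:
        if pool == copy_pool:
            needed[node].add(volume)
    return needed


def load_plan(needed, restore_order, free_slots):
    """Split volumes into those to load first and those to keep at hand."""
    load_now, load_later = [], []
    for node in restore_order:
        for vol in sorted(needed.get(node, ())):
            if vol in load_now or vol in load_later:
                continue  # volume is shared with a node earlier in the order
            target = load_now if len(load_now) < free_slots else load_later
            target.append(vol)
    return load_now, load_later


# Hypothetical export: (node, storage pool, volume)
rows = [
    ("AIXDB1", "COPYPOOL", "DR0001"),
    ("AIXDB1", "COPYPOOL", "DR0002"),
    ("FILESRV1", "COPYPOOL", "DR0002"),
    ("FILESRV1", "COPYPOOL", "DR0007"),
    ("FILESRV1", "TAPEPOOL", "PR0042"),  # primary pool volume, ignored
    ("DNS1", "COPYPOOL", "DR0009"),
]

needed = tapes_by_node(rows, "COPYPOOL")
now, later = load_plan(needed, ["AIXDB1", "FILESRV1", "DNS1"], free_slots=3)
print("load first:", now)
print("keep at hand:", later)
```

Ordering the nodes by criticality means the tapes for the most
important restores go into the library first, and the ones left in
your hand are at least the ones you need last.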

In your case, you have identical library technology at your primary and
DR site.  Good.  Tape labels are not transportable between library
manufacturers.  If the library at the DR site is different than the
library at your primary site, beware of this!  Many a good DR strategy
got blowed up when this happened.

Planning is the key to a successful DR or DR test.  Know thy data is the
watchword.  Make sure it is getting to DR tape in a way that will yield
the required efficiency during restore.  Restoring large numbers of
things takes a long time.  Restoring one very large thing is easy.  Have
as many large things as possible (image backups of file servers...).

Practice.  There isn't anything wrong with rigging the test.  If your
library permits it, load the DR tapes for a wicked bad client, mark the
primary pool volumes destroyed, and test the restore.  See how long it
takes.  Maybe you don't have a problem.  If you do, you'll learn of it
while not at the stinking Philadelphia SunGard site wishing your
company had sprung for the catered lunch instead of eating out of the
machines at 2:30AM watching all the other hacks having a successful test
while you're stuck watching a single tape drive bottleneck the whole
test and you missed your very optimistic 7:00 PM dinner reservation at
that little crab shack and the customer is screaming down your neck.
But I digress.  Whew.  And I thought I was over that...

Time to see my therapist again.  Thanks for helping me relive this...

Kelly


Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
lipp AT storserver DOT com

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Nicholas Rodolfich
Sent: Wednesday, September 05, 2007 2:05 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Insight into improving restores needed

Hello All,

Thanks for your help!

I work at a medical institution and there is a big push to get our DR
procedures in order. We will be using a Sungard facility for our DR
activities and we will have to restore 4-6 AIX servers (2 HA clusters: one
DB with 1 TB and one apps on another cluster) and 20-25 Wintel platforms
with various applications, DBs, Novell, etc.

We have a 3584 Library with 16 drives (8 LTO1 and 8 LTO2) and a 55A with
4 CPUs, 12 GB RAM, and Gb NICs, and will have like equipment at the DR
facility.

Some tests have revealed an extended time to restore due to many volumes
being loaded to perform a restore. I am looking for some empirical
strategies, white papers, advice, etc. to help me improve this situation
without loads of money of course! I am not ruling out any method
including backup sets, archive schedules, collocation, etc. I can't
seem to find what I need from IBM websites regarding the gotchas.
I need the "fastest way" shy of a $10M hot site strategy so I am going
to the mountain! I appreciate your help!!

Nicholas


IMPORTANT NOTICE:  This message and any included attachments are from
East Jefferson General Hospital, and is intended only for the
addressee(s), and may include Protected Health (PHI) or other
confidential information.  If you are the intended recipient, you are
obligated to maintain it in a secure and confidential manner and
re-disclosure without additional consent or as permitted by law is
prohibited.   If you are not the intended recipient, use of this
information is strictly prohibited and may be unlawful.  Please promptly
reply to the sender by email and delete this message from your computer.
East Jefferson General Hospital greatly appreciates your cooperation.