Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 22 Jan 2008 12:43:59 -0500

James,

I like this idea a lot.  The disadvantage, of course, is that it
requires a separate server.  Is there a way to use this same (or
similar) idea to move just the active files into an active data pool (as
suggested by Maria), given that he's running 5.5 and has access to that
feature?
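
Something like the following is what I have in mind - untested, and all
device-class/pool/domain names here are placeholders. If I remember
right, an active-data pool has to be a sequential-access (e.g. FILE)
pool, COPY ACTIVEDATA works per storage pool rather than per node, and
the nodes' domain has to point at the active-data pool first:

        define devclass ACTIVEFILE devtype=file directory=/tsm/active maxcapacity=4g
        define stgpool ACTIVEPOOL ACTIVEFILE pooltype=activedata maxscratch=100
        update domain STANDARD activedestination=ACTIVEPOOL
        copy activedata TAPEPOOL ACTIVEPOOL maxprocess=1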

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
James R Owen
Sent: Tuesday, January 22, 2008 9:32 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?

Roger,
You certainly want to get a "best guess" list of likely priority#1
restores. If your tapes really are mostly uncollocated, you will
probably experience lots of tape volume contention when you attempt to
use MAXPRocess > 1 or to run multiple simultaneous restore, move
nodedata, or export node operations.

Use Query NODEData to see how many tapes might have to be read for each
node to be restored.
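
For example (the node name is a placeholder):

        Query NODEData NODE1

The output lists each volume holding that node's backup data, so the
number of volumes reported per node gives a rough lower bound on the
mounts a restore will need.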

To minimize tape mounts, if you can wait for this operation to
complete, I believe you should try to move or export all of the nodes'
data in a single operation.

Here are possible disadvantages with using MOVe NODEData:
  - it does not let you select only the Active backups for these nodes
        [so you might have to move lots of extra inactive backups]
  - you probably can not effectively use MAXPROCess=N (N>1) nor run
        multiple simultaneous MOVe NODEData commands, because of
        contention for your uncollocated volumes (see the sketch below).
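
A minimal sketch of the command, with placeholder node and pool names:

        MOVe NODEData NODE1,NODE2 FROMstgpool=TAPEPOOL TOstgpool=DISKPOOL MAXPROCess=1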

If you have or can set up another TSM server, you could do a
server-to-server EXPort:

        EXPort Node node1,node2,... FILEData=BACKUPActive TOServer=... [Preview=Yes]

moving only the nodes' active backups to a diskpool on the other TSM
server. Using this technique, you can move only the minimal necessary
data. I don't see any way to multithread or run multiple simultaneous
commands to read more than one tape at a time, but given your drive
constraints and uncollocated volumes, you will probably discover that
you can not effectively restore, move, or export from more than one
tape at a time, no matter which technique you try. Your Query NODEData
output should show you which nodes, if any, do *not* have backups on
the same tapes.

Try running a preview EXPort Node command for single or multiple nodes
to get some idea of what tapes will be mounted and how much data you
will need to export.
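
If the target server is not defined yet, the setup is roughly as
follows (the server name, password, and addresses are placeholders,
and matching definitions are needed on the other server as well):

        DEFine SERver SERVER2 SERVERPAssword=secret HLAddress=server2.example.edu LLAddress=1500
        EXPort Node NODE1,NODE2 FILEData=BACKUPActive TOServer=SERVER2 Preview=Yes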

Call me if you want to talk about any of this.
--
Jim.Owen AT Yale DOT Edu   (w#203.432.6693, Verizon c#203.494.9201)

Roger Deschner wrote:
> MOVE NODEDATA looks like it is going to be the key. I will simply move
> the affected nodes into a disk storage pool, or into our existing
> collocated tape storage pool. I presume it should be possible to
> restart MOVE NODEDATA, in case it has to be interrupted or if the
> server crashes, because what it does is not very different from
> migration or reclamation. This should be a big advantage over GENERATE
> BACKUPSET, which is not even as restartable as a common client restore.
> A possible strategy is to do the long, laborious, but restartable,
> MOVE NODEDATA first, and then do a very quick, painless, regular
> client restore or GENERATE BACKUPSET.
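> For that final step, I would guess something roughly like this (the
> backup set name and device class are placeholders, untested):
>
>         GENerate BACKUPSET NODE1 FIRESET DEVclass=FILECLASS
>
> or just an ordinary client restore once the data is on disk.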
>
> Thanks to all! Until now, I was not fully aware of MOVE NODEDATA.
>
> B.T.W. It is an automatic tape library, Quantum P7000. We graduated
> from manual tape mounting back in 1999.
>
> Roger Deschner      University of Illinois at Chicago      rogerd AT uic DOT edu
>
>
> On Tue, 22 Jan 2008, Nicholas Cassimatis wrote:
>
>> Roger,
>>
>> If you know which nodes are to be restored, or at least have some
>> that are good suspects, you might want to run some "move nodedata"
>> commands to try to get their data more contiguous.  If you can get
>> some of that DASD that's coming "real soon," even just to borrow it,
>> that would help out tremendously.
>>
>> You say "tape" but never "library" - are you on manual drives?
(Please say
>> No, please say No...)  Try setting the mount retention high on them,
and
>> kick off a few restores at once.  You may get lucky and already have
the
>> needed tape mounted, saving you a few mounts.  If that's not working
(it's
>> impossible to predict which way it will go), drop the mount retention
to 0
>> so the tape ejects immediately, so the drive is ready for a new tape
>> sooner.  And if you are, try to recruit the people who haven't
approved
>> spending for the upgrades to be the "picker arm" for you - I did that
to an
>> account manager on a DR Test once, and we got the library approved
the next
>> day.
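>>
>> For the mount retention piece it's roughly this (the device class
>> name is a placeholder):
>>
>>         UPDate DEVclass LTOCLASS MOUNTRetention=60
>>
>> and MOUNTRetention=0 later to free the drives immediately.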
>>
>> The thoughts of your fellow TSMers are with you.
>>
>> Nick Cassimatis
>>
>> ----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 08:08 AM -----
>>
>> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 01/22/2008
>> 03:40:07 AM:
>>
>>> We like to talk about disaster preparedness, and one just happened
>>> here at UIC.
>>>
>>> On Saturday morning, a fire damaged portions of the UIC College of
>>> Pharmacy Building. It affected several laboratories and offices. The
>>> Chicago Fire Department, wearing hazmat moon suits due to the highly
>>> dangerous contents of the laboratories, put it out efficiently in
>>> about 15 minutes. The temperature was around 0F (-18C), which
>>> compounded the problems - anything that took on water became a block
>>> of ice. Fortunately nobody was hurt; only a few people were in the
>>> building on a Saturday morning, and they all got out safely.
>>>
>>> Now, both the good news and the bad news is that many of the damaged
>>> computers were backed up to our large TSM system. The good news is
>>> that their data can be restored.
>>>
>>> The bad news is that their data can be restored. And so now it must be.
>>>
>>> Our TSM system is currently an old-school tape-based setup from the
>>> ADSM days. (Upgrades involving a lot more disk coming real soon!)
>>> Most of the nodes affected are not collocated, so I have to plan to
>>> do a number of full restores of nodes whose data is scattered across
>>> numerous tape volumes each. There are only 8 tape drives, and they
>>> are kept busy since this system is in a heavily-loaded,
>>> about-to-be-upgraded state. (Timing couldn't be worse; Murphy's Law.)
>>>
>>> TSM was recently upgraded to version 5.5.0.0. It runs on AIX 5.3
>>> with a SCSI library. Since it is a v5.5 server, there may be new
>>> facilities available that I'm not aware of yet.
>>>
>>> I have the luxury of a little bit of time in advance. The hazmat
>>> guys aren't letting anyone in to assess damage yet, so we don't know
>>> which client node computers are damaged or not. We should know in a
>>> day or two, so in the meantime I'm running as much reclamation as
>>> possible.
>>>
>>> Given that this is our situation, how can I best optimize these
>>> restores? I'm looking for ideas to get the most restoration done for
>>> this disaster, while still continuing normal client-backup,
>>> migration, expiration, reclamation cycles, because somebody else
>>> unrelated to this situation could also need to restore...
>>>
>>> Roger Deschner      University of Illinois at Chicago      rogerd AT uic DOT edu