Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores? [like Steve H said, but...]
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 22 Jan 2008 23:24:35 -0500
Bummer. :( But when it's fixed, I sure think it sounds like a better
solution to this situation than the traditional answers -- even if only
used on demand.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
James R Owen
Sent: Tuesday, January 22, 2008 6:37 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores? [like
Steve H said, but...]

DR strategy using an ACTIVEdata STGpool is like Steve H said, but
with minor additions and a major (but temporary) caveat:

COPY ACTIVEdata is not quite ready for this DR strategy yet:

See APAR PK59507:  COPy ACTIVEdata performance can be significantly degraded
(until TSM 5.4.3/5.5.1) unless *all* nodes are enabled for the ACTIVEdata
STGpool.

http://www-1.ibm.com/support/docview.wss?rs=663&context=SSGSG7&dc=DB550&uid=swg1PK59507&loc=en_US&cs=UTF-8&lang=en&rss=ct663tivoli

Here's a slightly improved description of how it should work:

DEFine STGpool actvpool ... POoltype=ACTIVEdata -
        COLlocate=[No/GRoup/NODe/FIlespace] ...
COPy DOmain old... new...
UPDate DOmain new... ACTIVEDESTination=actvpool
ACTivate POlicy new... somePolicy
Query SCHedule old... * NOde=node1,...,nodeN    [note old... sched.assoc's]
UPDate NOde nodeX DOmain=new...                 [for each node[1-N]]
DEFine ASSOCiation new... [someSched] nodeX     [as previously associated]
COpy ACTIVEdata oldstgpool actvpool             [for each oldstgpool w/active backups]

[If no other DOmain except new... has ACTIVEDESTination=actvpool,
 the COpy ACTIVEdata command(s) will copy the Active backups from specified
 nodes node[1-N] into the ACTIVEdata STGpool actvpool to expedite DR for...]

[But, not recommended until TSM 5.4.3/5.5.1 fixes APAR PK59507!]
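
Filled in with made-up names (FILE device class ACTVFILE, directory
/tsm/actvpool, primary tape pool TAPEPOOL, production domain STANDARD, new
domain DR_RESTORE, policy set STANDARD_PS, node PHARM01, schedule DAILY_INCR --
all hypothetical), the sequence might look roughly like this:

DEFine DEVclass actvfile DEVType=FILE DIRectory=/tsm/actvpool MAXCAPacity=4G
DEFine STGpool actvpool actvfile POoltype=ACTIVEdata MAXSCRatch=200 COLlocate=NODe
COPy DOmain standard dr_restore
UPDate DOmain dr_restore ACTIVEDESTination=actvpool
ACTivate POlicyset dr_restore standard_ps
UPDate NOde pharm01 DOmain=dr_restore           [repeat for each node to restore]
DEFine ASSOCiation dr_restore daily_incr pharm01
COPy ACTIVEdata tapepool actvpool MAXPRocess=2  [repeat for each primary pool w/active backups]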
--
Jim.Owen AT Yale DOT Edu   (203.432.6693)

Steven Harris wrote:
> Nick
>
> I may well have a flawed understanding here but....
>
> Set up an active-data pool
> clone the domain containing the servers requiring recovery
> set the ACTIVEDATAPOOL parameter on the cloned domain
> move the servers requiring recovery to the new domain,
> Run COPY ACTIVEDATA on the primary tape pool
>
> Since only the nodes we want are in the domain with the ACTIVEDATAPOOL
> parameter specified, won't only the data from those nodes be copied?
>
> Regards
>
> Steve
>
> Steven Harris
> TSM Admin, Sydney, Australia
>
> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 23/01/2008
> 11:38:17 AM:
>
>> For this scenario, the problem with Active Storagepools is it's a
>> pool-to-pool relationship.  So ALL active data in a storagepool would be
>> copied to the Active Pool.  Not knowing what percentage of the nodes on the
>> TSM Server will be restored, but assuming they're all in one storage pool,
>> you'd probably want to "move nodedata" them to another pool, then do the
>> "copy activedata."  Two steps, and needs more resources.  Just doing "move
>> nodedata" within the same pool will semi-collocate the data (See Note
>> below).  Obviously, a DASD pool, for this circumstance, would be best, if
>> it's available, but even cycling the data within the existing pool will
>> have benefits.
>>
>> Note:  Semi-collocated, as each process will make all of the named nodes'
>> data contiguous, even if it ends up on the same media with another node's
>> data.  Turning on collocation before starting the jobs, and marking all
>> filling volumes read-only, will give you separate volumes for each node,
>> but requires a decent scratch pool to try.
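>>
>> For instance (pool and node names below are invented), that
>> collocate-and-move approach might look roughly like:
>>
>>   UPDate STGpool tapepool COLlocate=NODe
>>   UPDate Volume * ACCess=READOnly WHERESTGpool=tapepool WHERESTATus=FILling
>>   MOVe NODEData pharm01,pharm02 FROMstgpool=tapepool
>>
>> Leaving off TOSTGPOOL cycles the data within the same pool; pointing it at
>> a DASD or FILE pool instead avoids the extra tape mounts at restore time.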
>>
>> Nick Cassimatis
>>
>> ----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 07:25 PM -----
>>
>> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 01/22/2008
>> 01:58:11 PM:
>>
>>> Are files that are no longer active automatically expired from the
>>> activedata pool when you perform the latest COPY ACTIVEDATA?  This would
>>> mean that, at some point, you would need to do reclamation on this pool,
>>> right?
>>>
>>> It would seem to me that this would be a much better answer to the OP's
>>> question.  Instead of doing a MOVE NODE (which requires moving ALL of
>>> the node's files), or doing an EXPORT NODE (which requires a separate
>>> server), he can just create an ACTIVEDATA pool, then perform a COPY
>>> ACTIVEDATA into it while he's preparing for the restore.  Putting said
>>> pool on disk would be even better, of course.
>>>
>>> I was just discussing this with another one of our TSM experts, and he's
>>> not as bullish on it as I am.  (It was an off-list convo, so I'll let
>>> him go nameless unless he wants to speak up.)  He doesn't like that you
>>> can't use a DISK type device class (disk has to be listed as FILE type).
>>> He also has issues with the resources needed to create this "3rd copy"
>>> of the data.  He said, "Most customers have trouble getting backups
>>> complete and creating their offsite copies in a 24 hour period and would
>>> not be able to complete a third copy of the data."  Add to that the
>>> possibility of doing reclamation on this pool and you've got even more
>>> work to do.
>>>
>>> He's more of a fan of group collocation and the multisession restore
>>> feature.  I think this has more value if you're restoring fewer clients
>>> than you have tape drives.  Because if you collocate all your active
>>> files, then you'll only be using one tape drive per client.  If you've
>>> got 40 clients to restore and 20 tape drives, I don't see this slowing
>>> you down.  But if you've got one client to restore, and 20 tape drives,
>>> then the multisession restore would probably be faster than a collocated
>>> restore.
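>>>
>>> (For the multisession side, I believe the relevant knobs are the client's
>>> RESOURCEUTILIZATION option and the node's mount point limit on the server;
>>> something like the following, with the node name made up:
>>>
>>>   UPDate Node pharm01 MAXNUMMP=4          [server: allow more mount points]
>>>   RESOURCEUTILIZATION 10                  [client dsm.sys / dsm.opt]
>>>
>>> though the right values depend on how many drives you can spare.)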
>>>
>>> I still think it's a strong feature whose value should be investigated
>>> and discussed -- even if you only use it for the purpose we're
>>> discussing here.  If you know you're in a DR scenario and you're going
>>> to be restoring multiple systems, why wouldn't you create an ACTIVEDATA
>>> pool and do a COPY ACTIVEDATA instead of a MOVE NODE?
>>>
>>> OK, here's another question.  Is it assumed that the ACTIVEDATA pool
>>> has node-level collocation on?  Can you use group collocation instead?
>>> Then maybe my friend and I could both get what we want?
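>>>
>>> Something along these lines (group, pool, and node names made up) is what
>>> I'm picturing:
>>>
>>>   DEFine COLLOCGroup restoregrp
>>>   DEFine COLLOCMember restoregrp pharm01,pharm02,pharm03
>>>   UPDate STGpool actvpool COLlocate=GRoup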
>>>
>>> Just throwing thoughts out there.
>>>
>>> ---
>>> W. Curtis Preston
>>> Backup Blog @ www.backupcentral.com
>>> VP Data Protection, GlassHouse Technologies
>>>
>>> -----Original Message-----
>>> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
>>> Maria Ilieva
>>> Sent: Tuesday, January 22, 2008 10:22 AM
>>> To: ADSM-L AT VM.MARIST DOT EDU
>>> Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
>>>
>>> The procedure for creating active-data pools (assuming you have TSM
>>> version 5.4 or later) is the following:
>>> 1. Create a FILE type disk pool or a sequential tape pool, specifying
>>> POOLTYPE=ACTIVEDATA
>>> 2. Update the node's domain(s), specifying ACTIVEDESTINATION=<created
>>> active data pool>
>>> 3. Issue COPY ACTIVEDATA <primary pool name> <active data pool name>
>>> This process incrementally copies the nodes' active data, so it can be
>>> restarted if needed. HSM-migrated and archived data is not copied into
>>> the active data pool!
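>>>
>>> A rough example (pool, device class, and domain names are placeholders):
>>>
>>>   DEFine STGpool actvdisk filedev POoltype=ACTIVEdata MAXSCRatch=100
>>>   UPDate DOmain standard ACTIVEDESTination=actvdisk
>>>   COPy ACTIVEdata tapepool actvdisk Preview=Yes
>>>   COPy ACTIVEdata tapepool actvdisk MAXPRocess=2
>>>
>>> Preview=Yes first gives an idea of how much data and how many tape mounts
>>> are involved before committing to the copy.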
>>>
>>> Maria Ilieva
>>>
>>>> ---
>>>> W. Curtis Preston
>>>> Backup Blog @ www.backupcentral.com
>>>> VP Data Protection, GlassHouse Technologies
>>>>
>>>> -----Original Message-----
>>>> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
>>>> James R Owen
>>>> Sent: Tuesday, January 22, 2008 9:32 AM
>>>> To: ADSM-L AT VM.MARIST DOT EDU
>>>> Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
>>>>
>>>>
>>>> Roger,
>>>> You certainly want to get a "best guess" list of likely priority#1
>>>> restores.
>>>> If your tapes really are mostly uncollocated, you will probably
>>>> experience lots of tape volume contention when you attempt to use
>>>> MAXPRocess > 1 or to run multiple simultaneous restore, move nodedata,
>>>> or export node operations.
>>>>
>>>> Use Query NODEData to see how many tapes might have to be read for each
>>>> node to be restored.
>>>>
>>>> To minimize tape mounts, if you can wait for this operation to complete,
>>>> I believe you should try to move or export all of the nodes' data in a
>>>> single operation.
>>>>
>>>> Here are possible disadvantages with using MOVe NODEData:
>>>>   - does not enable you to select to move only the Active backups for
>>>>     these nodes
>>>>         [so you might have to move lots of extra inactive backups]
>>>>   - you probably can not effectively use MAXPROC=N (>1) nor run multiple
>>>>     simultaneous MOVe NODEData commands because of contention for your
>>>>     uncollocated volumes.
>>>>
>>>> If you have or can set up another TSM server, you could do a
>>>> Server-Server EXPort:
>>>>         EXPort Node node1,node2,... FILEData=BACKUPActive TOServer=... [Preview=Yes]
>>>> moving only the nodes' active backups to a diskpool on the other TSM
>>>> server.  Using this technique, you can move only the minimal necessary
>>>> data.  I don't see any way to multithread or run multiple simultaneous
>>>> commands to read more than one tape at a time, but given your drive
>>>> constraints and uncollocated volumes, you will probably discover that you
>>>> can not effectively restore, move, or export from more than one tape at a
>>>> time, no matter which technique you try.  Your Query NODEData output
>>>> should show you which nodes, if any, do *not* have backups on the same
>>>> tapes.
>>>>
>>>> Try running a preview EXPort Node command for single or multiple nodes
>>>> to get some idea of what tapes will be mounted and how much data you will
>>>> need to export.
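>>>>
>>>> For example (server name, address, and node names are placeholders, and
>>>> this assumes server-to-server communication still has to be defined on
>>>> both sides):
>>>>
>>>>         DEFine SERver drtsm SERVERPAssword=xxxxx HLAddress=dr.example.edu LLAddress=1500
>>>>         EXPort Node pharm01,pharm02 FILEData=BACKUPActive TOServer=drtsm Preview=Yes
>>>>         EXPort Node pharm01,pharm02 FILEData=BACKUPActive TOServer=drtsm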
>>>>
>>>> Call me if you want to talk about any of this.
>>>> --
>>>> Jim.Owen AT Yale DOT Edu   (w#203.432.6693, Verizon c#203.494.9201)
>>>>
>>>> Roger Deschner wrote:
>>>>> MOVE NODEDATA looks like it is going to be the key. I will simply move
>>>>> the affected nodes into a disk storage pool, or into our existing
>>>>> collocated tape storage pool. I presume it should be possible to restart
>>>>> MOVE NODEDATA, in case it has to be interrupted or if the server
>>>>> crashes, because what it does is not very different from migration or
>>>>> reclamation. This should be a big advantage over GENERATE BACKUPSET,
>>>>> which is not even as restartable as a common client restore. A possible
>>>>> strategy is to do the long, laborious, but restartable, MOVE NODEDATA
>>>>> first, and then do a very quick, painless, regular client restore or
>>>>> GENERATE BACKUPSET.
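>>>>>
>>>>> A sketch of that two-step idea, with pool and device class names
>>>>> invented:
>>>>>
>>>>>   MOVe NODEData pharm01 FROMstgpool=tapepool TOstgpool=collocpool
>>>>>   GENerate BACKUPSET pharm01 pharm01_dr DEVclass=lto2 RETention=30
>>>>>
>>>>> or skip the backup set and just run the regular client restore once the
>>>>> move finishes.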
>>>>>
>>>>> Thanks to all! Until now, I was not fully aware of MOVE NODEDATA.
>>>>>
>>>>> B.T.W. It is an automatic tape library, Quantum P7000. We graduated from
>>>>> manual tape mounting back in 1999.
>>>>>
>>>>> Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
>>>>>
>>>>> On Tue, 22 Jan 2008, Nicholas Cassimatis wrote:
>>>>>
>>>>>> Roger,
>>>>>>
>>>>>> If you know which nodes are to be restored, or at least have some that
>>>>>> are good suspects, you might want to run some "move nodedata" commands
>>>>>> to try to get their data more contiguous.  If you can get some of that
>>>>>> DASD that's coming "real soon," even just to borrow it, that would help
>>>>>> out tremendously.
>>>>>>
>>>>>> You say "tape" but never "library" - are you on manual drives?
>>>> (Please say
>>>>>> No, please say No...)  Try setting the mount retention high on
>>> them,
>>>> and
>>>>>> kick off a few restores at once.  You may get lucky and already
>>> have
>>>> the
>>>>>> needed tape mounted, saving you a few mounts.  If that's not
>>> working
>>>> (it's
>>>>>> impossible to predict which way it will go), drop the mount
>>> retention
>>>> to 0
>>>>>> so the tape ejects immediately, so the drive is ready for a new
>>> tape
>>>>>> sooner.  And if you are, try to recruit the people who haven't
>>>> approved
>>>>>> spending for the upgrades to be the "picker arm" for you - I did
>>> that
>>>> to an
>>>>>> account manager on a DR Test once, and we got the library
approved
>>>> the next
>>>>>> day.
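>>>>>>
>>>>>> Mount retention is a device class setting, so (with the device class
>>>>>> name made up) the two extremes would be roughly:
>>>>>>
>>>>>>   UPDate DEVclass p7000class MOUNTRetention=60   [keep tapes mounted between restores]
>>>>>>   UPDate DEVclass p7000class MOUNTRetention=0    [or release drives immediately]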
>>>>>>
>>>>>> The thoughts of your fellow TSMers are with you.
>>>>>>
>>>>>> Nick Cassimatis
>>>>>>
>>>>>> ----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 08:08 AM -----
>>>>>>
>>>>>> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on
>>> 01/22/2008
>>>>>> 03:40:07 AM:
>>>>>>
>>>>>>> We like to talk about disaster preparedness, and one just happened
>>>>>>> here at UIC.
>>>>>>>
>>>>>>> On Saturday morning, a fire damaged portions of the UIC College of
>>>>>>> Pharmacy Building. It affected several laboratories and offices. The
>>>>>>> Chicago Fire Department, wearing hazmat moon suits due to the highly
>>>>>>> dangerous contents of the laboratories, put it out efficiently in about
>>>>>>> 15 minutes. The temperature was around 0F (-18C), which compounded the
>>>>>>> problems - anything that took on water became a block of ice.
>>>>>>> Fortunately nobody was hurt; only a few people were in the building on
>>>>>>> a Saturday morning, and they all got out safely.
>>>>>>>
>>>>>>> Now, both the good news and the bad news is that many of the damaged
>>>>>>> computers were backed up to our large TSM system. The good news is that
>>>>>>> their data can be restored.
>>>>>>>
>>>>>>> The bad news is that their data can be restored. And so now it must be.
>>>>>>> Our TSM system is currently an old-school tape-based setup from the
>>>>>>> ADSM days. (Upgrades involving a lot more disk coming real soon!) Most
>>>>>>> of the nodes affected are not collocated, so I have to plan to do a
>>>>>>> number of full restores of nodes whose data is scattered across
>>>>>>> numerous tape volumes each. There are only 8 tape drives, and they are
>>>>>>> kept busy since this system is in a heavily-loaded, about-to-be-upgraded
>>>>>>> state. (Timing couldn't be worse; Murphy's Law.)
>>>>>>>
>>>>>>> TSM was recently upgraded to version 5.5.0.0. It runs on AIX 5.3 with a
>>>>>>> SCSI library. Since it is a v5.5 server, there may be new facilities
>>>>>>> available that I'm not aware of yet.
>>>>>>>
>>>>>>> I have the luxury of a little bit of time in advance. The hazmat guys
>>>>>>> aren't letting anyone in to assess damage yet, so we don't know which
>>>>>>> client node computers are damaged or not. We should know in a day or
>>>>>>> two, so in the meantime I'm running as much reclamation as possible.
>>>>>>> Given that this is our situation, how can I best optimize these
>>>>>>> restores? I'm looking for ideas to get the most restoration done for
>>>>>>> this disaster, while still continuing normal client-backup, migration,
>>>>>>> expiration, reclamation cycles, because somebody else unrelated to this
>>>>>>> situation could also need to restore...
>>>>>>>
>>>>>>> Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
