Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores? [like Steve H said, but...]
From: James R Owen <Jim.Owen AT YALE DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 22 Jan 2008 21:36:55 -0500
DR strategy using an ACTIVEdata STGpool is like Steve H said, but
with minor additions and a major (but temporary) caveat:

COPY ACTIVEdata is not quite ready for this DR strategy yet:

See APAR PK59507:  COPy ACTIVEdata performance can be significantly degraded
(until TSM 5.4.3/5.5.1) unless *all* nodes are enabled for the ACTIVEdata 
STGpool.

http://www-1.ibm.com/support/docview.wss?rs=663&context=SSGSG7&dc=DB550&uid=swg1PK59507&loc=en_US&cs=UTF-8&lang=en&rss=ct663tivoli

Here's a slightly improved description of how it should work:

DEFine STGpool actvpool ... POoltype=ACTIVEdata -
        COLlocate=[No/GRoup/NODe/FIlespace] ...
COPy DOmain old... new...
UPDate DOmain new... ACTIVEDESTination=actvpool
ACTivate POlicy new... somePolicy
Query SCHedule old... * NOde=node1,...,nodeN    [note old... sched. assoc's]

UPDate NOde nodeX DOmain=new...                 [for each node[1-N]]
DEFine ASSOCiation new... [someSched] nodeX     [as previously associated]
COpy ACTIVEdata oldstgpool actvpool     [for each oldstgpool w/active backups]

[If no other DOmain except new... has ACTIVEDESTination=actvpool,
the COpy ACTIVEdata command(s) will copy the Active backups from specified
nodes node[1-N] into the ACTIVEdata STGpool actvpool to expedite DR for...]

[But, not recommended until TSM 5.4.3/5.5.1 fixes APAR PK59507!]
--
Jim.Owen AT Yale DOT Edu   (203.432.6693)

Steven Harris wrote:
Nick

I may well have a flawed understanding here but....

Set up an active-data pool
clone the domain containing the servers requiring recovery
set the ACTIVEDATAPOOL parameter on the cloned domain
move the servers requiring recovery to the new domain,
Run COPY ACTIVEDATA on the primary tape pool

Since only the nodes we want are in the domain with the ACTIVEDATAPOOL
parameter specified, won't only the data from those nodes be copied?

Regards

Steve

Steven Harris
TSM Admin, Sydney, Australia

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 23/01/2008
11:38:17 AM:

For this scenario, the problem with Active Storagepools is it's a
pool-to-pool relationship.  So ALL active data in a storagepool would be
copied to the Active Pool.  Not knowing what percentage of the nodes on the
TSM Server will be restored, but assuming they're all in one storage pool,
you'd probably want to "move nodedata" them to another pool, then do the
"copy activedata."  Two steps, and needs more resources.  Just doing "move
nodedata" within the same pool will semi-collocate the data (see Note
below).  Obviously, a DASD pool, for this circumstance, would be best, if
it's available, but even cycling the data within the existing pool will
have benefits.
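
For illustration, a rough sketch of that two-step approach in admin-command
form (node and pool names here are made up, and the active-data pool must
already be an ACTIVEDESTINATION of the nodes' domain):

    MOVE NODEDATA node1,node2 FROMSTGPOOL=tapepool TOSTGPOOL=diskpool
    COPY ACTIVEDATA diskpool actvpool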

Note:  Semi-collocated, as each process will make all of the named node's
data contiguous, even if it ends up on the same media with another node's
data.  Turning on collocation before starting the jobs, and marking all
filling volumes read-only, will give you separate volumes for each node,
but requires a decent scratch pool to try.
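
If it helps, those two knobs look roughly like this (the pool name is a
placeholder):

    UPDATE STGPOOL tapepool COLLOCATE=NODE
    UPDATE VOLUME * ACCESS=READONLY WHERESTGPOOL=tapepool WHERESTATUS=FILLING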

Nick Cassimatis

----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 07:25 PM -----

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 01/22/2008
01:58:11 PM:

Are files that are no longer active automatically expired from the
activedata pool when you perform the latest COPY ACTIVEDATA?  This would
mean that, at some point, you would need to do reclamation on this pool,
right?

It would seem to me that this would be a much better answer to the OP's
question.  Instead of doing a MOVE NODEDATA (which requires moving ALL of
the node's files), or doing an EXPORT NODE (which requires a separate
server), he can just create an ACTIVEDATA pool, then perform a COPY
ACTIVEDATA into it while he's preparing for the restore.  Putting said
pool on disk would be even better, of course.

I was just discussing this with another one of our TSM experts, and he's
not as bullish on it as I am.  (It was an off-list convo, so I'll let him
go nameless unless he wants to speak up.)  He doesn't like that you can't
use a DISK type device class (disk has to be listed as FILE type).  He
also has issues with the resources needed to create this "3rd copy" of
the data.  He said, "Most customers have trouble getting backups complete
and creating their offsite copies in a 24 hour period and would not be
able to complete a third copy of the data."  Add to that the possibility
of doing reclamation on this pool and you've got even more work to do.

He's more of a fan of group collocation and the multisession restore
feature.  I think this has more value if you're restoring fewer clients
than you have tape drives.  Because if you collocate all your active
files, then you'll only be using one tape drive per client.  If you've
got 40 clients to restore and 20 tape drives, I don't see this slowing
you down.  But if you've got one client to restore, and 20 tape drives,
then the multisession restore would probably be faster than a collocated
restore.
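
For reference, group collocation is set up roughly like this (group, node,
and pool names are invented); how many sessions a multisession restore can
open is governed on the client side by the RESOURCEUTILIZATION option:

    DEFINE COLLOCGROUP critnodes
    DEFINE COLLOCMEMBER critnodes node1,node2,node3
    UPDATE STGPOOL tapepool COLLOCATE=GROUP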

I still think it's a strong feature whose value should be investigated
and discussed -- even if you only use it for the purpose we're discussing
here.  If you know you're in a DR scenario and you're going to be
restoring multiple systems, why wouldn't you create an ACTIVEDATA pool
and do a COPY ACTIVEDATA instead of a MOVE NODEDATA?

OK, here's another question.  Is it assumed that the ACTIVEDATA pool has
node-level collocation on?  Can you use group collocation instead?
Then maybe my friend and I could both get what we want?

Just throwing thoughts out there.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Maria Ilieva
Sent: Tuesday, January 22, 2008 10:22 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?

The procedure for creating active data pools (assuming you have TSM
version 5.4 or later) is the following:
1. Create a FILE type disk pool or a sequential TAPE pool, specifying
POOLTYPE=ACTIVEDATA
2. Update the node's domain(s), specifying ACTIVEDESTINATION=<created active
data pool>
3. Issue COPY ACTIVEDATA <node_name>
This process incrementally copies the node's active data, so it can be
restarted if needed. HSM-migrated and archived data is not copied into
the active data pool!
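
A tentative sketch of those steps as admin commands (device class, pool,
and domain names are placeholders; note that COPY ACTIVEDATA is actually
issued against the primary pool holding the data, per the pool-to-pool
relationship mentioned earlier in the thread):

    DEFINE STGPOOL actvpool filedevc POOLTYPE=ACTIVEDATA MAXSCRATCH=200
    UPDATE DOMAIN standard ACTIVEDESTINATION=actvpool
    COPY ACTIVEDATA tapepool actvpool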

Maria Ilieva

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of James R Owen
Sent: Tuesday, January 22, 2008 9:32 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?


Roger,
You certainly want to get a "best guess" list of likely priority#1 restores.
If your tapes really are mostly uncollocated, you will probably experience
lots of tape volume contention when you attempt to use MAXPRocess > 1 or to
run multiple simultaneous restore, move nodedata, or export node operations.

Use Query NODEData to see how many tapes might have to be read for each
node to be restored.
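
A quick way to check (node name is hypothetical):

    QUERY NODEDATA node1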

To minimize tape mounts, if you can wait for this operation to complete,
I believe you should try to move or export all of the nodes' data in a
single operation.

Here are possible disadvantages with using MOVe NODEData:
  - does not enable you to select to move only the Active backups for
        these nodes [so you might have to move lots of extra inactive backups]
  - you probably can not effectively use MAXPRocess > 1 nor run multiple
        simultaneous MOVe NODEData commands because of contention for your
        uncollocated volumes.

If you have or can set up another TSM server, you could do a
Server-Server EXPort:
        EXPort Node node1,node2,... FILEData=BACKUPActive TOServer=... [Preview=Yes]
moving only the nodes' active backups to a diskpool on the other TSM
server.  Using this technique, you can move only the minimal necessary
data.  I don't see any way to multithread or run multiple simultaneous
commands to read more than one tape at a time, but given your drive
constraints and uncollocated volumes, you will probably discover that you
can not effectively restore, move, or export from more than one tape at a
time, no matter which technique you try.  Your Query NODEData output
should show you which nodes, if any, do *not* have backups on the same
tapes.

Try running a preview EXPort Node command for single or multiple nodes to
get some idea of what tapes will be mounted and how much data you will
need to export.
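
For example, a preview run might look like this (node and server names are
placeholders):

    EXPORT NODE node1,node2 FILEDATA=BACKUPACTIVE TOSERVER=drserver PREVIEW=YES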

Call me if you want to talk about any of this.
--
Jim.Owen AT Yale DOT Edu   (w#203.432.6693, Verizon c#203.494.9201)

Roger Deschner wrote:
MOVE NODEDATA looks like it is going to be the key. I will simply move
the affected nodes into a disk storage pool, or into our existing
collocated tape storage pool. I presume it should be possible to restart
MOVE NODEDATA, in case it has to be interrupted or if the server crashes,
because what it does is not very different from migration or reclamation.
This should be a big advantage over GENERATE BACKUPSET, which is not even
as restartable as a common client restore. A possible strategy is to do
the long, laborious, but restartable, MOVE NODEDATA first, and then do a
very quick, painless, regular client restore or GENERATE BACKUPSET.
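
As a rough sketch of that strategy (node, pool, and device class names
are invented):

    MOVE NODEDATA node1 FROMSTGPOOL=oldtapepool TOSTGPOOL=collocpool
    GENERATE BACKUPSET node1 fireset DEVCLASS=filedevc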

Thanks to all! Until now, I was not fully aware of MOVE NODEDATA.

B.T.W. It is an automatic tape library, Quantum P7000. We graduated from
manual tape mounting back in 1999.

Roger Deschner      University of Illinois at Chicago
rogerd AT uic DOT edu

On Tue, 22 Jan 2008, Nicholas Cassimatis wrote:

Roger,

If you know which nodes are to be restored, or at least have some that
are good suspects, you might want to run some "move nodedata" commands to
try to get their data more contiguous.  If you can get some of that DASD
that's coming "real soon," even just to borrow it, that would help out
tremendously.

You say "tape" but never "library" - are you on manual drives?
(Please say
No, please say No...)  Try setting the mount retention high on
them,
and
kick off a few restores at once.  You may get lucky and already
have
the
needed tape mounted, saving you a few mounts.  If that's not
working
(it's
impossible to predict which way it will go), drop the mount
retention
to 0
so the tape ejects immediately, so the drive is ready for a new
tape
sooner.  And if you are, try to recruit the people who haven't
approved
spending for the upgrades to be the "picker arm" for you - I did
that
to an
account manager on a DR Test once, and we got the library approved
the next
day.
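
For reference, mount retention is a device-class parameter; a sketch with
an invented device class name:

    UPDATE DEVCLASS ltoclass MOUNTRETENTION=60
    UPDATE DEVCLASS ltoclass MOUNTRETENTION=0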

The thoughts of your fellow TSMers are with you.

Nick Cassimatis

----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 08:08 AM -----

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 01/22/2008
03:40:07 AM:

We like to talk about disaster preparedness, and one just happened here
at UIC.

On Saturday morning, a fire damaged portions of the UIC College of
Pharmacy Building. It affected several laboratories and offices. The
Chicago Fire Department, wearing hazmat moon suits due to the highly
dangerous contents of the laboratories, put it out efficiently in about
15 minutes. The temperature was around 0F (-18C), which compounded the
problems - anything that took on water became a block of ice.
Fortunately nobody was hurt; only a few people were in the building on a
Saturday morning, and they all got out safely.

Now, both the good news and the bad news is that many of the damaged
computers were backed up to our large TSM system. The good news is that
their data can be restored.

The bad news is that their data can be restored. And so now it must be.
Our TSM system is currently an old-school tape-based setup from the ADSM
days. (Upgrades involving a lot more disk coming real soon!) Most of the
nodes affected are not collocated, so I have to plan to do a number of
full restores of nodes whose data is scattered across numerous tape
volumes each. There are only 8 tape drives, and they are kept busy since
this system is in a heavily-loaded, about-to-be-upgraded state. (Timing
couldn't be worse; Murphy's Law.)

TSM was recently upgraded to version 5.5.0.0. It runs on AIX 5.3 with a
SCSI library. Since it is a v5.5 server, there may be new facilities
available that I'm not aware of yet.

I have the luxury of a little bit of time in advance. The hazmat guys
aren't letting anyone in to assess damage yet, so we don't know which
client node computers are damaged or not. We should know in a day or
two, so in the meantime I'm running as much reclamation as possible.
Given that this is our situation, how can I best optimize these
restores? I'm looking for ideas to get the most restoration done for
this disaster, while still continuing normal client-backup, migration,
expiration, reclamation cycles, because somebody else unrelated to this
situation could also need to restore...

Roger Deschner      University of Illinois at Chicago
rogerd AT uic DOT edu

