Subject: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
From: Nicholas Cassimatis <nickpc AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 22 Jan 2008 21:45:27 -0500
Obviously, being in Chicago this week has frozen my brain (or maybe I'm
downwind from UIC...).  Yes, you're correct - it is Domain and Storagepool
combined.

Nick Cassimatis

----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 09:42 PM -----

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 01/22/2008
08:24:29 PM:

> Nick
>
> I may well have a flawed understanding here but....
>
> Set up an active-data pool
> clone the domain containing the servers requiring recovery
> set the ACTIVEDATAPOOL parameter on the cloned domain
> move the servers requiring recovery to the new domain,
> Run COPY ACTIVEDATA on the primary tape pool
>
> Since only the nodes we want are in the domain with the ACTIVEDATAPOOL
> parameter specified, won't only the data from those nodes be copied?
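>
> In command terms I'm picturing something like this (pool, domain and
> node names are only placeholders, and it assumes a FILE device class
> called FILEDEV already exists):
>
>    define stgpool restoreactive filedev pooltype=activedata maxscratch=50
>    copy domain standard restoredom
>    update domain restoredom activedestination=restoreactive
>    update node node1 domain=restoredom
>    copy activedata tapepool restoreactive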
>
> Regards
>
> Steve
>
> Steven Harris
> TSM Admin, Sydney Australia
>
> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 23/01/2008
> 11:38:17 AM:
>
> > For this scenario, the problem with Active Storagepools is it's a
> > pool-to-pool relationship.  So ALL active data in a storagepool would
> > be copied to the Active Pool.  Not knowing what percentage of the nodes
> > on the TSM Server will be restored, but assuming they're all in one
> > storage pool, you'd probably want to "move nodedata" them to another
> > pool, then do the "copy activedata."  Two steps, and needs more
> > resources.  Just doing "move nodedata" within the same pool will
> > semi-collocate the data (See Note below).  Obviously, a DASD pool, for
> > this circumstance, would be best, if it's available, but even cycling
> > the data within the existing pool will have benefits.
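> >
> > Roughly, per node or group of nodes, that two-step path would look
> > something like this (pool and node names here are just examples):
> >
> >    move nodedata node1 fromstgpool=tapepool tostgpool=diskpool
> >    copy activedata diskpool activepool
> >
> > where ACTIVEPOOL is an active-data pool you've already defined.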
> >
> > Note:  Semi-collocated, as each process will make all of the named
> > node's data contiguous, even if it ends up on the same media with
> > another node's data.  Turning on collocation before starting the jobs,
> > and marking all filling volumes read-only, will give you separate
> > volumes for each node, but requires a decent scratch pool to try.
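> >
> > In command terms that's just (with TAPEPOOL standing in for whatever
> > the primary pool is really called):
> >
> >    update stgpool tapepool collocate=node
> >    update volume * access=readonly wherestgpool=tapepool wherestatus=filling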
> >
> > Nick Cassimatis
> >
> > ----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 07:25 PM -----
> >
> > "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 01/22/2008
> > 01:58:11 PM:
> >
> > > Are files that are no longer active automatically expired from the
> > > activedata pool when you perform the latest COPY ACTIVEDATA?  This
> > > would mean that, at some point, you would need to do reclamation on
> > > this pool, right?
> > >
> > > It would seem to me that this would be a much better answer to the
> > > OP's question.  Instead of doing a MOVE NODEDATA (which requires
> > > moving ALL of the node's files), or doing an EXPORT NODE (which
> > > requires a separate server), he can just create an ACTIVEDATA pool,
> > > then perform a COPY ACTIVEDATA into it while he's preparing for the
> > > restore.  Putting said pool on disk would be even better, of course.
> > >
> > > I was just discussing this with another one of our TSM experts, and
> > > he's not as bullish on it as I am.  (It was an off-list convo, so I'll
> > > let him go nameless unless he wants to speak up.)  He doesn't like
> > > that you can't use a DISK type device class (disk has to be listed as
> > > FILE type).
> > >
> > > He also has issues with the resources needed to create this "3rd copy"
> > > of the data.  He said, "Most customers have trouble getting backups
> > > complete and creating their offsite copies in a 24 hour period and
> > > would not be able to complete a third copy of the data."  Add to that
> > > the possibility of doing reclamation on this pool and you've got even
> > > more work to do.
> > >
> > > He's more of a fan of group collocation and the multisession restore
> > > feature.  I think this has more value if you're restoring fewer
> > > clients than you have tape drives.  Because if you collocate all your
> > > active files, then you'll only be using one tape drive per client.
> > > If you've got 40 clients to restore and 20 tape drives, I don't see
> > > this slowing you down.  But if you've got one client to restore, and
> > > 20 tape drives, then the multisession restore would probably be
> > > faster than a collocated restore.
> > >
> > > I still think it's a strong feature whose value should be
> > > investigated and discussed -- even if you only use it for the purpose
> > > we're discussing here.  If you know you're in a DR scenario and
> > > you're going to be restoring multiple systems, why wouldn't you
> > > create an ACTIVEDATA pool and do a COPY ACTIVEDATA instead of a MOVE
> > > NODEDATA?
> > >
> > > OK, here's another question.  Is it assumed that the ACTIVEDATA pool
> > > has node-level collocation turned on?  Can you use group collocation
> > > instead?  Then maybe my friend and I could both get what we want?
> > >
> > > Just throwing thoughts out there.
> > >
> > > ---
> > > W. Curtis Preston
> > > Backup Blog @ www.backupcentral.com
> > > VP Data Protection, GlassHouse Technologies
> > >
> > > -----Original Message-----
> > > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> > > Behalf Of Maria Ilieva
> > > Sent: Tuesday, January 22, 2008 10:22 AM
> > > To: ADSM-L AT VM.MARIST DOT EDU
> > > Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
> > >
> > > The procedure for creating active-data pools (assuming you have TSM
> > > version 5.4 or later) is the following:
> > > 1. Create a FILE type disk pool or a sequential tape pool, specifying
> > > pooltype=ACTIVEDATA
> > > 2. Update the node's domain(s), specifying ACTIVEDESTINATION=<created
> > > active data pool>
> > > 3. Issue COPY ACTIVEDATA <node_name>
> > > This process incrementally copies the node's active data, so it can
> > > be restarted if needed. HSM-migrated and archived data is not copied
> > > to the active-data pool!
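> > >
> > > In command form that looks roughly like this (the directory, device
> > > class, pool and domain names below are made-up examples).  Note the
> > > copy itself is issued pool-to-pool; which nodes' data lands in the
> > > active-data pool is controlled by the ACTIVEDESTINATION on their
> > > domain:
> > >
> > >    define devclass actfile devtype=file directory=/tsm/actdata maxcapacity=4g mountlimit=8
> > >    define stgpool activepool actfile pooltype=activedata maxscratch=100
> > >    update domain standard activedestination=activepool
> > >    copy activedata backuppool activepool maxprocess=2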
> > >
> > > Maria Ilieva
> > >
> > > > ---
> > > > W. Curtis Preston
> > > > Backup Blog @ www.backupcentral.com
> > > > VP Data Protection, GlassHouse Technologies
> > > >
> > > > -----Original Message-----
> > > > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> > > > Behalf Of James R Owen
> > > > Sent: Tuesday, January 22, 2008 9:32 AM
> > > > To: ADSM-L AT VM.MARIST DOT EDU
> > > > Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
> > > >
> > > >
> > > > Roger,
> > > > You certainly want to get a "best guess" list of likely priority #1
> > > > restores.
> > > > If your tapes really are mostly uncollocated, you will probably
> > > > experience lots of tape volume contention when you attempt to use
> > > > MAXPRocess > 1 or to run multiple simultaneous restore, move
> > > > nodedata, or export node operations.
> > > >
> > > > Use Query NODEData to see how many tapes might have to be read for
> > > > each node to be restored.
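> > > > For example (the node name is a placeholder):
> > > >
> > > >         Query NODEData node1
> > > >
> > > > lists each volume holding that node's data and how much of it sits
> > > > on each volume.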
> > > >
> > > > To minimize tape mounts, if you can wait for this operation to
> > > > complete, I believe you should try to move or export all of the
> > > > nodes' data in a single operation.
> > > >
> > > > Here are possible disadvantages with using MOVe NODEData:
> > > >   - does not enable you to select to move only the Active backups
> > > >     for these nodes [so you might have to move lots of extra
> > > >     inactive backups]
> > > >   - you probably can not effectively use MAXPROC=N (>1) nor run
> > > >     multiple simultaneous MOVe NODEData commands because of
> > > >     contention for your uncollocated volumes.
> > > >
> > > > If you have or can set up another TSM server, you could do a
> > > > Server-Server EXPort:
> > > >         EXPort Node node1,node2,... FILEData=BACKUPActive TOServer=... [Preview=Yes]
> > > > moving only the nodes' active backups to a diskpool on the other TSM
> > > > server.  Using this technique, you can move only the minimal
> > > > necessary data.  I don't see any way to multithread or run multiple
> > > > simultaneous commands to read more than one tape at a time, but
> > > > given your drive constraints and uncollocated volumes, you will
> > > > probably discover that you can not effectively restore, move, or
> > > > export from more than one tape at a time, no matter which technique
> > > > you try.  Your Query NODEData output should show you which nodes, if
> > > > any, do *not* have backups on the same tapes.
> > > >
> > > > Try running a preview EXPort Node command for single or multiple
> > > > nodes to get some idea of what tapes will be mounted and how much
> > > > data you will need to export.
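> > > > For a quick dry run against a couple of nodes (names made up here),
> > > > something like:
> > > >
> > > >         EXPort Node node1,node2 FILEData=BACKUPActive Preview=Yes
> > > >
> > > > will report the number of objects and bytes it would have to move.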
> > > >
> > > > Call me if you want to talk about any of this.
> > > > --
> > > > Jim.Owen AT Yale DOT Edu   (w#203.432.6693, Verizon c#203.494.9201)
> > > >
> > > > Roger Deschner wrote:
> > > > > MOVE NODEDATA looks like it is going to be the key. I will simply
> > > > > move the affected nodes into a disk storage pool, or into our
> > > > > existing collocated tape storage pool. I presume it should be
> > > > > possible to restart MOVE NODEDATA, in case it has to be
> > > > > interrupted or if the server crashes, because what it does is not
> > > > > very different from migration or reclamation. This should be a
> > > > > big advantage over GENERATE BACKUPSET, which is not even as
> > > > > restartable as a common client restore. A possible strategy is to
> > > > > do the long, laborious, but restartable, MOVE NODEDATA first, and
> > > > > then do a very quick, painless, regular client restore or
> > > > > GENERATE BACKUPSET.
> > > > >
> > > > > Thanks to all! Until now, I was not fully aware of MOVE NODEDATA.
> > > > >
> > > > > B.T.W. It is an automatic tape library, Quantum P7000. We
> > > > > graduated from manual tape mounting back in 1999.
> > > > >
> > > > > Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
> > > > >
> > > > >
> > > > > On Tue, 22 Jan 2008, Nicholas Cassimatis wrote:
> > > > >
> > > > >> Roger,
> > > > >>
> > > > >> If you know which nodes are to be restored, or at least have
> > > > >> some that are good suspects, you might want to run some "move
> > > > >> nodedata" commands to try to get their data more contiguous.  If
> > > > >> you can get some of that DASD that's coming "real soon," even
> > > > >> just to borrow it, that would help out tremendously.
> > > > >>
> > > > >> You say "tape" but never "library" - are you on manual drives?
> > > > (Please say
> > > > >> No, please say No...)  Try setting the mount retention high on
> > > them,
> > > > and
> > > > >> kick off a few restores at once.  You may get lucky and already
> > > have
> > > > the
> > > > >> needed tape mounted, saving you a few mounts.  If that's not
> > > working
> > > > (it's
> > > > >> impossible to predict which way it will go), drop the mount
> > > retention
> > > > to 0
> > > > >> so the tape ejects immediately, so the drive is ready for a new
> > > tape
> > > > >> sooner.  And if you are, try to recruit the people who haven't
> > > > approved
> > > > >> spending for the upgrades to be the "picker arm" for you - I did
> > > that
> > > > to an
> > > > >> account manager on a DR Test once, and we got the library
approved
> > > > the next
> > > > >> day.
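> > > > >>
> > > > >> For reference, that's the MOUNTRETENTION value on your tape
> > > > >> device class - something like this, with LTO_CLASS standing in
> > > > >> for whatever your device class is actually called:
> > > > >>
> > > > >>    update devclass lto_class mountretention=60
> > > > >>    update devclass lto_class mountretention=0
> > > > >>
> > > > >> (the first to keep tapes mounted between restores, the second to
> > > > >> eject them immediately).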
> > > > >>
> > > > >> The thoughts of your fellow TSMers are with you.
> > > > >>
> > > > >> Nick Cassimatis
> > > > >>
> > > > >> ----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 08:08 AM -----
> > > > >>
> > > > >> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on
> > > > >> 01/22/2008 03:40:07 AM:
> > > > >>
> > > > >>> We like to talk about disaster preparedness, and one just
> > > > >>> happened here at UIC.
> > > > >>>
> > > > >>> On Saturday morning, a fire damaged portions of the UIC College
> > > > >>> of Pharmacy Building. It affected several laboratories and
> > > > >>> offices. The Chicago Fire Department, wearing hazmat moon suits
> > > > >>> due to the highly dangerous contents of the laboratories, put it
> > > > >>> out efficiently in about 15 minutes. The temperature was around
> > > > >>> 0F (-18C), which compounded the problems - anything that took on
> > > > >>> water became a block of ice. Fortunately nobody was hurt; only a
> > > > >>> few people were in the building on a Saturday morning, and they
> > > > >>> all got out safely.
> > > > >>>
> > > > >>> Now, both the good news and the bad news is that many of the
> > > > >>> damaged computers were backed up to our large TSM system. The
> > > > >>> good news is that their data can be restored.
> > > > >>>
> > > > >>> The bad news is that their data can be restored. And so now it
> > > > >>> must be.
> > > > >>>
> > > > >>> Our TSM system is currently an old-school tape-based setup from
> > > > >>> the ADSM days. (Upgrades involving a lot more disk coming real
> > > > >>> soon!) Most of the nodes affected are not collocated, so I have
> > > > >>> to plan to do a number of full restores of nodes whose data is
> > > > >>> scattered across numerous tape volumes each. There are only 8
> > > > >>> tape drives, and they are kept busy since this system is in a
> > > > >>> heavily-loaded, about-to-be-upgraded state. (Timing couldn't be
> > > > >>> worse; Murphy's Law.)
> > > > >>>
> > > > >>> TSM was recently upgraded to version 5.5.0.0. It runs on AIX 5.3
> > > > >>> with a SCSI library. Since it is a v5.5 server, there may be new
> > > > >>> facilities available that I'm not aware of yet.
> > > > >>>
> > > > >>> I have the luxury of a little bit of time in advance. The hazmat
> > > > >>> guys aren't letting anyone in to assess damage yet, so we don't
> > > > >>> know which client node computers are damaged or not. We should
> > > > >>> know in a day or two, so in the meantime I'm running as much
> > > > >>> reclamation as possible.
> > > > >>>
> > > > >>> Given that this is our situation, how can I best optimize these
> > > > >>> restores? I'm looking for ideas to get the most restoration done
> > > > >>> for this disaster, while still continuing normal client-backup,
> > > > >>> migration, expiration, reclamation cycles, because somebody else
> > > > >>> unrelated to this situation could also need to restore...
> > > > >>>
> > > > >>> Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
> > > >