[ADSM-L] Fw: DISASTER: How to do a LOT of restores?

Subject: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
From: Nicholas Cassimatis <nickpc AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 22 Jan 2008 21:47:22 -0500
Curtis,

Suffering Frozen Brain Syndrome - it's Domain and Storagepool, combined,
that make data eligible to be put in the Activedata pool.
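
In other words, both halves have to line up - something like this, with
the pool and domain names as placeholders:

  update domain STANDARD activedestination=ACTIVEPOOL
  copy activedata TAPEPOOL ACTIVEPOOL

The domain's ACTIVEDESTINATION says whose data is eligible, and the copy
itself runs against the primary storage pool holding that data.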

Nick Cassimatis

----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 09:45 PM -----

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 01/22/2008
08:21:12 PM:

> AhHAH!  So this would only really work if he has a storage pool with
> clients that should be copied in this manner.  That makes sense.
>
> What about the expiration of inactive files the next time you do a copy
> activedata?  It doesn't say in the manual that this is what it does, but
> you would think it does it that way.  Am I right?
>
> ---
> W. Curtis Preston
> Backup Blog @ www.backupcentral.com
> VP Data Protection, GlassHouse Technologies
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]
> On Behalf Of Nicholas Cassimatis
> Sent: Tuesday, January 22, 2008 4:38 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
>
> For this scenario, the problem with Active Storagepools is that it's a
> pool-to-pool relationship.  So ALL active data in a storagepool would
> be copied to the Active Pool.  Not knowing what percentage of the nodes
> on the TSM Server will be restored, but assuming they're all in one
> storage pool, you'd probably want to "move nodedata" them to another
> pool, then do the "copy activedata."  Two steps, and it needs more
> resources.  Just doing "move nodedata" within the same pool will
> semi-collocate the data (see Note below).  Obviously, a DASD pool would
> be best for this circumstance, if it's available, but even cycling the
> data within the existing pool will have benefits.
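>
> Roughly, with pool and node names as placeholders (I haven't tested
> this exact sequence, so check the syntax first):
>
>   move nodedata NODE1 fromstgpool=TAPEPOOL tostgpool=DISKPOOL
>   copy activedata DISKPOOL ACTIVEPOOL wait=no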
>
> Note:  Semi-collocated, as each process will make all of the named
> node's data contiguous, even if it ends up on the same media with
> another node's data.  Turning on collocation before starting the jobs,
> and marking all filling volumes read-only, will give you separate
> volumes for each node, but it requires a decent scratch pool to try.
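>
> Something like this, if you want to try it (pool name is a
> placeholder):
>
>   update stgpool TAPEPOOL collocate=node
>   update volume * access=readonly wherestgpool=TAPEPOOL wherestatus=filling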
>
> Nick Cassimatis
>
> ----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 07:25 PM -----
>
> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 01/22/2008
> 01:58:11 PM:
>
> > Are files that are no longer active automatically expired from the
> > activedata pool when you perform the latest COPY ACTIVEDATA?  This
> > would mean that, at some point, you would need to do reclamation on
> > this pool, right?
> >
> > It would seem to me that this would be a much better answer to TOP's
> > question.  Instead of doing a MOVE NODE (which requires moving ALL of
> > the node's files), or doing an EXPORT NODE (which requires a separate
> > server), he can just create an ACTIVEDATA pool, then perform a COPY
> > ACTIVEDATA into it while he's preparing for the restore.  Putting said
> > pool on disk would be even better, of course.
> >
> > I was just discussing this with another one of our TSM experts, and
> > he's not as bullish on it as I am.  (It was an off-list convo, so
> > I'll let him go nameless unless he wants to speak up.)  He doesn't
> > like that you can't use a DISK type device class (disk has to be
> > listed as FILE type).
> >
> > He also has issues with the resources needed to create this "3rd
> > copy" of the data.  He said, "Most customers have trouble getting
> > backups complete and creating their offsite copies in a 24 hour
> > period and would not be able to complete a third copy of the data."
> > Add to that the possibility of doing reclamation on this pool and
> > you've got even more work to do.
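> >
> > (For what it's worth, I'd expect that to be ordinary storage pool
> > reclamation - something like the following, though I haven't tried
> > it against an activedata pool and the pool name is a placeholder:
> >
> >   update stgpool ACTIVEPOOL reclaim=60
> >   reclaim stgpool ACTIVEPOOL threshold=60 duration=120
> >
> > so it's one more batch-window job to schedule.)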
> >
> > He's more of a fan of group collocation and the multisession restore
> > feature.  I think this has more value if you're restoring fewer
> > clients than you have tape drives.  Because if you collocate all
> > your active files, then you'll only be using one tape drive per
> > client.  If you've got 40 clients to restore and 20 tape drives, I
> > don't see this slowing you down.  But if you've got one client to
> > restore, and 20 tape drives, then the multisession restore would
> > probably be faster than a collocated restore.
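> >
> > For reference, setting up group collocation is roughly this (group,
> > node, and pool names are placeholders):
> >
> >   define collocgroup RESTOREGROUP
> >   define collocmember RESTOREGROUP NODE1,NODE2,NODE3
> >   update stgpool TAPEPOOL collocate=group
> >
> > and the multisession side is the client's RESOURCEUTILIZATION option
> > plus MAXNUMMP on the node definition, if I remember right.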
> >
> > I still think it's a strong feature whose value should be investigated
> > and discussed -- even if you only use it for the purpose we're
> > discussing here.  If you know you're in a DR scenario and you're going
> > to be restoring multiple systems, why wouldn't you create an
> > ACTIVEDATA pool and do a COPY ACTIVEDATA instead of a MOVE NODE?
> >
> > OK, here's another question.  Is it assumed that the ACTIVEDATA pool
> > has node-level collocation on?  Can you use group collocation
> > instead?  Then maybe my friend and I could both get what we want?
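> >
> > I'm picturing something like this when the pool is defined, though I
> > haven't verified that collocation is honored on an activedata pool
> > (names are placeholders):
> >
> >   define stgpool ACTIVEPOOL FILECLASS pooltype=activedata maxscratch=200 collocate=group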
> >
> > Just throwing thoughts out there.
> >
> > ---
> > W. Curtis Preston
> > Backup Blog @ www.backupcentral.com
> > VP Data Protection, GlassHouse Technologies
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]
> > On Behalf Of Maria Ilieva
> > Sent: Tuesday, January 22, 2008 10:22 AM
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
> >
> > The procedure for creating active data pools (assuming you have TSM
> > version 5.4 or later) is the following:
> > 1. Create a FILE type disk pool or a sequential TAPE pool, specifying
> > pooltype=ACTIVEDATA
> > 2. Update the node's domain(s), specifying ACTIVEDESTINATION=<created
> > active data pool>
> > 3. Issue COPY ACTIVEDATA <node_name>
> > This process incrementally copies the node's active data, so it can
> > be restarted if needed. HSM migrated and archived data is not copied
> > into the active data pool!
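> >
> > In command terms it looks roughly like this (I haven't double-checked
> > every parameter, and the device class, pool, and domain names are
> > placeholders):
> >
> >   define devclass ACTIVEFILE devtype=file directory=/tsm/activepool maxcapacity=4g mountlimit=20
> >   define stgpool ACTIVEPOOL ACTIVEFILE pooltype=activedata maxscratch=200
> >   update domain STANDARD activedestination=ACTIVEPOOL
> >   copy activedata TAPEPOOL ACTIVEPOOL wait=no
> >
> > where the source of the COPY ACTIVEDATA is the primary storage pool
> > holding the nodes' backups.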
> >
> > Maria Ilieva
> >
> > > ---
> > > W. Curtis Preston
> > > Backup Blog @ www.backupcentral.com
> > > VP Data Protection, GlassHouse Technologies
> > >
> > > -----Original Message-----
> > > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]
> > > On Behalf Of James R Owen
> > > Sent: Tuesday, January 22, 2008 9:32 AM
> > > To: ADSM-L AT VM.MARIST DOT EDU
> > > Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
> > >
> > >
> > > Roger,
> > > You certainly want to get a "best guess" list of likely priority#1
> > > restores.  If your tapes really are mostly uncollocated, you will
> > > probably experience lots of tape volume contention when you attempt
> > > to use MAXPRocess > 1 or to run multiple simultaneous restore, move
> > > nodedata, or export node operations.
> > >
> > > Use Query NODEData to see how many tapes might have to be read for
> > > each node to be restored.
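> > >
> > > For example (the node name is a placeholder):
> > >
> > >   query nodedata NODE1
> > >   select volume_name from volumeusage where node_name='NODE1'
> > >
> > > The second one is just another way to list the volumes holding that
> > > node's data, if you prefer SQL output.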
> > >
> > > To minimize tape mounts, if you can wait for this operation to
> > > complete, I believe you should try to move or export all of the
> > > nodes' data in a single operation.
> > >
> > > Here are possible disadvantages with using MOVe NODEData:
> > >   - it does not enable you to select and move only the Active
> > >     backups for these nodes [so you might have to move lots of
> > >     extra inactive backups]
> > >   - you probably can not effectively use MAXPROC=N (N > 1) nor run
> > >     multiple simultaneous MOVe NODEData commands, because of
> > >     contention for your uncollocated volumes.
> > >
> > > If you have or can set up another TSM server, you could do a
> > > Server-Server EXPort:
> > >         EXPort Node node1,node2,... FILEData=BACKUPActive TOServer=... [Preview=Yes]
> > > moving only the nodes' active backups to a diskpool on the other
> > > TSM server.  Using this technique, you can move only the minimal
> > > necessary data.  I don't see any way to multithread or run multiple
> > > simultaneous commands to read more than one tape at a time, but
> > > given your drive constraints and uncollocated volumes, you will
> > > probably discover that you can not effectively restore, move, or
> > > export from more than one tape at a time, no matter which technique
> > > you try.  Your Query NODEData output should show you which nodes,
> > > if any, do *not* have backups on the same tapes.
> > >
> > > Try running a preview EXPort Node command for single or multiple
> > > nodes to get some idea of what tapes will be mounted and how much
> > > data you will need to export.
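> > >
> > > Something like this (node names are placeholders):
> > >
> > >   export node NODE1,NODE2 filedata=backupactive preview=yes
> > >
> > > and then check the activity log and the process output for the
> > > volumes and the number of files and bytes it would move.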
> > >
> > > Call me if you want to talk about any of this.
> > > --
> > > Jim.Owen AT Yale DOT Edu   (w#203.432.6693, Verizon c#203.494.9201)
> > >
> > > Roger Deschner wrote:
> > > > MOVE NODEDATA looks like it is going to be the key. I will simply
> > > > move the affected nodes into a disk storage pool, or into our
> > > > existing collocated tape storage pool. I presume it should be
> > > > possible to restart MOVE NODEDATA, in case it has to be
> > > > interrupted or if the server crashes, because what it does is not
> > > > very different from migration or reclamation. This should be a
> > > > big advantage over GENERATE BACKUPSET, which is not even as
> > > > restartable as a common client restore. A possible strategy is to
> > > > do the long, laborious, but restartable, MOVE NODEDATA first, and
> > > > then do a very quick, painless, regular client restore or
> > > > GENERATE BACKUPSET.
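> > > >
> > > > In other words, roughly this (pool, node, and device class names
> > > > are placeholders, and I still need to verify the details):
> > > >
> > > >   move nodedata NODE1 fromstgpool=TAPEPOOL tostgpool=COLLOCPOOL
> > > >   generate backupset NODE1 FIREDR * devclass=LTOCLASS scratch=yes
> > > >
> > > > with the generate backupset (or the client restore) run only
> > > > after the move completes.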
> > > >
> > > > Thanks to all! Until now, I was not fully aware of MOVE NODEDATA.
> > > >
> > > > B.T.W. It is an automatic tape library, Quantum P7000. We
> > > > graduated from manual tape mounting back in 1999.
> > > >
> > > > Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
> > > >
> > > >
> > > > On Tue, 22 Jan 2008, Nicholas Cassimatis wrote:
> > > >
> > > >> Roger,
> > > >>
> > > >> If you know which nodes are to be restored, or at least have
> > > >> some that are good suspects, you might want to run some "move
> > > >> nodedata" commands to try to get their data more contiguous.  If
> > > >> you can get some of that DASD that's coming "real soon," even
> > > >> just to borrow it, that would help out tremendously.
> > > >>
> > > >> You say "tape" but never "library" - are you on manual drives?
> > > >> (Please say No, please say No...)  Try setting the mount
> > > >> retention high on them, and kick off a few restores at once.
> > > >> You may get lucky and already have the needed tape mounted,
> > > >> saving you a few mounts.  If that's not working (it's impossible
> > > >> to predict which way it will go), drop the mount retention to 0
> > > >> so the tape ejects immediately, so the drive is ready for a new
> > > >> tape sooner.  And if you are on manual drives, try to recruit
> > > >> the people who haven't approved spending for the upgrades to be
> > > >> the "picker arm" for you - I did that to an account manager on a
> > > >> DR Test once, and we got the library approved the next day.
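> > > >>
> > > >> The mount retention knob is on the device class, something like
> > > >> this (the class name is a placeholder):
> > > >>
> > > >>   update devclass LTOCLASS mountretention=60
> > > >>   update devclass LTOCLASS mountretention=0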
> > > >>
> > > >> The thoughts of your fellow TSMers are with you.
> > > >>
> > > >> Nick Cassimatis
> > > >>
> > > >> ----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 08:08 AM -----
> > > >>
> > > >> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on
> > > >> 01/22/2008 03:40:07 AM:
> > > >>
> > > >>> We like to talk about disaster preparedness, and one just
> > > >>> happened here at UIC.
> > > >>>
> > > >>> On Saturday morning, a fire damaged portions of the UIC College
> > > >>> of Pharmacy Building. It affected several laboratories and
> > > >>> offices. The Chicago Fire Department, wearing hazmat moon suits
> > > >>> due to the highly dangerous contents of the laboratories, put it
> > > >>> out efficiently in about 15 minutes. The temperature was around
> > > >>> 0F (-18C), which compounded the problems - anything that took on
> > > >>> water became a block of ice. Fortunately nobody was hurt; only a
> > > >>> few people were in the building on a Saturday morning, and they
> > > >>> all got out safely.
> > > >>>
> > > >>> Now, both the good news and the bad news is that many of the
> > > >>> damaged computers were backed up to our large TSM system. The
> > > >>> good news is that their data can be restored.
> > > >>>
> > > >>> The bad news is that their data can be restored. And so now it
> > > >>> must be.
> > > >>>
> > > >>> Our TSM system is currently an old-school tape-based setup from
> > > >>> the ADSM days. (Upgrades involving a lot more disk coming real
> > > >>> soon!) Most of the nodes affected are not collocated, so I have
> > > >>> to plan to do a number of full restores of nodes whose data is
> > > >>> scattered across numerous tape volumes each. There are only 8
> > > >>> tape drives, and they are kept busy since this system is in a
> > > >>> heavily-loaded, about-to-be-upgraded state. (Timing couldn't be
> > > >>> worse; Murphy's Law.)
> > > >>>
> > > >>> TSM was recently upgraded to version 5.5.0.0. It runs on AIX 5.3
> > > >>> with a SCSI library. Since it is a v5.5 server, there may be new
> > > >>> facilities available that I'm not aware of yet.
> > > >>>
> > > >>> I have the luxury of a little bit of time in advance. The hazmat
> > > >>> guys aren't letting anyone in to assess damage yet, so we don't
> > > >>> know which client node computers are damaged or not. We should
> > > >>> know in a day or two, so in the meantime I'm running as much
> > > >>> reclamation as possible.
> > > >>>
> > > >>> Given that this is our situation, how can I best optimize these
> > > >>> restores? I'm looking for ideas to get the most restoration done
> > > >>> for this disaster, while still continuing normal client-backup,
> > > >>> migration, expiration, reclamation cycles, because somebody else
> > > >>> unrelated to this situation could also need to restore...
> > > >>>
> > > >>> Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
> > >
