Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?

Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 22 Jan 2008 20:21:12 -0500
AhHAH!  So this would only really work if he has a storage pool with
clients that should be copied in this manner.  That makes sense.

What about the expiration of inactive files the next time you do a copy
activedata?  The manual doesn't say that's what it does, but you would
think it works that way.  Am I right?

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Nicholas Cassimatis
Sent: Tuesday, January 22, 2008 4:38 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?

For this scenario, the problem with Active Storagepools is it's a
pool-to-pool relationship.  So ALL active data in a storagepool would be
copied to the Active Pool.  Not knowing what percentage of the nodes on
the TSM Server will be restored, but assuming they're all in one storage
pool, you'd probably want to "move nodedata" them to another pool, then do
the "copy activedata."  Two steps, and needs more resources.  Just doing
"move nodedata" within the same pool will semi-collocate the data (See
Note below).  Obviously, a DASD pool, for this circumstance, would be
best, if it's available, but even cycling the data within the existing
pool will have benefits.

Note:  Semi-collocated, as each process will make all of the named node's
data contiguous, even if it ends up on the same media with another node's
data.  Turning on collocation before starting the jobs, and marking all
filling volumes read-only, will give you separate volumes for each node,
but requires a decent scratch pool to try.
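
For reference, the commands for those steps would look something like the
following (the node and pool names are placeholders, so check everything
against your own server before running it):

        move nodedata node1,node2 fromstgpool=TAPEPOOL tostgpool=STAGEPOOL
        copy activedata STAGEPOOL ACTIVEPOOL maxprocess=2 wait=no
        update stgpool TAPEPOOL collocate=node
        update volume * access=readonly wherestgpool=TAPEPOOL wherestatus=filling

The first two lines are the move-then-copy described above; the last two
are the collocation and read-only steps from the Note.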

Nick Cassimatis

----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 07:25 PM -----

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 01/22/2008
01:58:11 PM:

> Are files that are no longer active automatically expired from the
> activedata pool when you perform the latest COPY ACTIVEDATA?  This would
> mean that, at some point, you would need to do reclamation on this pool,
> right?
>
> It would seem to me that this would be a much better answer to TOP's
> question.  Instead of doing a MOVE NODE (which requires moving ALL of
> the node's files), or doing an EXPORT NODE (which requires a separate
> server), he can just create an ACTIVEDATA pool, then perform a COPY
> ACTIVEDATA into it while he's preparing for the restore.  Putting said
> pool on disk would be even better, of course.
>
> I was just discussing this with another one of our TSM experts, and he's
> not as bullish on it as I am.  (It was an off-list convo, so I'll let
> him go nameless unless he wants to speak up.)  He doesn't like that you
> can't use a DISK type device class (disk has to be listed as FILE type).
>
> He also has issues with the resources needed to create this "3rd copy"
> of the data.  He said, "Most customers have trouble getting backups
> complete and creating their offsite copies in a 24 hour period and would
> not be able to complete a third copy of the data."  Add to that the
> possibility of doing reclamation on this pool and you've got even more
> work to do.
>
> He's more of a fan of group collocation and the multisession restore
> feature.  I think this has more value if you're restoring fewer clients
> than you have tape drives.  Because if you collocate all your active
> files, then you'll only be using one tape drive per client.  If you've
> got 40 clients to restore and 20 tape drives, I don't see this slowing
> you down.  But if you've got one client to restore, and 20 tape drives,
> then the multisession restore would probably be faster than a collocated
> restore.
>
> I still think it's a strong feature whose value should be investigated
> and discussed -- even if you only use it for the purpose we're
> discussing here.  If you know you're in a DR scenario and you're going
> to be restoring multiple systems, why wouldn't you create an ACTIVEDATA
> pool and do a COPY ACTIVEDATA instead of a MOVE NODE?
>
> OK, here's another question.  Is it assumed that the ACTIVEDATA pool has
> node-level collocation on?  Can you use group collocation instead?  Then
> maybe my friend and I could both get what we want?
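>
> (If an ACTIVEDATA pool does accept group collocation, I'd expect the
> setup to look roughly like this -- the group, node, and pool names below
> are made up:
>
>         define collocgroup PHARMACY
>         define collocmember PHARMACY node1,node2,node3
>         update stgpool ACTIVEPOOL collocate=group
>
> -- but I haven't tried it, so treat that as a sketch.)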
>
> Just throwing thoughts out there.
>
> ---
> W. Curtis Preston
> Backup Blog @ www.backupcentral.com
> VP Data Protection, GlassHouse Technologies
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
> Maria Ilieva
> Sent: Tuesday, January 22, 2008 10:22 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
>
> The procedure for creating active data pools (assuming you have TSM
> version 5.4 or later) is the following:
> 1. Create a FILE type disk pool or a sequential TAPE pool, specifying
> pooltype=ACTIVEDATA
> 2. Update the node's domain(s), specifying ACTIVEDESTINATION=<created
> active data pool>
> 3. Issue COPY ACTIVEDATA <node_name>
> This process incrementally copies the node's active data, so it can be
> restarted if needed.  HSM migrated and archived data is not copied into
> the active data pool!
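>
> In command form the steps would be roughly the following (the device
> class, directory, pool, and domain names are just placeholders):
>
>         define devclass ACTIVEFILE devtype=file directory=/tsm/active maxcapacity=4g mountlimit=16
>         define stgpool ACTIVEPOOL ACTIVEFILE pooltype=activedata maxscratch=500
>         update domain STANDARD activedestination=ACTIVEPOOL
>         copy activedata TAPEPOOL ACTIVEPOOL wait=no
>
> Note that the copy command itself is specified pool-to-pool (primary
> pool to active-data pool).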
>
> Maria Ilieva
>
> > ---
> > W. Curtis Preston
> > Backup Blog @ www.backupcentral.com
> > VP Data Protection, GlassHouse Technologies
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
> > James R Owen
> > Sent: Tuesday, January 22, 2008 9:32 AM
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
> >
> >
> > Roger,
> > You certainly want to get a "best guess" list of likely priority#1
> > restores.  If your tapes really are mostly uncollocated, you will
> > probably experience lots of tape volume contention when you attempt to
> > use MAXPRocess > 1 or to run multiple simultaneous restore, move
> > nodedata, or export node operations.
> >
> > Use Query NODEData to see how many tapes might have to be read for
> > each node to be restored.
> >
> > To minimize tape mounts, if you can wait for this operation to
> > complete, I believe you should try to move or export all of the nodes'
> > data in a single operation.
> >
> > Here are possible disadvantages with using MOVe NODEData:
> >   - does not enable you to select to move only the Active backups for
> >     these nodes [so you might have to move lots of extra inactive
> >     backups]
> >   - you probably can not effectively use MAXPROC=N (>1) nor run
> >     multiple simultaneous MOVe NODEData commands because of contention
> >     for your uncollocated volumes.
> >
> > If you have or can set up another TSM server, you could do a
> > Server-Server EXPort:
> >         EXPort Node node1,node2,... FILEData=BACKUPActive TOServer=... [Preview=Yes]
> > moving only the nodes' active backups to a diskpool on the other TSM
> > server.  Using this technique, you can move only the minimal necessary
> > data.  I don't see any way to multithread or run multiple simultaneous
> > commands to read more than one tape at a time, but given your drive
> > constraints and uncollocated volumes, you will probably discover that
> > you can not effectively restore, move, or export from more than one
> > tape at a time, no matter which technique you try.  Your Query
> > NODEData output should show you which nodes, if any, do *not* have
> > backups on the same tapes.
> >
> > Try running a preview EXPort Node command for single or multiple nodes
> > to get some idea of what tapes will be mounted and how much data you
> > will need to export.
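> >
> > For example, something along these lines (node and server names are
> > placeholders, so substitute your own):
> >
> >         Query NODEData node1
> >         EXPort Node node1,node2 FILEData=BACKUPActive Preview=Yes
> >         EXPort Node node1,node2 FILEData=BACKUPActive TOServer=OTHERTSM
> >
> > The first export is just the preview; the second one actually moves
> > the active backups to the other server.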
> >
> > Call me if you want to talk about any of this.
> > --
> > Jim.Owen AT Yale DOT Edu   (w#203.432.6693, Verizon c#203.494.9201)
> >
> > Roger Deschner wrote:
> > > MOVE NODEDATA looks like it is going to be the key. I will simply
> > > move the affected nodes into a disk storage pool, or into our
> > > existing collocated tape storage pool. I presume it should be
> > > possible to restart MOVE NODEDATA, in case it has to be interrupted
> > > or if the server crashes, because what it does is not very different
> > > from migration or reclamation. This should be a big advantage over
> > > GENERATE BACKUPSET, which is not even as restartable as a common
> > > client restore. A possible strategy is to do the long, laborious,
> > > but restartable, MOVE NODEDATA first, and then do a very quick,
> > > painless, regular client restore or GENERATE BACKUPSET.
> > >
> > > Thanks to all! Until now, I was not fully aware of MOVE NODEDATA.
> > >
> > > B.T.W. It is an automatic tape library, Quantum P7000. We graduated
> > > from manual tape mounting back in 1999.
> > >
> > > Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
> > >
> > >
> > > On Tue, 22 Jan 2008, Nicholas Cassimatis wrote:
> > >
> > >> Roger,
> > >>
> > >> If you know which nodes are to be restored, or at least have some
> > >> that are good suspects, you might want to run some "move nodedata"
> > >> commands to try to get their data more contiguous.  If you can get
> > >> some of that DASD that's coming "real soon," even just to borrow
> > >> it, that would help out tremendously.
> > >>
> > >> You say "tape" but never "library" - are you on manual drives?
> > (Please say
> > >> No, please say No...)  Try setting the mount retention high on
> them,
> > and
> > >> kick off a few restores at once.  You may get lucky and already
> have
> > the
> > >> needed tape mounted, saving you a few mounts.  If that's not
> working
> > (it's
> > >> impossible to predict which way it will go), drop the mount
> retention
> > to 0
> > >> so the tape ejects immediately, so the drive is ready for a new
> tape
> > >> sooner.  And if you are, try to recruit the people who haven't
> > approved
> > >> spending for the upgrades to be the "picker arm" for you - I did
> that
> > to an
> > >> account manager on a DR Test once, and we got the library
approved
> > the next
> > >> day.
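> > >>
> > >> (Mount retention is set on the device class, so roughly something
> > >> like this, with the device class name being whatever yours is:
> > >>
> > >>         update devclass YOURTAPECLASS mountretention=60
> > >>         update devclass YOURTAPECLASS mountretention=0
> > >>
> > >> the first to hold tapes in the drives, the second to release them
> > >> immediately if that isn't paying off.)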
> > >>
> > >> The thoughts of your fellow TSMers are with you.
> > >>
> > >> Nick Cassimatis
> > >>
> > >> ----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008 08:08 AM -----
> > >>
> > >> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on
> > >> 01/22/2008 03:40:07 AM:
> > >>
> > >>> We like to talk about disaster preparedness, and one just happened
> > >>> here at UIC.
> > >>>
> > >>> On Saturday morning, a fire damaged portions of the UIC College of
> > >>> Pharmacy Building. It affected several laboratories and offices.
> > >>> The Chicago Fire Department, wearing hazmat moon suits due to the
> > >>> highly dangerous contents of the laboratories, put it out
> > >>> efficiently in about 15 minutes. The temperature was around 0F
> > >>> (-18C), which compounded the problems - anything that took on
> > >>> water became a block of ice. Fortunately nobody was hurt; only a
> > >>> few people were in the building on a Saturday morning, and they
> > >>> all got out safely.
> > >>>
> > >>> Now, both the good news and the bad news is that many of the
> > >>> damaged computers were backed up to our large TSM system. The good
> > >>> news is that their data can be restored.
> > >>>
> > >>> The bad news is that their data can be restored. And so now it
> > >>> must be.
> > >>>
> > >>> Our TSM system is currently an old-school tape-based setup from
> > >>> the ADSM days. (Upgrades involving a lot more disk coming real
> > >>> soon!) Most of the nodes affected are not collocated, so I have to
> > >>> plan to do a number of full restores of nodes whose data is
> > >>> scattered across numerous tape volumes each. There are only 8 tape
> > >>> drives, and they are kept busy since this system is in a
> > >>> heavily-loaded, about-to-be-upgraded state. (Timing couldn't be
> > >>> worse; Murphy's Law.)
> > >>>
> > >>> TSM was recently upgraded to version 5.5.0.0. It runs on AIX 5.3
> > >>> with a SCSI library. Since it is a v5.5 server, there may be new
> > >>> facilities available that I'm not aware of yet.
> > >>>
> > >>> I have the luxury of a little bit of time in advance. The hazmat
> > >>> guys aren't letting anyone in to assess damage yet, so we don't
> > >>> know which client node computers are damaged or not. We should
> > >>> know in a day or two, so in the meantime I'm running as much
> > >>> reclamation as possible.
> > >>>
> > >>> Given that this is our situation, how can I best optimize these
> > >>> restores? I'm looking for ideas to get the most restoration done
> > >>> for this disaster, while still continuing normal client-backup,
> > >>> migration, expiration, reclamation cycles, because somebody else
> > >>> unrelated to this situation could also need to restore...
> > >>>
> > >>> Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
> >
