Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 22 Jan 2008 13:58:11 -0500
Are files that are no longer active automatically expired from the
activedata pool when you perform the latest COPY ACTIVEDATA?  This would
mean that, at some point, you would need to do reclamation on this pool,
right?
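
My assumption -- and I haven't checked this against the 5.5 manuals --
is that a sequential active-data pool gets reclaimed like any other
sequential pool once versions in it go inactive, so you'd drive it with
something like the following (pool name invented):

   /* one-off reclamation pass on a hypothetical active-data pool */
   RECLAIM STGPOOL actdisk THRESHOLD=60
   /* or just lower the pool's standing reclamation threshold */
   UPDATE STGPOOL actdisk RECLAIM=60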

It would seem to me that this would be a much better answer to the OP's
question.  Instead of doing a MOVE NODEDATA (which requires moving ALL
of the node's files), or doing an EXPORT NODE (which requires a separate
server), he can just create an ACTIVEDATA pool, then perform a COPY
ACTIVEDATA into it while he's preparing for the restore.  Putting said
pool on disk would be even better, of course.
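
If I were putting it on disk, I'd expect the setup to look roughly like
this -- an untested sketch, with the device class, directory, and pool
names invented rather than taken from anyone's real environment:

   /* FILE device class backed by disk, to hold the active-data pool */
   DEFINE DEVCLASS actfile DEVTYPE=FILE MAXCAPACITY=2048M MOUNTLIMIT=8 DIRECTORY=/tsm/actpool
   /* sequential pool flagged as an active-data pool */
   DEFINE STGPOOL actdisk actfile POOLTYPE=ACTIVEDATA MAXSCRATCH=500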

I was just discussing this with another one of our TSM experts, and he's
not as bullish on it as I am.  (It was an off-list convo, so I'll let
him go nameless unless he wants to speak up.)  He doesn't like that you
can't use a DISK-type device class (the disk has to be defined as a
FILE-type device class).

He also has issues with the resources needed to create this "3rd copy"
of the data.  He said, "Most customers have trouble getting backups
complete and creating their offsite copies in a 24 hour period and would
not be able to complete a third copy of the data."  Add to that the
possibility of doing reclamation on this pool and you've got even more
work to do.

He's more of a fan of group collocation and the multisession restore
feature.  I think this has more value if you're restoring fewer clients
than you have tape drives.  Because if you collocate all your active
files, then you'll only be using one tape drive per client.  If you've
got 40 clients to restore and 20 tape drives, I don't see this slowing
you down.  But if you've got one client to restore, and 20 tape drives,
then the multisession restore would probably be faster than a collocated
restore.
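
For the multisession side, the knobs I'd be looking at -- sketch only,
the node name is invented -- are the number of mount points the server
will give the node and the client's RESOURCEUTILIZATION option:

   /* server side: let the node use more than one drive at restore time */
   UPDATE NODE pharmlab01 MAXNUMMP=4

plus, in the client's dsm.sys/dsm.opt:

   RESOURCEUTILIZATION 10

How much that buys you still depends on the node's data actually being
spread across multiple volumes, which brings us right back to the
collocation question.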

I still think it's a strong feature whose value should be investigated
and discussed -- even if you only use it for the purpose we're
discussing here.  If you know you're in a DR scenario and you're going
to be restoring multiple systems, why wouldn't you create an ACTIVEDATA
pool and do a COPY ACTIVEDATA instead of a MOVE NODEDATA?

OK, here's another question.  Is it assumed that the ACTIVEDATA pool
has node-level collocation on?  Can you use group collocation instead?
Then maybe my friend and I could both get what we want.
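
What I'm picturing -- assuming the COLLOCATE setting is honored on an
ACTIVEDATA pool, which is exactly what I'm asking -- is something like
this, with the group, node, and pool names invented:

   /* group the nodes you'd likely be restoring together */
   DEFINE COLLOCGROUP pharmacy
   DEFINE COLLOCMEMBER pharmacy pharmlab01,pharmlab02,pharmlab03
   /* collocate the active-data pool by group instead of by node */
   UPDATE STGPOOL actdisk COLLOCATE=GROUP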

Just throwing thoughts out there.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Maria Ilieva
Sent: Tuesday, January 22, 2008 10:22 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?

The procedure for creating an active-data pool (assuming you have TSM
version 5.4 or later) is the following:
1. Create a FILE-type disk pool or a sequential tape pool, specifying
POOLTYPE=ACTIVEDATA
2. Update the node's domain(s), specifying ACTIVEDESTINATION=<created
active data pool>
3. Issue COPY ACTIVEDATA <node_name>
This process incrementally copies the node's active data, so it can be
restarted if needed.  HSM-migrated and archived data is not copied into
the active data pool!
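
To put actual commands to it (the device class, pool, and domain names
below are only placeholders, and one note on step 3: as I recall the
Administrator's Reference, COPY ACTIVEDATA names the primary pool and
the active-data pool, and the server then copies active data for the
nodes whose domain points at that pool):

   /* step 1: the active-data pool, on a FILE device class here called actfile */
   DEFINE STGPOOL actdisk actfile POOLTYPE=ACTIVEDATA MAXSCRATCH=500
   /* step 2: point the nodes' policy domain at it */
   UPDATE DOMAIN standard ACTIVEDESTINATION=actdisk
   /* step 3: copy the active versions out of the primary pool */
   COPY ACTIVEDATA backuppool actdisk MAXPROCESS=2 WAIT=YES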

Maria Ilieva

> ---
> W. Curtis Preston
> Backup Blog @ www.backupcentral.com
> VP Data Protection, GlassHouse Technologies
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]
> On Behalf Of James R Owen
> Sent: Tuesday, January 22, 2008 9:32 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] Fw: DISASTER: How to do a LOT of restores?
>
>
> Roger,
> You certainly want to get a "best guess" list of likely priority#1
> restores.  If your tapes really are mostly uncollocated, you will
> probably experience lots of tape volume contention when you attempt to
> use MAXPRocess > 1 or to run multiple simultaneous restore, move
> nodedata, or export node operations.
>
> Use Query NODEData to see how many tapes might have to be read for
> each node to be restored.
>
> To minimize tape mounts, if you can wait for this operation to
> complete, I believe you should try to move or export all of the nodes'
> data in a single operation.
>
> Here are possible disadvantages with using MOVe NODEData:
>   - does not enable you to select to move only the Active backups for
>     these nodes [so you might have to move lots of extra inactive
>     backups]
>   - you probably can not effectively use MAXPROC=N (>1) nor run
>     multiple simultaneous MOVe NODEData commands because of contention
>     for your uncollocated volumes.
>
> If you have or can set up another TSM server, you could do a
> Server-Server EXPort:
>         EXPort Node node1,node2,... FILEData=BACKUPActive TOServer=... [Preview=Yes]
> moving only the nodes' active backups to a diskpool on the other TSM
> server.  Using this technique, you can move only the minimal necessary
> data.  I don't see any way to multithread or run multiple simultaneous
> commands to read more than one tape at a time, but given your drive
> constraints and uncollocated volumes, you will probably discover that
> you can not effectively restore, move, or export from more than one
> tape at a time, no matter which technique you try.  Your Query
> NODEData output should show you which nodes, if any, do *not* have
> backups on the same tapes.
>
> Try running a preview EXPort Node command for single or multiple nodes
> to get some idea of what tapes will be mounted and how much data you
> will need to export.
>
> Call me if you want to talk about any of this.
> --
> Jim.Owen AT Yale DOT Edu   (w#203.432.6693, Verizon c#203.494.9201)
>
> Roger Deschner wrote:
> > MOVE NODEDATA looks like it is going to be the key. I will simply
> > move the affected nodes into a disk storage pool, or into our
> > existing collocated tape storage pool. I presume it should be
> > possible to restart MOVE NODEDATA, in case it has to be interrupted
> > or if the server crashes, because what it does is not very different
> > from migration or reclamation. This should be a big advantage over
> > GENERATE BACKUPSET, which is not even as restartable as a common
> > client restore. A possible strategy is to do the long, laborious,
> > but restartable, MOVE NODEDATA first, and then do a very quick,
> > painless, regular client restore or GENERATE BACKUPSET.
> >
> > Thanks to all! Until now, I was not fully aware of MOVE NODEDATA.
> >
> > B.T.W. It is an automatic tape library, Quantum P7000. We graduated
> > from manual tape mounting back in 1999.
> >
> > Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
> >
> >
> > On Tue, 22 Jan 2008, Nicholas Cassimatis wrote:
> >
> >> Roger,
> >>
> >> If you know which nodes are to be restored, or at least have some
> >> that are good suspects, you might want to run some "move nodedata"
> >> commands to try to get their data more contiguous.  If you can get
> >> some of that DASD that's coming "real soon," even just to borrow
> >> it, that would help out tremendously.
> >>
> >> You say "tape" but never "library" - are you on manual drives?
> >> (Please say No, please say No...)  Try setting the mount retention
> >> high on them, and kick off a few restores at once.  You may get
> >> lucky and already have the needed tape mounted, saving you a few
> >> mounts.  If that's not working (it's impossible to predict which
> >> way it will go), drop the mount retention to 0 so the tape ejects
> >> immediately, so the drive is ready for a new tape sooner.  And if
> >> you are, try to recruit the people who haven't approved spending
> >> for the upgrades to be the "picker arm" for you - I did that to an
> >> account manager on a DR Test once, and we got the library approved
> >> the next day.
> >>
> >> The thoughts of your fellow TSMers are with you.
> >>
> >> Nick Cassimatis
> >>
> >> ----- Forwarded by Nicholas Cassimatis/Raleigh/IBM on 01/22/2008
> >> 08:08 AM -----
> >>
> >> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on
01/22/2008
> >> 03:40:07 AM:
> >>
> >>> We like to talk about disaster preparedness, and one just happened
> here
> >>> at UIC.
> >>>
> >>> On Saturday morning, a fire damaged portions of the UIC College of
> >>> Pharmacy Building. It affected several laboratories and offices.
The
> >>> Chicago Fire Department, wearing hazmat moon suits due to the
highly
> >>> dangerous contents of the laboratories, put it out efficiently in
> about
> >>> 15 minutes. The temperature was around 0F (-18C), which compounded
> the
> >>> problems - anything that took on water became a block of ice.
> >>> Fortunately nobody was hurt; only a few people were in the
building
> on a
> >>> Saturday morning, and they all got out safely.
> >>>
> >>> Now, both the good news and the bad news is that many of the
damaged
> >>> computers were backed up to our large TSM system. The good news is
> that
> >>> their data can be restored.
> >>>
> >>> The bad news is that their data can be restored. And so now it
must
> be.
> >>>
> >>> Our TSM system is currently an old-school tape-based setup from
the
> ADSM
> >>> days. (Upgrades involving a lot more disk coming real soon!) Most
of
> the
> >>> nodes affected are not collocated, so I have to plan to do a
number
> of
> >>> full restores of nodes whose data is scattered across numerous
tape
> >>> volumes each. There are only 8 tape drives, and they are kept busy
> since
> >>> this system is in a heavily-loaded, about-to-be-upgraded state.
> (Timing
> >>> couldn't be worse; Murphy's Law.)
> >>>
> >>> TSM was recently upgraded to version 5.5.0.0. It runs on AIX 5.3
> with a
> >>> SCSI library. Since it is a v5.5 server, there may be new
facilities
> >>> available that I'm not aware of yet.
> >>>
> >>> I have the luxury of a little bit of time in advance. The hazmat
> guys
> >>> aren't letting anyone in to asess damage yet, so we don't know
which
> >>> client node computers are damaged or not. We should know in a day
or
> >>> two, so in the meantime I'm running as much reclamation as
possible.
> >>>
> >>> Given that this is our situation, how can I best optimize these
> >>> restores? I'm looking for ideas to get the most restoration done
for
> >>> this disaster, while still continuing normal client-backup,
> migration,
> >>> expiration, reclamation cycles, because somebody else unrelated to
> this
> >>> situation could also need to restore...
> >>>
> >>> Roger Deschner      University of Illinois at Chicago
> rogerd AT uic DOT edu
>