Subject: Re: [ADSM-L] Remote tape drives
From: Wanda Prather <wprather AT JASI DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 17 Nov 2008 14:38:47 -0500
Paul, there is no difference between manual and auto reclaims.  But re-creating
copy pool volumes from primary volumes during reclaim is only done when the
copy pool volume is actually marked OFFSITE.  If it isn't, TSM knows it's
mountable, and therefore uses it to do the reclaim; TSM naturally believes
that would be the quickest route.

No reason I know of you can't have an auto script that does update vol *
wherestgpool=copypool access=offsite every day before you start your
reclaims.
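
For instance, something along these lines (just a sketch, not tested;
substitute your own pool name and start time):

   define schedule offsite_flag type=administrative active=yes -
     starttime=05:00 period=1 perunits=days -
     cmd="update volume * access=offsite wherestgpool=copypool wherestatus=full,filling"

Once that has run, "query volume stgpool=copypool format=detailed" should
show Access: Offsite, and the reclaims will read from the primary volumes
instead.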

Wanda


On Mon, Nov 17, 2008 at 11:09 AM, Paul Zarnowski <psz1 AT cornell DOT edu> 
wrote:

> Allen,
>
> I also thank you for your description.  I was especially interested in the
> comments about how copy volumes are re-created from primary volumes when a
> copy pool volume is reclaimed.  The testing that I have done does not match
> what you indicated, which concerns me greatly, and I am trying to figure
> out why it is different.
>
> We are just in the process of creating copy volumes which we intend to
> relocate to an off-site library in a few months.  The off-site library (and
> drives) will be connected via FCIP, and I am concerned about total
> bandwidth requirements.  Re-creating copy volumes (during reclaim) from
> primary volumes would be much more bandwidth-friendly, and is the behavior
> that I am wanting.  However, when I just did a few manual reclaims (reclaim
> stgpool bk2.lde.st threshold=50), the resultant reclaim process went to
> mount volumes in the copy stgpool for both input and output.  Definitely
> not what I wanted to see.  Any ideas on why this is different?  Do you
> think a manual reclaim behaves differently than threshold-triggered
> automated reclaim?  (that would be weird)  Have you confirmed the behavior
> you've documented lately?  We are running TSM 5.5.1.1.
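>
> (For reference, while that reclaim ran I watched "query process" and
> "query mount" to see which volumes it had mounted; that's how I could
> tell that both the input and the output volumes belonged to the copy
> pool.)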
>
> ..Paul
>
>
> At 09:51 AM 11/17/2008, Wanda Prather wrote:
>
>> Allen,
>>
>> That's the most coherent and complete discussion I've ever read about
>> managing remote volumes.  Thank you MUCH!!
>>
>> Wanda
>>
>> On Mon, Nov 10, 2008 at 10:07 AM, Allen S. Rout <asr AT ufl DOT edu> wrote:
>>
>> > >> On Sun, 9 Nov 2008 14:08:56 -0500, "Wanda Prather" <wprather AT jasi DOT com> said:
>> >
>> > > When you are doing reclaims of the virtual volumes, doesn't the data
>> > > that is being reclaimed from the virtual tape have to travel back
>> > > across the network to the original TSM server's buffer, then out
>> > > across the network again to the new virtual volume?
>> >
>> > Short answer: "No".  The design is for a new copy volume to be created
>> > from primary volumes, and when all the data on a given to-be-reclaimed
>> > copy volume is present on newly built copy volumes, the old one goes
>> > pending.
>> >
>> > The answer gets longer, though.  "sometimes", the reclaiming server
>> > decides to read from a remote volume.  The Good Reason for this is
>> > when the primary volume is for some reason damaged or Unavailable.
>> > There are other times when the reclaiming server just gets an impulse
>> > and changes gears.  I had a few PMRs about this, and the response was
>> > somewhat opaque.  The conclusion I drew was "Oh, we found a couple of
>> > bits of bad logic, and we tuned it some".
>> >
>> > One interesting aspect of this is the changing locality of the offsite
>> > data.  When you make initial copies, your offsite data is grouped by
>> > time-of-backup.  When you reclaim that same data, the new offsite
>> > volumes are built by mounting one primary volume at a time, so the
>> > locality gradually comes to resemble that of the primary volumes.
>> > Collocated, perhaps?
>> >
>> > It's a side effect, but a pleasant trend.  I have often wished there
>> > were a collocation setting "Do what you can, but don't have a fit
>> > about it".
>> >
>> >
>> > > What has been your experience of managing that?  Do you just keep
>> > > the virtual volumes really small compared to physical media?
>> > > (assuming the target physical media is still something enormous like
>> > > LTO3) Or do you just resolve to have a really low utilization on the
>> > > target physical media?
>> >
>> > This is something I don't have a good theoretical answer for, yet.
>> > And boy howdy, I've tried.  Certainly, I waste more space in
>> > reclaimable blocks, because there are two levels, at least, of
>> > reclaiming going on: the physical remote volumes, and the virtual
>> > volumes within them.
>> >
>> > Here is the answer I have used, with no particular opinion that it's
>> > theoretically sound:
>> >
>> > + Most of my remote storage access is directly to the remote tapes.  I
>> >  have a few clients who have tighter bottlenecks and send them to
>> >  disk, but 'direct to tape' is the rule.  Note that this means when I
>> >  have >N streams trying to write, clients get in line and wait for a
>> >  drive, round-robin style.
>> >
>> > + I have some servers storing remote volumes at 50G MAXCAP, some at
>> >  20G.  I haven't noted a big difference between them.  The biggest
>> >  theoretical basis for choosing that I can come up with is the speed
>> >  of round-robin access to the remote tapes.  (See the device-class
>> >  sketch after this list.)
>> >
>> > + My biggest pain in the patoot so far comes from individual files
>> >  that are much bigger than the remote volume size.  I hate re-sending
>> >  an initial chunk, then 4 intermediate volumes I know to be identical
>> >  to the remote volumes already present, and then re-sending the tail
>> >  chunk.
>> >
>> > + The other biggest pain in the patoot is that, while devices are
>> >  round-robining at the remote site, the source media is allocated at
>> >  the local site.  This means that you can deadlock your way into a
>> >  mess of 'no tapes available' if you get congested.
>> >
>> >  I find this to be a metastable situation: Things go very smoothly
>> >  until you hit some boundary condition, and then you have a
>> >  turbulence incident which takes intense, sustained effort to
>> >  resolve.
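>> >
>> >  (Here's the device-class sketch promised above; untested, the names
>> >  are placeholders, and mountlimit is where the round-robin drive
>> >  count comes from:
>> >
>> >    define devclass remote50 devtype=server servername=offsite maxcapacity=50G mountlimit=4
>> >    define stgpool copypool remote50 pooltype=copy maxscratch=500
>> >
>> >  The 20G flavor is identical except for maxcapacity=20G.)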
>> >
>> >
>> >
>> > > How do you know how big the "really big pipe" needs to be to take
>> > > care of the reclaims?
>> >
>> > This I -do- have a theoretical answer for.  See above when I talked
>> > about round-robin on the remote tape drives?  You want a pipe big
>> > enough to stream all the remote drives.  By implication, you can
>> > stream the same count of local drives; this means that, while you may
>> > have processes waiting in line for remote access, they won't be
>> > waiting for a network-constrained bottleneck.
>> >
>> > Of course, that's easier said than done: 3592E05s are theoretically
>> > capable of what, 200M/s?  I mean, that's what the brag sheet
>> > says... :) In realistic terms I get 60M sustained, 80-90M spikes.  In
>> > other realistic terms, you don't often have -everything- streaming at
>> > once.  So to calculate what your site would want, I suggest:
>> >
>> > + Get a Gb connection up.  Run one stream.  Optimize.  Measure
>> >  sustained bandwidth.
>> >
>> > + Multiply sustained bandwidth * number of remote drives.  Attempt to
>> >  get this size pipe.
>> >
>> > + Return to your cube, frustrated that they Just Don't Understand.  Be
>> >  happy you've got a Gb.  Work to fill it 24x7.
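>> >
>> > To put my own numbers on that: 60M/s sustained times, say, four
>> > remote drives is 240M/s, or just under 2Gb/s of pipe; that's where
>> > the 2G figure below comes from.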
>> >
>> >
>> > Actually, I'm lucky: I've got budget to go to about 2G this fiscal
>> > year.  Woot!
>> >
>> >
>> >
>> > - Allen S. Rout
>> >
>>
>
>
> --
> Paul Zarnowski                            Ph: 607-255-4757
> Manager, Storage Services                 Fx: 607-255-8521
> 719 Rhodes Hall, Ithaca, NY 14853-3801    Em: psz1 AT cornell DOT edu
>
