Subject: Re: [ADSM-L] Remote tape drives
From: Wanda Prather <wprather AT JASI DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 17 Nov 2008 09:51:47 -0500
Alan,

That's the most coherent and complete discussion I've ever read about
managing remote volumes.  Thank you MUCH!!

Wanda

On Mon, Nov 10, 2008 at 10:07 AM, Allen S. Rout <asr AT ufl DOT edu> wrote:

> >> On Sun, 9 Nov 2008 14:08:56 -0500, "Wanda Prather" <wprather AT jasi DOT com>
> >> said:
>
> > When you are doing reclaims of the virtual volumes, doesn't the data
> > that is being reclaimed from the virtual tape have to travel back
> > across the network to the original TSM server's buffer, then out
> > across the network again to the new virtual volume?
>
> Short answer: "No".  The design is for a new copy volume to be created
> from primary volumes, and when all the data on a given to-be-reclaimed
> copy volume is present on newly built copy volumes, the old one goes
> pending.
>
> The answer gets longer, though.  "sometimes", the reclaiming server
> decides to read from a remote volume.  The Good Reason for this is
> when the primary volume is for some reason damaged or Unavailable.
> There are other times when the reclaiming server just gets an impulse
> and changes gears.  I had a few PMRs about this, and the response was
> somewhat opaque.  The conclusion I drew was "Oh, we found a couple of
> bits of bad logic, and we tuned it some".
>
> One interesting aspect of this is the changing locality of the offsite
> data.  When you make initial copies, your offsite data is grouped by
> time-of-backup.  When you reclaim that same data, the new offsite
> volumes are built by mounting one primary volume at a time, so the
> locality gradually comes to resemble that of the primary volumes.
> Collocated, perhaps?
>
> It's a side effect, but a pleasant trend.  I have often wished there
> were a collocation setting "Do what you can, but don't have a fit
> about it".
>
>
> > What has been your experience of managing that?  Do you just keep
> > the virtual volumes really small compared to physical media?
> > (assuming the target physical media is still something enormous like
> > LTO3) Or do you just resolve to have a really low utilization on the
> > target physical media?
>
> This is something I don't have a good theoretical answer for, yet.
> And boy howdy, I've tried.  Certainly, I waste more space in
> reclaimable blocks, because there are two levels, at least, of
> reclaiming going on: the physical remote volumes, and the virtual
> volumes within them.
>
> Here is the answer I have used, with no particular opinion that it's
> theoretically sound:
>
> + Most of my remote storage access is directly to the remote tapes.  I
>  have a few clients who have tighter bottlenecks and send them to
>  disk, but 'direct to tape' is the rule.  Note that this means when I
>  have >N streams trying to write, clients get in line and wait for a
>  drive, round-robin style.
>
> + I have some servers storing remote volumes of 50G MAXCAP, some of
>  20.  I haven't noted a big difference between them.  Biggest
>  theoretical basis for choosing I can come up with is the speed of
>  round-robin on access to the remote tapes.
>
> + My biggest pain in the patoot so far comes from individual files
>  that are much bigger than the remote volume size.  I hate re-sending
>  an initial chunk, then 4 intermediate volumes I know to be identical
>  to the remote volumes already present, and then re-sending the tail
>  chunk.
>
> + The other biggest pain in the patoot is that, while devices are
>  round-robining at the remote site, the source media is allocated at
>  the local site.  This means that you can deadlock your way into a
>  mess of 'no tapes available' if you get congested.
>
>  I find this to be a metastable situation: Things go very smoothly
>  until you hit some boundary condition, and then you have a
>  turbulence incident which takes intense, sustained effort to
>  resolve.
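> Sketching those two trade-offs numerically (the 50G/20G MAXCAP values
> and the ~60M/s sustained drive rate are from this discussion; every
> other number below is an illustrative assumption, not a measurement):

```python
import math

# Illustrative back-of-the-envelope figures for remote virtual volumes.
# MAXCAP values (50G / 20G) and the ~60 MB/s sustained drive rate come
# from the discussion above; file sizes and offsets are assumed examples.

def fill_minutes(maxcap_gb: float, sustained_mb_per_s: float = 60.0) -> float:
    """Minutes a queued process waits while the process ahead of it
    fills one virtual volume at the sustained drive rate."""
    return maxcap_gb * 1000.0 / sustained_mb_per_s / 60.0

def volumes_spanned(file_gb: float, maxcap_gb: float, head_offset_gb: float) -> int:
    """How many virtual volumes a single file occupies when it starts
    head_offset_gb into its first volume.  The middle volumes hold
    nothing but this file, yet a reclaim still re-sends head and tail."""
    head = maxcap_gb - head_offset_gb        # data that fits on volume 1
    remaining = max(0.0, file_gb - head)
    return 1 + math.ceil(remaining / maxcap_gb)

print(round(fill_minutes(50), 1))    # ~13.9 min to fill one 50G volume
print(round(fill_minutes(20), 1))    # ~5.6 min to fill one 20G volume
print(volumes_spanned(230, 50, 10))  # a 230G file straddles 5 volumes
```

> That's the round-robin granularity argument in miniature: smaller
> MAXCAP means shorter waits per turn, but more head/tail chunks to
> re-send when a big file gets reclaimed.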
>
>
>
> > How do you know how big the "really big pipe" needs to be to take
> > care of the reclaims?
>
> This I -do- have a theoretical answer for.  See above when I talked
> about round-robin on the remote tape drives?  You want a pipe big
> enough to stream all the remote drives.  By implication, you can
> stream the same count of local drives; this means that, while you may
> have processes waiting in line for remote access, they won't be
> waiting for a network-constrained bottleneck.
>
> Of course, that's easier said than done: 3592E05s are theoretically
> capable of what, 200M/s?  I mean, that's what the brag sheet
> says... :) In realistic terms I get 60M sustained, 80-90M spikes.  In
> other realistic terms, you don't often have -everything- streaming at
> once.  So to calculate what your site would want, I suggest:
>
> + Get a Gb connection up.  Run one stream.  Optimize.  Measure
>  sustained bandwidth.
>
> + Multiply sustained bandwidth * number of remote drives.  Attempt to
>  get this size pipe.
>
> + Return to your cube, frustrated that they Just Don't Understand.  Be
>  happy you've got a Gb.  Work to fill it 24x7.
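> The multiply step above, worked through (60M/s sustained is the
> figure I quoted; the drive count of 4 is an assumed example, plug in
> your own):

```python
# Pipe sizing: big enough to stream every remote drive at once at the
# measured sustained rate.  60 MB/s sustained per 3592-E05 is the
# figure quoted above; 4 remote drives is an assumed example count.

def required_pipe_gbit(sustained_mb_per_s: float, remote_drives: int) -> float:
    """Bandwidth in Gbit/s needed so queued processes wait on drives,
    not on a network-constrained bottleneck."""
    return sustained_mb_per_s * remote_drives * 8.0 / 1000.0

print(required_pipe_gbit(60, 4))  # 1.92 -> a ~2 Gbit pipe streams 4 drives
```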
>
>
> Actually, I'm lucky: I've got budget to go to about 2G this fiscal
> year.  Woot!
>
>
>
> - Allen S. Rout
>
