ADSM-L

Re: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA

2014-06-10 14:11:55
Subject: Re: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA
From: Skylar Thompson <skylar2 AT U.WASHINGTON DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 10 Jun 2014 11:10:00 -0700
I had considered that, but the problem is that when a mount is blocked by an
operator request, the process is blocked too, which could really slow things
down if we can't get to the request immediately.

I've ended up making judicious use of simultaneous copy, which seems to
have helped this week. I've had to identify hosts that simply will never be
able to stream to tape and point those to a separate FILE pool that has
simultaneous copy disabled, but those are a fraction of our total backup
load, so storage pool backups finish by the time the checkout occurs.

I'm hopeful that this will solve the problem, or at least substantially
mitigate it, for the long term.

Thanks all for the suggestions and discussion!
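For anyone who still needs to serialize the two operations, as described in the quoted message below, a pre-checkout gate could look roughly like this minimal Python sketch. It rests on assumptions: `list_active_processes` is a hypothetical callable you would have to write yourself (for example, by parsing QUERY PROCESS output from the administrative client), and matching on the description text "Backup Storage Pool" is an assumption about how your server labels that process.

```python
import time
from typing import Callable, Iterable


def wait_for_no_stgpool_backups(
    list_active_processes: Callable[[], Iterable[str]],
    poll_seconds: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no active process looks like a storage pool backup.

    list_active_processes is a HYPOTHETICAL helper returning the process
    descriptions currently running on the server. Returns True as soon as
    none of them starts with "Backup Storage Pool" (an assumed label), or
    False if that never happens within max_polls checks.
    """
    for attempt in range(max_polls):
        busy = [
            desc for desc in list_active_processes()
            if desc.upper().startswith("BACKUP STORAGE POOL")
        ]
        if not busy:
            return True  # safe window: go ahead with MOVE DRMEDIA
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before checking again
    return False  # gave up; leave the checkout for a human to schedule
```

Even when the gate returns True, a new BACKUP STGPOOL could still start between the check and the MOVE DRMEDIA, so this narrows the race window rather than eliminating it; the schedules that launch storage pool backups still need to stay clear of the checkout.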

On Mon, Jun 09, 2014 at 06:41:45PM -0700, Alex Paschal wrote:
> Hi, Skylar.  Have you tried setting your MOUNTWAIT to 0 or 1?  It seems
> to me that should allow the operator request to time out and your
> processing to continue.
>
>
> On 6/3/2014 2:12 PM, Skylar Thompson wrote:
> > We've been suffering from the effects of this APAR for a while; IBM
> > resolved it with a documentation erratum rather than a fix to TSM itself:
> >
> > http://www-01.ibm.com/support/docview.wss?uid=swg1IC87352
> >
> > Basically the issue is that there is a race condition between MOVE
> > DRMEDIA on tape volumes and a concurrently running BACKUP STGPOOL. BACKUP
> > STGPOOL might choose a FILLING volume that MOVE DRMEDIA is also removing
> > from the library, which causes an operator request to be raised. We must
> > then either check the volume back in, or cancel the request, allow TSM to
> > mark the volume UNAVAILABLE, and then update the volume to OFFSITE.
> >
> > We have some challenges in our TSM environment:
> >
> > 1. The data ingest is highly bursty - some days we might have 100GB in
> > backups, while others we might have 60TB. We average around 2TB/day in
> > additions to primary storage.
> >
> > 2. We are not staffed 24x7, so we can't have operator requests going off
> > outside business hours.
> >
> > 3. We have no dedicated staff managing our TSM/tape library environment, so
> > we prefer not to get any operator requests, since we might not be able to
> > act on them immediately.
> >
> > 4. For budget and policy reasons, we have a weekly (not daily) shipment of
> > tape to our offsite vault.
> >
> > I've rejiggered our client and admin schedules, as well as reclamation, to
> > try to avoid writes into the copy pools while we do the checkout during
> > business hours, but it's quite difficult to actually quiesce everything.
> >
> > It seems like we have these options:
> >
> > 1. Just live with it as it is.
> >
> > 2. Don't run BACKUP STGPOOL on the day that the checkout will happen.
> >
> > 3. Automate checking for writes into copy pools and cancel the
> > session/process responsible for them. This might require restricting the
> > number of mounts in our tape device classes, and also seems like it has the
> > risk of being more disruptive than we really want.
> >
> > Have I missed anything? How are other people approaching this problem?
> >
> > Thanks,
> >
> > --
> > -- Skylar Thompson (skylar2 AT u.washington DOT edu)
> > -- Genome Sciences Department, System Administrator
> > -- Foege Building S046, (206)-685-7354
> > -- University of Washington School of Medicine
> >

--
-- Skylar Thompson (skylar2 AT u.washington DOT edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
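Option 3 in the original message (automatically finding writes into the copy pools and cancelling the session or process responsible) might start from a filter like this Python sketch. The `WriteActivity` record layout and the pool names are hypothetical, and gathering that data from the server is left out. The function only builds CANCEL PROCESS / CANCEL SESSION command strings for review rather than issuing them, which seems prudent given the concern about being more disruptive than intended.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class WriteActivity:
    """HYPOTHETICAL record describing one thing currently writing data."""
    kind: str         # "process" or "session"
    ident: int        # TSM process number or session number
    target_pool: str  # storage pool being written to


def cancel_commands(activities: Iterable[WriteActivity],
                    copy_pools: Set[str]) -> List[str]:
    """Return admin commands that would cancel every writer into a copy pool.

    Pool names are compared case-insensitively. Nothing is executed here;
    the caller decides whether to issue the returned commands.
    """
    pools = {p.upper() for p in copy_pools}
    commands: List[str] = []
    for act in activities:
        if act.target_pool.upper() not in pools:
            continue  # writing to a primary pool: leave it alone
        if act.kind == "process":
            commands.append(f"CANCEL PROCESS {act.ident}")
        elif act.kind == "session":
            commands.append(f"CANCEL SESSION {act.ident}")
        else:
            raise ValueError(f"unknown activity kind: {act.kind!r}")
    return commands


if __name__ == "__main__":
    activity = [
        WriteActivity("process", 12, "COPYPOOL_TAPE"),  # hypothetical names
        WriteActivity("session", 345, "DISKPOOL"),
    ]
    for cmd in cancel_commands(activity, {"COPYPOOL_TAPE"}):
        print(cmd)
```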