ADSM-L

Re: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA

2014-06-03 17:45:13
From: "Prather, Wanda" <Wanda.Prather AT ICFI DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 3 Jun 2014 21:43:32 +0000

Well, that's a little confusing - 
Your BACKUP STGPOOLs must be completed and *then* a DB backup created before 
you send them offsite, yes?  And if that DB backup doesn't go, then the copy 
tapes are useless anyway?
You can put WAIT=YES on your BACKUP STGPOOL cmd, and follow that in your 
maintenance script by BACKUP DB WAIT=YES, and follow that by MOVE DRMEDIA (you 
can use a conditional test for day of week in your maintenance script so the 
MOVE DRMEDIA only happens on Checkout day.)  That way everything happens 
without overlap.  I can send you an example of testing for day of week in the 
maint script if you like.
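
To make the ordering above concrete, here is a rough sketch in Python that drives the same three commands through dsmadmc one at a time. The pool names, device class, admin ID, password, and checkout weekday are all made-up placeholders, and treating any non-zero dsmadmc return code as "stop here" is a simplifying assumption (a real script would probably tolerate some warning codes). It is not the server-script version mentioned above, just one way to get the same serialization from outside the server.

```python
import datetime
import subprocess

# Hypothetical values: substitute your own pools, device class and credentials.
PRIMARY_POOL = "TAPEPOOL"
COPY_POOL = "COPYPOOL"
DB_DEVCLASS = "LTOCLASS"
ADMIN_ID = "admin"
ADMIN_PW = "secret"
CHECKOUT_WEEKDAY = 2  # datetime convention: Monday=0, so 2 is Wednesday


def build_plan(today):
    """Return the admin commands for this date, in the order they must run."""
    plan = [
        f"backup stgpool {PRIMARY_POOL} {COPY_POOL} wait=yes",
        f"backup db devclass={DB_DEVCLASS} type=full wait=yes",
    ]
    # MOVE DRMEDIA is only appended on checkout day.
    if today.weekday() == CHECKOUT_WEEKDAY:
        plan.append("move drmedia * wherestate=mountable tostate=vault wait=yes")
    return plan


def run_dsmadmc(command):
    """Run one admin command through dsmadmc and return its exit code."""
    argv = ["dsmadmc", f"-id={ADMIN_ID}", f"-password={ADMIN_PW}",
            "-noconfirm", "-dataonly=yes"] + command.split()
    return subprocess.run(argv).returncode


def run_plan(plan, run=run_dsmadmc):
    """Run commands serially; stop at the first non-zero return code.

    Returns the list of commands that completed successfully.
    """
    done = []
    for command in plan:
        if run(command) != 0:
            break
        done.append(command)
    return done


if __name__ == "__main__":
    run_plan(build_plan(datetime.date.today()))
```

Because each command carries WAIT=YES, dsmadmc does not return until the server process has finished, so the next step cannot start early. If the DB backup fails, the MOVE DRMEDIA is never issued.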

OTOH, assuming you are throwing the copy tapes off the island at a point in 
time regardless of whether the backup stgpool has finished:

I haven't seen this problem myself, so I don't know if this would help, but:  
if you put in your maintenance script BEFORE the backup stgpool (again only on 
checkout day):  
 update vol * wherestgpool=copypool wherestatus=filling access=readonly

That would force the BACKUP STGPOOL process to always grab a scratch tape on 
that day, rather than grabbing a FILLING tape.
Would that jiggle things enough to avoid the timing problem?
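
As a sketch of where that UPDATE VOLUME would sit in the day's sequence (Python again; the pool names and checkout weekday are hypothetical, and this only builds the command list, it does not talk to the server):

```python
import datetime

# Hypothetical values: substitute your own pool names and checkout day.
PRIMARY_POOL = "TAPEPOOL"
COPY_POOL = "COPYPOOL"
CHECKOUT_WEEKDAY = 2  # datetime convention: Monday=0, so 2 is Wednesday


def day_sequence(today):
    """Return the commands for this date, with FILLING copy-pool volumes
    marked read-only first when it is checkout day."""
    commands = []
    if today.weekday() == CHECKOUT_WEEKDAY:
        # Same command as in the text above, with the pool name filled in.
        commands.append(
            f"update vol * wherestgpool={COPY_POOL} "
            "wherestatus=filling access=readonly"
        )
    commands.append(f"backup stgpool {PRIMARY_POOL} {COPY_POOL} wait=yes")
    return commands
```

On any other day the list contains only the BACKUP STGPOOL, so FILLING tapes keep being reused as normal.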

W  



-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Skylar Thompson
Sent: Tuesday, June 03, 2014 5:12 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA

We've been suffering with the effects of this APAR for a while, which IBM fixed 
as a documentation erratum rather than fixing TSM itself:

http://www-01.ibm.com/support/docview.wss?uid=swg1IC87352

Basically the issue is that there is a race condition with running MOVE DRMEDIA 
on tape volumes while BACKUP STGPOOL is also running. BACKUP STGPOOL might 
choose a FILLING volume that MOVE DRMEDIA is also removing from the library, 
which causes an operator request to be raised. We must either check the volume 
back in, or cancel the request, allow TSM to mark the volume UNAVAILABLE and 
then update the volume to be OFFSITE.

We have some challenges in our TSM environment:

1. The data ingest is highly bursty - some days we might have 100GB in backups, 
while others we might have 60TB. We average around 2TB/day in additions to 
primary storage.

2. We are not staffed 24x7, so we can't have operator requests going off 
outside business hours.

3. We have no dedicated staff managing our TSM/tape library environment, so we 
prefer not getting any operator requests since we might not be able to act on 
them immediately.

4. For budget and policy reasons, we have a weekly (not daily) shipment of tape 
to our offsite vault.

I've rejiggered our client and admin schedules, and reclamation to try to avoid 
having writes into the copy pools happen while we do the checkout during 
business hours, but it's quite difficult to actually quiesce everything.

It seems like we have these options:

1. Just live with it as it is.

2. Don't run BACKUP STGPOOL on the day that the checkout will happen.

3. Automate checking for writes into copy pools and cancel the session/process 
responsible for them. This might require restricting the number of mounts in 
our tape device classes, and also seems like it has the risk of being more 
disruptive than we really want.
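
For what option 3 might look like, here is a minimal Python sketch. It assumes the process list has been fetched with something like "select process_num, process, status from processes" through dsmadmc with -dataonly=yes and -tabdelimited, and that the STATUS text names the copy pool; that second assumption depends on the server's message wording and should be checked against real output. It only covers server processes, not client sessions writing to a copy pool.

```python
def cancel_commands(select_output, copy_pool):
    """Build CANCEL PROCESS commands for processes whose status text
    mentions the given copy pool.

    select_output: tab-delimited rows of process_num, process, status.
    """
    commands = []
    for line in select_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        process_num = fields[0].strip()
        # Rejoin in case the status text itself contained a tab.
        status = "\t".join(fields[2:])
        if process_num.isdigit() and copy_pool.upper() in status.upper():
            commands.append(f"cancel process {process_num}")
    return commands
```

The function only decides what to cancel; actually issuing the commands is left to the caller, so the list can be logged or reviewed first.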

Have I missed anything? How are other people approaching this problem?

Thanks,

--
-- Skylar Thompson (skylar2 AT u.washington DOT edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine