ADSM-L


2014-06-03 17:53:51
Subject: Re: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA
From: Skylar Thompson <skylar2 AT U.WASHINGTON DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 3 Jun 2014 14:51:50 -0700
Thanks for the reply, Wanda. Our RPO is a week (beauty of a research
environment), and we do two DB backups every day (one kept onsite, one kept
offsite), so whether we get today's or yesterday's DB backups in the MOVE
DRM isn't so important to us.

I considered doing a day-of-week test in the script, and doing MOVE DRM
after BACKUP DB/STGPOOL, but my worry is that on a slow day we'll end up
with it running before anyone shows up to empty out our I/O slots. That's
actually more of a mess than what we have now, since for several hours we
could have 52 tapes that are not yet in an OFFSITE state, any of which some
process or session could end up requesting.
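One way to handle the slow-day worry is to decide about MOVE DRMEDIA at the moment the WAIT=YES steps return, not when the script starts. Below is a minimal Python sketch of that decision. It assumes a Tuesday checkout, 08:00-17:00 on-site hours, and the hypothetical names TAPEPOOL, COPYPOOL and LTODEV. It only builds the command strings and computes when to eject. It does not talk to the TSM server.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical site values; substitute your own pools, device class and hours.
SERIALIZED_STEPS = [
    # WAIT=YES runs each step in the foreground, so the next one starts only
    # after the previous one has finished.
    "BACKUP STGPOOL TAPEPOOL COPYPOOL WAIT=YES",
    "BACKUP DB DEVCLASS=LTODEV TYPE=FULL WAIT=YES",
]
EJECT_CMD = "MOVE DRMEDIA * COPYSTGPOOL=COPYPOOL WHERESTATE=MOUNTABLE TOSTATE=VAULT"
CHECKOUT_WEEKDAY = 1       # assumed checkout day: Tuesday (Monday == 0)
STAFF_START = time(8, 0)   # assumed on-site hours
STAFF_END = time(17, 0)


def eject_time(steps_done: datetime) -> Optional[datetime]:
    """Decide when to issue EJECT_CMD, given when SERIALIZED_STEPS returned.

    Returns None if the eject should not be issued automatically.
    """
    if steps_done.weekday() != CHECKOUT_WEEKDAY:
        return None                      # not checkout day
    now = steps_done.time()
    if now < STAFF_START:
        # Slow day: the backups finished before anyone is on site, so hold
        # the eject until staff can empty the I/O slots.
        return datetime.combine(steps_done.date(), STAFF_START)
    if now < STAFF_END:
        return steps_done                # staff present: eject right away
    return None                          # after hours: leave it for a manual run
```

For example, if the backups return at 05:30 on Tuesday 3 June 2014, `eject_time` holds the eject until 08:00 that morning. If they return on a Wednesday, it returns None. The hold-until-morning branch addresses the early-finish case. A real script would still need its own way to sleep until the returned time and then issue the command.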

The UPDATE VOLUME trick might work, but I'd have to find a better way to
detect bad tape cartridges than using the READONLY state. We also track
media and drive faults, so it might be that we could just use those
notifications rather than the volume state.
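If READONLY also has to keep flagging suspect cartridges, one option is to expand the wildcard UPDATE VOLUME into per-volume commands and record which volumes the script itself changed. Below is a minimal sketch of that idea. It assumes the copy-pool volume list has already been pulled from the server (for example with a SELECT against the VOLUMES table) into name -> (status, access) pairs spelled like FILLING and READWRITE. The volume names in any usage are hypothetical.

```python
def readonly_plan(
    volumes: dict[str, tuple[str, str]],
) -> tuple[list[str], list[str]]:
    """Plan the pre-BACKUP STGPOOL step for copy-pool volumes.

    volumes maps volume name -> (status, access). Returns (commands to run,
    names this script changed). Volumes that are already READONLY (for
    example, suspected bad media) are left alone, so their state keeps its
    original meaning.
    """
    changed = sorted(
        name
        for name, (status, access) in volumes.items()
        if status.upper() == "FILLING" and access.upper() == "READWRITE"
    )
    cmds = [f"UPDATE VOLUME {name} ACCESS=READONLY" for name in changed]
    return cmds, changed


def revert_plan(changed: list[str], still_onsite: set[str]) -> list[str]:
    """Restore READWRITE only on volumes this script changed that were not ejected."""
    return [
        f"UPDATE VOLUME {name} ACCESS=READWRITE"
        for name in changed
        if name in still_onsite
    ]
```

Keeping the `changed` list separate means a READONLY volume found later is either one the script set, and can be reverted, or one that was already flagged, and should be looked at.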

Of course, in my ideal dream world, we'd just have a second tape
library in another location and do offsite backups directly over the wire...

Thanks again,

On Tue, Jun 03, 2014 at 09:43:32PM +0000, Prather, Wanda wrote:
> Well, that's a little confusing -
> Your BACKUP STGPOOLs must be completed and *then* a DB backup created before
> you send them offsite, yes?  And if that DB backup doesn't go, then the copy
> tapes are useless anyway?
> You can put WAIT=YES on your BACKUP STGPOOL cmd, and follow that in your 
> maintenance script by BACKUP DB WAIT=YES, and follow that by MOVE DRMEDIA 
> (you can use a conditional test for day of week in your maintenance script so 
> the MOVE DRMEDIA only happens on Checkout day.)  That way everything happens 
> without overlap.  I can send you an example of testing for day of week in the 
> maint script if you like.
>
> OTOH, assuming you are throwing the copy tapes off the island at a point in 
> time regardless of whether the backup stgpool has finished:
>
> I haven't seen this problem myself, so I don't know if this would help, but:  
> if you put in your maintenance script BEFORE the backup stgpool (again only 
> on checkout day):
>  update vol * wherestgpool=copypool wherestatus=filling access=readonly
>
> That would force the BACKUP STGPOOL process to always grab a scratch on that 
> day, rather than grabbing a FILLING tape.
> Would that jiggle things enough to avoid the timing problem?
>
> W
>
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf 
> Of Skylar Thompson
> Sent: Tuesday, June 03, 2014 5:12 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA
>
> We've been suffering from the effects of this APAR for a while, which IBM
> resolved with a documentation erratum rather than by fixing TSM itself:
>
> http://www-01.ibm.com/support/docview.wss?uid=swg1IC87352
>
> Basically the issue is that there is a race condition with running MOVE 
> DRMEDIA on tape volumes while BACKUP STGPOOL is also running. BACKUP STGPOOL 
> might choose a FILLING volume that MOVE DRMEDIA is also removing from the 
> library, which causes an operator request to be raised. We must either check 
> the volume back in, or cancel the request, allow TSM to mark the volume 
> UNAVAILABLE and then update the volume to be OFFSITE.
>
> We have some challenges in our TSM environment:
>
> 1. The data ingest is highly bursty - some days we might have 100GB in
> backups, while on others we might have 60TB. We average around 2TB/day in
> additions to primary storage.
>
> 2. We are not staffed 24x7, so we can't have operator requests going off 
> outside business hours.
>
> 3. We have no dedicated staff managing our TSM/tape library environment, so 
> we prefer not getting any operator requests since we might not be able to act 
> on them immediately.
>
> 4. For budget and policy reasons, we have a weekly (not daily) shipment of 
> tape to our offsite vault.
>
> I've rejiggered our client and admin schedules, as well as reclamation, to
> try to avoid writes into the copy pools while we do the checkout during
> business hours, but it's quite difficult to actually quiesce everything.
>
> It seems like we have these options:
>
> 1. Just live with it as it is.
>
> 2. Don't run BACKUP STGPOOL on the day that the checkout will happen.
>
> 3. Automate checking for writes into copy pools and cancel the 
> session/process responsible for them. This might require restricting the 
> number of mounts in our tape device classes, and also seems like it has the 
> risk of being more disruptive than we really want.
>
> Have I missed anything? How are other people approaching this problem?
>
> Thanks,
>
> --
> -- Skylar Thompson (skylar2 AT u.washington DOT edu)
> -- Genome Sciences Department, System Administrator
> -- Foege Building S046, (206)-685-7354
> -- University of Washington School of Medicine
