ADSM-L

Re: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA

2014-06-04 09:38:37
Subject: Re: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA
From: Rick Adamson <RickAdamson AT BILOHOLDINGS DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 4 Jun 2014 13:36:38 +0000
Skylar,

Rather than use the TSM "maintenance" script I chose to create my own.

I added all the processes to one script and, using WAIT=YES and the
Serial/Parallel options, was able to schedule them so they don't overlap.

For example: disk pool migration runs in parallel, then expiration runs
serially, the pools are backed up in parallel followed by reclaims, and then
the script switches to serial for a DB backup and prepare. Each process group
waits on the previous one to complete before starting.
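
A minimal sketch of that grouping, written in Python only to make the ordering explicit. It renders the body of a server script from (mode, commands) groups and appends WAIT=YES where it is missing. The pool names (DISKPOOL1, TAPEPOOL1, COPYPOOL), the device class (LTOCLASS), and the thresholds are hypothetical, and the reclaim step is placed in a serial group here as a simplification. It assumes that the first command after SERIAL does not start until the preceding parallel commands have finished; check the server documentation for your version before relying on that.

```python
from typing import List, Tuple

# A group is (mode, commands); mode is "PARALLEL" or "SERIAL".
Group = Tuple[str, List[str]]

# Hypothetical pool and device class names, for illustration only.
GROUPS: List[Group] = [
    ("PARALLEL", ["MIGRATE STGPOOL DISKPOOL1 LOWMIG=0",
                  "MIGRATE STGPOOL DISKPOOL2 LOWMIG=0"]),
    ("SERIAL", ["EXPIRE INVENTORY"]),
    ("PARALLEL", ["BACKUP STGPOOL TAPEPOOL1 COPYPOOL",
                  "BACKUP STGPOOL TAPEPOOL2 COPYPOOL"]),
    ("SERIAL", ["RECLAIM STGPOOL COPYPOOL THRESHOLD=60",
                "BACKUP DB DEVCLASS=LTOCLASS TYPE=FULL",
                "PREPARE"]),
]


def render_script(groups: List[Group]) -> List[str]:
    """Return script lines: a mode keyword, then that group's commands."""
    lines: List[str] = []
    for mode, commands in groups:
        mode = mode.upper()
        if mode not in ("PARALLEL", "SERIAL"):
            raise ValueError(f"unknown mode: {mode}")
        lines.append(mode)
        for cmd in commands:
            # Make every command block until its process completes.
            if "WAIT=" not in cmd.upper():
                cmd = f"{cmd} WAIT=YES"
            lines.append(cmd)
    return lines


if __name__ == "__main__":
    print("\n".join(render_script(GROUPS)))
```

The printed lines could be pasted into a script file and loaded with DEFINE SCRIPT; the generator itself adds nothing beyond keeping the groups and the WAIT=YES suffix consistent.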

If needed, add an appropriate trigger for an occasional incremental DB backup.

As for the "Move DRM" task, all my systems run it daily, but why not run it
only on the day you physically transport tapes? After all, if the tapes are
remaining onsite for days, does it really matter whether they are moved from
mountable to vault, or ejected from the library, on days they aren't being
sent offsite? The risk is the same because the tape is still there.
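
A rough sketch of that day-of-week gate, in Python for illustration. The transport day (Tuesday) is a hypothetical choice, and the MOVE DRMEDIA parameters shown are one plausible form rather than anyone's actual production command; adjust both to your site.

```python
from datetime import date
from typing import List

# Hypothetical: tapes are physically shipped on Tuesdays.
# date.weekday() numbers Monday as 0, so Tuesday is 1.
TRANSPORT_WEEKDAY = 1

# Assumed command form; verify the parameters against your server level.
MOVE_DRMEDIA = "MOVE DRMEDIA * WHERESTATE=MOUNTABLE TOSTATE=VAULT WAIT=YES"


def drmedia_commands(today: date,
                     transport_weekday: int = TRANSPORT_WEEKDAY) -> List[str]:
    """Return the MOVE DRMEDIA command only on the transport day."""
    if today.weekday() == transport_weekday:
        return [MOVE_DRMEDIA]
    # On other days the copy pool keeps appending to its filling tapes.
    return []


if __name__ == "__main__":
    for cmd in drmedia_commands(date.today()):
        print(cmd)
```

On non-transport days the function returns nothing, so a wrapper that feeds its output to the admin client simply skips the checkout.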

In the end you may even use fewer tapes, because day after day the system can
continue to use filling tapes until the day they physically move.

Just my thoughts, hope it helps....

-Rick Adamson

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Skylar Thompson
Sent: Tuesday, June 03, 2014 5:52 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA

Thanks for the reply, Wanda. Our RPO is a week (beauty of a research 
environment), and we do two DB backups every day (one kept onsite, one kept 
offsite), so whether we get today's or yesterday's DB backups in the MOVE DRM 
isn't so important to us.

I considered doing a day-of-week test in the script, and doing MOVE DRM after 
BACKUP DB/STGPOOL, but my worry is that on a slow day we'll end up with it 
running before anyone shows up to empty out our I/O slots. That's actually more 
of a mess than what we have now, since we could have 52 tapes not yet in an 
OFFSITE state for several hours that some process or session could end up 
requesting.

The UPDATE VOLUME trick might work, but I'd have to find a better way to detect 
bad tape cartridges than using the READONLY state. We also track media and 
drive faults, so it might be that we could just use those notifications rather 
than the volume state.

Of course, in my ideal dream world, we'd just have a second tape library in 
another location and do offsite backups directly over the wire...

Thanks again,

On Tue, Jun 03, 2014 at 09:43:32PM +0000, Prather, Wanda wrote:
> Well, that's a little confusing -
> Your BACKUP STGPOOLs must be completed and *then* a DB backup created before 
> you send them offsite, yes?  And if that DB backup doesn't go, then the copy 
> tapes are useless anyway?
> You can put WAIT=YES on your BACKUP STGPOOL cmd, and follow that in your 
> maintenance script by BACKUP DB WAIT=YES, and follow that by MOVE DRMEDIA 
> (you can use a conditional test for day of week in your maintenance script so 
> the MOVE DRMEDIA only happens on Checkout day.)  That way everything happens 
> without overlap.  I can send you an example of testing for day of week in the 
> maint script if you like.
>
> OTOH, assuming you are throwing the copy tapes off the island at a point in 
> time regardless of whether the backup stgpool has finished:
>
> I haven't seen this problem myself, so I don't know if this would help, but:  
> if you put in your maintenance script BEFORE the backup stgpool (again only 
> on checkout day):
>  update vol * wherestgpool=copypool wherestatus=filling access=readonly
>
> That would force the BACKUP STGPOOL process to always grab a scratch on that 
> day, rather than grabbing a FILLING tape.
> Would that jiggle things enough to avoid the timing problem?
>
> W
>
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf 
> Of Skylar Thompson
> Sent: Tuesday, June 03, 2014 5:12 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] Serializing BACKUP STGPOOL / MOVE DRMEDIA
>
> We've been suffering with the effects of this APAR for a while, which IBM 
> fixed as a documentation errata rather than fixing TSM itself:
>
> http://www-01.ibm.com/support/docview.wss?uid=swg1IC87352
>
> Basically the issue is that there is a race condition with running MOVE 
> DRMEDIA on tape volumes while BACKUP STGPOOL is also running. BACKUP STGPOOL 
> might choose a FILLING volume that MOVE DRMEDIA is also removing from the 
> library, which causes an operator request to be raised. We must either check 
> the volume back in, or cancel the request, allow TSM to mark the volume 
> UNAVAILABLE and then update the volume to be OFFSITE.
>
> We have some challenges in our TSM environment:
>
> 1. The data ingest is highly bursty - some days we might have 100GB in 
> backups, while others we might have 60TB. We average around 2TB/day in 
> additions to primary storage.
>
> 2. We are not staffed 24x7, so we can't have operator requests going off 
> outside business hours.
>
> 3. We have no dedicated staff managing our TSM/tape library environment, so 
> we prefer not getting any operator requests since we might not be able to act 
> on them immediately.
>
> 4. For budget and policy reasons, we have a weekly (not daily) shipment of 
> tape to our offsite vault.
>
> I've rejiggered our client and admin schedules, and reclamation to try to 
> avoid having writes into the copy pools happen while we do the checkout 
> during business hours, but it's quite difficult to actually quiesce 
> everything.
>
> It seems like we have these options:
>
> 1. Just live with it as it is.
>
> 2. Don't run BACKUP STGPOOL on the day that the checkout will happen.
>
> 3. Automate checking for writes into copy pools and cancel the 
> session/process responsible for them. This might require restricting the 
> number of mounts in our tape device classes, and also seems like it has the 
> risk of being more disruptive than we really want.
>
> Have I missed anything? How are other people approaching this problem?
>
> Thanks,
>
> --
> -- Skylar Thompson (skylar2 AT u.washington DOT edu)
> -- Genome Sciences Department, System Administrator
> -- Foege Building S046, (206)-685-7354
> -- University of Washington School of Medicine

--
-- Skylar Thompson (skylar2 AT u.washington DOT edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine