Is reclamation going mad?

sfb

Hi,

I have a question:
Our scratch count keeps going up. We are recalling more tapes than we are sending offsite.

I have attached a chart. As you can see, the scratch count had been around the 100-120 mark, but about two weeks ago, for no apparent reason, it started shooting through the roof and has been climbing steadily since.

The size of our primary storage pools has been steady (I have done a comparison, and the local pools and offsite pools are a perfect match).
We deleted a few backups and archives before it all started happening, but we're talking only a few tapes' worth in terms of size.

I have run a query against pct_utilized for the offsite volumes, and about three quarters (150) of them show pct_utilized = 0.0. What is that about? Could that be the reason we are recalling so many volumes?

Your help is much appreciated.

Thank you

Sylvain
 

Attachments

  • Scratch.JPG
"We deleted a few backups and archive before it all started happening but we're talking only a few tapes worth in terms of size."

Hi,
How did you delete those backups/archives? Did you delete all versions?
Are your storage pools collocated?

Has anything else changed on your client nodes in terms of storage usage? This can easily pass unnoticed by the sysadmin.

Rudy
 
Any changes to your copygroups/policy domains? If somebody has deliberately or accidentally reduced the retention settings on your copygroups, you would see more tapes coming back.
 
Thanks for your replies, guys.

No changes have been made to the environment.
I have just checked the copy groups, policies and storage pools, and all look fine. In fact, nothing has changed recently.

Collocation is off and always has been.

Reclamation is part of a script that runs every morning, and we check it every day. It very rarely fails, so I can only assume that reclamation has always run the way it should. Below is the reclamation section of our morning script:
PARALLEL
reclaim stgpool tape_backup thr=50 duration=60 w=y
reclaim stgpool tape_archive thr=50 duration=60 w=y
reclaim stgpool tape_dir thr=50 duration=60 w=y
reclaim stgpool tape_sqlarchive thr=50 duration=60 w=y
SERIAL
PARALLEL
reclaim stgpool offsite_tape thr=50 offsitereclaimlimit=2 w=y
reclaim stgpool offsite_archive thr=50 offsitereclaimlimit=2 w=y
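
To make the thr=50 setting concrete, here is a minimal Python sketch of the eligibility check it implies: a volume becomes a reclamation candidate once its reclaimable space reaches the threshold. The volume names and percentages are made up for illustration, and treating the threshold as inclusive is an assumption. The sketch does not model offsitereclaimlimit.

```python
# Hypothetical sample data: (volume name, percent reclaimable space).
# These values are invented for illustration; they are not from the server.
volumes = [
    ("OFF001", 12.0),
    ("OFF002", 49.9),
    ("OFF003", 63.0),
    ("OFF004", 87.5),
]


def eligible_for_reclaim(vols, threshold):
    """Return the names of volumes whose reclaimable space meets the threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return [name for name, pct_reclaim in vols if pct_reclaim >= threshold]


if __name__ == "__main__":
    print(eligible_for_reclaim(volumes, 50))
```

With this sample data, only OFF003 and OFF004 qualify. A deletion that nudges many volumes just past the threshold would enlarge that list quickly.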


We deleted a couple of servers' backups (all versions, max 28 versions) and a few archives a few days before all this started happening. Could there be thousands of files to expire, scattered across hundreds of offsite tapes, taking all this time and causing all these volumes to be reclaimed?

I still don't understand all these offsite volumes that have a pct_utilized of 0.0. Can someone explain whether this is normal, and why it might happen?

I have done so much investigating, and I can't see anything wrong.

Thanks very much for your help.:up:
 
A volume with %util of 0.0% is either a rounding error, or there is part of a file on the tape with the other part of the file on another tape. It could also be a slight db inconsistency.

If "q content <vol>" shows nothing, then either there is just a partial file on there, or the db is inconsistent. In that case I have sometimes seen "audit vol <vol>" fix it (and it didn't need to physically access the tape either).
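
The rounding case can be illustrated with a small Python sketch. It assumes the percentage is reported to one decimal place; the tape capacity and data sizes are made-up figures, not values from this thread.

```python
def pct_utilized(used_mb, capacity_mb):
    """Percent of a volume in use, rounded to one decimal place.

    Assumption: the reported figure is rounded to one decimal place.
    """
    return round(100.0 * used_mb / capacity_mb, 1)


if __name__ == "__main__":
    # Hypothetical 800,000 MB tape holding only 50 MB of live data.
    print(pct_utilized(50, 800_000))
    # The same tape holding 8,000 MB of live data.
    print(pct_utilized(8_000, 800_000))
```

A volume holding a tiny amount of data therefore reports 0.0 even though it is not empty.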
 
Thanks bbb.
I have checked a couple of these tapes and they do contain data (a very minute amount, though). Shouldn't reclamation regroup all that data onto one tape? And if it should, how can reclamation leave so many tapes holding such a small amount of data?
 
It should. But if the primary tape is unavailable, or has read errors on that file, it can't reclaim it. Try "move data <volname> recons=yes" on the vol you want reclaimed. That is equivalent to reclamation, and you'll see errors showing what the problem is.
 
Thanks. I might give that a go, but that means more tapes coming back.

In any case, I can't think why we are recalling so many media.

Everything looks fine though!

Anyway, thank you guys for your help.:up:
 