Is reclamation going mad

sfb · Feb 23, 2009

Hi,

I have a question:
Our scratch count keeps going up. We are recalling more tapes than we are sending offsite.

I have attached a chart and as you can see, we've been around the 100-120 mark but for no apparent reason, about 2 weeks ago, it started shooting through the roof and has been steadily going up since then.

The size of our primary storage pools has been steady (I have done a comparison, and the local pools and offsite pools are a perfect match).
We deleted a few backups and archives before it all started happening, but we're talking only a few tapes' worth in terms of size.

I have run a query against pct_utilized of the offsite volumes, and about 3/4 (150) of them show pct_utilized = 0.0. What is that about? Could that be the reason we are recalling so many volumes?

Your help much appreciated.

Thank you

Sylvain

rore · Feb 23, 2009

sfb said:
We deleted a few backups and archives before it all started happening, but we're talking only a few tapes' worth in terms of size.

Hi,
How did you delete those backups/archives? Did you delete all versions?
Are your storage pools collocated?

Has anything else changed on your client nodes in terms of storage usage? That kind of change can go unnoticed by the sysadmin.

Rudy

BBB · Feb 23, 2009

Any changes to your copygroups/policy domains? If somebody has deliberately or accidentally changed the retention settings on your copygroups this would see more tapes coming back.

sfb · Feb 24, 2009

Thanks for your replies guys.

No changes have been made to the environment.
I have just checked the copy groups, policies and storage pools and all look fine. In fact, nothing changed recently.

Collocation is off and always has been.

Reclamation is part of a script that runs every morning and we check it every day. It very rarely fails, so I can only assume that reclamation has always run the way it should. Below is the reclamation section of our morning script:
PARALLEL
reclaim stgpool tape_backup thr=50 duration=60 w=y
reclaim stgpool tape_archive thr=50 duration=60 w=y
reclaim stgpool tape_dir thr=50 duration=60 w=y
reclaim stgpool tape_sqlarchive thr=50 duration=60 w=y
SERIAL
PARALLEL
reclaim stgpool offsite_tape thr=50 offsitereclaimlimit=2 w=y
reclaim stgpool offsite_archive thr=50 offsitereclaimlimit=2 w=y
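
For readers following along, here is a rough Python sketch of how `thr=50` and `offsitereclaimlimit=2` interact in the script above. It is only a model, not TSM code: the `Volume` class, the volume names and the percentages are made up, and it assumes that a volume is eligible once its reclaimable space reaches the threshold and that the emptiest eligible volumes are taken first.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str           # hypothetical volume label
    pct_reclaim: float  # percentage of reclaimable (empty) space


def pick_for_reclaim(volumes, threshold, limit=None):
    """Return the volumes one reclamation run would process.

    Assumption: eligible means pct_reclaim >= threshold, and the
    volumes with the most reclaimable space are taken first.
    """
    eligible = [v for v in volumes if v.pct_reclaim >= threshold]
    eligible.sort(key=lambda v: v.pct_reclaim, reverse=True)
    return eligible if limit is None else eligible[:limit]


# Made-up sample data, not taken from the thread.
volumes = [
    Volume("A00001", 99.9),
    Volume("A00002", 35.0),
    Volume("A00003", 72.5),
    Volume("A00004", 50.0),
]

# thr=50 with offsitereclaimlimit=2: three volumes qualify, two are processed.
picked = pick_for_reclaim(volumes, threshold=50, limit=2)
print([v.name for v in picked])
```

Under these assumptions a nearly empty offsite volume always qualifies at `thr=50`, but the limit caps how many are processed per pool in each run.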

We deleted a couple of servers' backups a few days before all this started happening (all versions, max: 28 versions) and a few archives. Could there be thousands of files to expire, scattered across hundreds of offsite tapes, taking all this time and causing all these volumes to be reclaimed?

I still don't understand all these offsite volumes that have a pct_utilized of 0.0. (Can someone explain whether this is normal, and why it might happen?)

I have done so much investigating and I can't see anything wrong.

Thanks very much for your help.

BBB · Feb 24, 2009

A volume with a %util of 0.0% is either a rounding effect, or it holds part of a file whose other part is on another tape, or there is a slight DB inconsistency.

If "q content <vol>" shows nothing, then there is either just a partial file on there, or sometimes I have seen "audit vol <vol>" fix it (and it didn't need to physically access the tape either).
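
To illustrate the rounding case, here is a small Python sketch. The 400,000 MB capacity is a made-up figure, and it assumes the percentage is rounded to one decimal place for display, which may not match exactly how the server computes it.

```python
def pct_utilized(used_mb, capacity_mb):
    """Utilisation as a percentage, rounded to one decimal place (assumed)."""
    return round(100.0 * used_mb / capacity_mb, 1)


# Hypothetical tape capacity of 400,000 MB.
print(pct_utilized(100, 400_000))  # 0.025 % is displayed as 0.0
print(pct_utilized(400, 400_000))  # 0.1 % is displayed as 0.1
```

With those numbers, a tape holding less than roughly 200 MB would show 0.0 while still containing data.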

sfb · Feb 24, 2009

Thanks bbb.
I have checked a couple of these tapes and they do contain data (a very minute amount though)... Shouldn't reclamation regroup all that data onto one tape? And if it should, how can reclamation let so many tapes contain such a small amount of data?

BBB · Feb 24, 2009

It should. But if the primary tape is unavailable, or has read errors on that file, it can't reclaim it. Try "move data <volname> recons=yes" on the vol you want reclaimed. That is equivalent to reclamation, and you'll see errors showing what the problem is.

sfb · Feb 25, 2009

Thanks. I might give that a go, but that means more tapes coming back.

In any case, I can't think why we are recalling so many media.

Everything looks fine though!

Anyway, thank you guys for your help.
