Reclamation unable to fully reclaim...

foobar2devnull

Sorry for the long post but I think the info is needed.

Following the migration of our server to a new physical server, I believe errors were made and the data on disk (primary file pool) does not hold all the data that can be found in the copy tape pool. This is causing a world of problems with reclamation. I currently have 200 tapes over my threshold of 70%. I am going to put as much information as I can below in the hope that someone has a clue about the next step I should take. All I can do these days is bring the tapes reported by ANR1163W back on site, which only perpetuates the problem.

This is a description of my setup and the issue:


We have a retention of 31 days on all backups.


PT01 = Primary Tape Pool
CT01 = Copy Tape Pool
PF01 = Primary File Pool


tsm: TSMINST1>q stg

Storage      Device      Estimated    Pct    Pct    High  Low  Next Storage
Pool Name    Class Name  Capacity     Util   Migr   Mig   Mig  Pool
                                                    Pct   Pct
-----------  ----------  -----------  -----  -----  ----  ---  ------------
CT01         LTO5        3,264,725 G    3.6
PF01         FILE          172,090 G   39.2   39.2    90   89  PT01
PT01         LTO5          333,200 G    3.1    6.0    90   70

We back up to PF01 and make a copy to CT01. PT01 is used as the next storage pool and for the archives.


I run the following command each day. I had to add OFFSITERECLAIML (short for OFFSITERECLAIMLIMIT) once I started having the problem, or reclamation would never finish.

RECLAIM STGPOOL CT01 THRESHOLD=70 OFFSITERECLAIML=5 WAIT=YES

I have tried raising the THRESHOLD to get rid of all the empty tapes (listed below), but to no avail.


tsm: TSMINST1>select cast(VOLUME_NAME as char(8)) as Tape, PCT_UTILIZED, PCT_RECLAIM from volumes where stgpool_name = 'CT01' and PCT_RECLAIM >= 70 order by PCT_RECLAIM desc

TAPE PCT_UTILIZED PCT_RECLAIM
--------- ------------- ------------
AA0017L6 0.0 100.0
AA0140L5 0.0 100.0
AA0170L5 0.0 100.0
AA0209L5 0.0 100.0
AA0213L5 0.0 100.0
AA0241L5 0.0 100.0
AA0081L6 0.0 100.0
AA0249L5 0.0 100.0
AA0256L5 0.0 100.0
AA0278L5 0.0 100.0
AA0300L5 0.0 100.0
AA0360L5 0.0 100.0
AA0361L5 0.0 100.0
AA0048L6 0.6 99.7
AA0167L5 0.6 99.7
AA0070L6 0.3 99.7
AA0107L5 0.8 99.6
[...]

tsm: TSMINST1>select count(*) as "reclaimable tapes" from volumes where stgpool_name = 'CT01' and PCT_RECLAIM >= 70

reclaimable tapes
------------------
200

I keep getting error ANR1163W.

Looking back over 10 days of reclamation, the following tapes are requested:

$ cat /tmp/ANR1163W_10 |grep ANR1163W|awk '{print $6}'|sort|grep ..|uniq -c
1 AA0005L6
1 AA0011L6
9 AA0017L6
1 AA0051L6
1 AA0081L6
1 AA0156L5
4 AA0209L5
9 AA0213L5
9 AA0361L5
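
For anyone who wants to repeat this tally without depending on the field position, here is a minimal Python sketch. It assumes the ANR1163W lines name the tape right after the words "Offsite volume", which matches what the awk field above picks out. The function name and the sample lines are made up for illustration and are not taken from a real activity log.

```python
import re
from collections import Counter

# Assumption: the tape name follows "Offsite volume" on each ANR1163W line.
PATTERN = re.compile(r"ANR1163W\s+Offsite volume\s+(\S+)")


def count_anr1163w(lines):
    """Return a Counter mapping volume name -> number of ANR1163W lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines (dates and wording are illustrative only).
sample = [
    "01/01/2015 06:00:01 ANR1163W Offsite volume AA0017L6 still contains files which could not be moved.",
    "01/02/2015 06:00:03 ANR1163W Offsite volume AA0017L6 still contains files which could not be moved.",
    "01/02/2015 06:00:04 ANR1163W Offsite volume AA0209L5 still contains files which could not be moved.",
    "01/02/2015 06:10:00 some unrelated line that should be ignored",
]

# Print in the same "count volume" layout as uniq -c.
for volume, hits in sorted(count_anr1163w(sample).items()):
    print(f"{hits:7d} {volume}")
```

To run it against a real extract, pass the open file object instead of the sample list, for example count_anr1163w(open("/tmp/ANR1163W_10")).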

When I look at what they contain, I get:

tsm: TSMINST1>q vol AA0017L6

Volume Name               Storage      Device      Estimated  Pct    Volume
                          Pool Name    Class Name  Capacity   Util   Status
------------------------  -----------  ----------  ---------  -----  --------
AA0017L6                  CT01         LTO5            5.7 T    0.0  Filling

tsm: TSMINST1>q vol AA0051L6

Volume Name               Storage      Device      Estimated  Pct    Volume
                          Pool Name    Class Name  Capacity   Util   Status
------------------------  -----------  ----------  ---------  -----  --------
AA0051L6                  CT01         LTO5            0.0 M    0.0  Empty

tsm: TSMINST1>q vol AA0081L6

Volume Name               Storage      Device      Estimated  Pct    Volume
                          Pool Name    Class Name  Capacity   Util   Status
------------------------  -----------  ----------  ---------  -----  --------
AA0081L6                  CT01         LTO5            5.7 T    0.0  Filling

tsm: TSMINST1>q vol AA0156L5

Volume Name               Storage      Device      Estimated  Pct    Volume
                          Pool Name    Class Name  Capacity   Util   Status
------------------------  -----------  ----------  ---------  -----  --------
AA0156L5                  CT01         LTO5            0.0 M    0.0  Pending

[...]


If I have a closer look, the tapes do contain data, even if it's just a couple of files.

I have no volumes in error, be it file or tape.

My guess is that there are files in the CT01 pool that do not exist in the PF01 pool because of the migration, and thus reclamation cannot happen until I bring the needed tapes back on site.

Any help would be fab, I've been trying so many different strategies to solve this but I have no idea what to do next short of scratching the copy pool.
 
foobar2devnull said: "I keep getting error ANR1163W."
Check this: http://www-01.ibm.com/support/docview.wss?uid=swg21314202
You will need to fix that problem so that reclamation can reclaim that space too.

foobar2devnull said: "My guess is that there are files in the CT01 pool that do not exist in the PF01 pool because of the migration, and thus reclamation cannot happen until I bring the needed tapes back on site."
That's not really how offsite reclamation works. It doesn't matter whether the file(s) are still in the same primary pool as when the BACKUP STGPOOL was run; TSM doesn't even track where that copy came from. When offsite reclamation runs, it grabs the primary copy from wherever it currently exists in a primary pool. A client restore works the same way: TSM grabs the file from wherever it is at that instant. You need to address ANR1163W instead.
 
Thanks for your help marclant,
I will go over the document and see if I can solve the issue. If not I'll be back! ;)

What I meant was that our primary pool might not have been migrated properly, so some files exist in the copy pool that do not exist in the primary pool. This would explain why the tapes have to be on site for reclamation and why I get the warnings. If I could "restore" those files to the primary pool, the problem would be fixed (assuming I am correct, of course).
 
foobar2devnull said: "What I meant was that our primary pool might not have been properly migrated and therefore some files exist on a copy pool that do not exist in the primary pool."
The only way this would be possible is if the primary copy got damaged, which is one of the things that technote has you check, because that primary copy had to exist when the offsite copy was made. Migration won't get rid of the primary copy; it just relocates it to a different volume in a different storage pool. It should still be available unless the object is damaged on the volume or the volume is marked destroyed.
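
To make the reasoning in this thread concrete, here is a toy model in Python. It is not TSM code and does not reflect TSM's internal data structures; the PrimaryCopy class, the file ids, and the function name are all invented for illustration. It only shows the logic described above: a file on an offsite copy tape can be moved if a readable primary copy exists in any primary pool, and it stays stuck on the tape otherwise.

```python
from dataclasses import dataclass


@dataclass
class PrimaryCopy:
    pool: str        # whichever primary pool holds the file right now
    readable: bool   # False if damaged or on a destroyed/unavailable volume


def reclaim_offsite(files_on_tape, primary_copies):
    """Split the files on one offsite copy tape into moved and stuck.

    files_on_tape:  file ids still valid on the offsite copy volume
    primary_copies: dict of file id -> PrimaryCopy (current location)
    """
    moved, stuck = [], []
    for file_id in files_on_tape:
        copy = primary_copies.get(file_id)
        if copy is not None and copy.readable:
            moved.append(file_id)   # re-copied from the onsite primary copy
        else:
            stuck.append(file_id)   # tape cannot be emptied (the ANR1163W case)
    return moved, stuck


# Hypothetical inventory: the pool a file lives in does not matter,
# only whether a readable primary copy exists somewhere.
primary = {
    "f1": PrimaryCopy(pool="PF01", readable=True),
    "f2": PrimaryCopy(pool="PT01", readable=True),   # migrated since the backup
    "f3": PrimaryCopy(pool="PF01", readable=False),  # e.g. damaged
}

moved, stuck = reclaim_offsite(["f1", "f2", "f3"], primary)
print("moved:", moved)
print("stuck:", stuck)
```

In this model the migrated file f2 is moved without any trouble, while f3 keeps its tape from being emptied until the primary copy is repaired or the offsite tape is brought back.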
 
Got it. I'm working through the list. Thanks a lot for your help.
 