Primary Disk Storage Pools Full/Critical status

WalterITD

ADSM.ORG Member

Good Day Folks,

Long-time lurker here and relatively new to TSM. I love the forum; lots of info and very helpful people.

A while back, due to restructuring, I was given TSM duties. I had never worked with TSM before, so forgive any silliness I might spew out.

The TSM system is running pretty well except for a few glitches here and there...

I have a situation where maybe someone can point me in the right direction to find a fix.

This is my issue:

I have several active storage pools, two of which are always in critical status because they always appear full:

- WINDOWSPOOL
- AIXPOOL

Both pools are primary DISK-class pools, with their respective tape counterparts (WINDOWSTAPE and AIXTAPE) set as "Next Pool".

Here are some configs that are set for these pools:

Storage Pool Name: WINDOWSPOOL
Storage Pool Type: Primary
Device Class Name: DISK
Estimated Capacity: 399 G
Space Trigger Util: 98.0
Pct Util: 98.0
Pct Migr: 0.0
Pct Logical: 100.0
High Mig Pct: 90
Low Mig Pct: 20
Migration Delay: 0
Migration Continue: Yes
Migration Processes: 1
Reclamation Processes:
Next Storage Pool: WINDOWSTAPE
Reclaim Storage Pool:
Maximum Size Threshold: No Limit
Access: Read/Write
Description: Windows disk storage pool
Overflow Location:
Cache Migrated Files?: Yes
Collocate?:
Reclamation Threshold:
Offsite Reclamation Limit:
Maximum Scratch Volumes Allowed:
Number of Scratch Volumes Used:
Delay Period for Volume Reuse:
Migration in Progress?: No
Amount Migrated (MB): 145,763.32
Elapsed Migration Time (seconds): 2,329
Reclamation in Progress?:
Last Update by (administrator): ***
Last Update Date/Time: 0*/0*/2016
Storage Pool Data Format: Native
Copy Storage Pool(s):
Active Data Pool(s):
Continue Copy on Error?: Yes
CRC Data: No
Reclamation Type:
Overwrite Data when Deleted:
Deduplicate Data?: No
Processes For Identifying Duplicates:
Duplicate Data Not Stored:
Auto-copy Mode: Client
Contains Data Deduplicated by Client?: No
-----
Storage Pool Name: AIXPOOL
Storage Pool Type: Primary
Device Class Name: DISK
Estimated Capacity: 299 G
Space Trigger Util: 92.6
Pct Util: 92.6
Pct Migr: 0.0
Pct Logical: 100.0
High Mig Pct: 90
Low Mig Pct: 20
Migration Delay: 0
Migration Continue: Yes
Migration Processes: 1
Reclamation Processes:
Next Storage Pool: AIXTAPE
Reclaim Storage Pool:
Maximum Size Threshold: No Limit
Access: Read/Write
Description: AIX disk storage pool
Overflow Location:
Cache Migrated Files?: Yes
Collocate?:
Reclamation Threshold:
Offsite Reclamation Limit:
Maximum Scratch Volumes Allowed:
Number of Scratch Volumes Used:
Delay Period for Volume Reuse:
Migration in Progress?: No
Amount Migrated (MB): 36,987.41
Elapsed Migration Time (seconds): 576
Reclamation in Progress?:
Last Update by (administrator): ***
Last Update Date/Time: 0*/0*/2016
Storage Pool Data Format: Native
Copy Storage Pool(s):
Active Data Pool(s):
Continue Copy on Error?: Yes
CRC Data: No
Reclamation Type:
Overwrite Data when Deleted:
Deduplicate Data?: No
Processes For Identifying Duplicates:
Duplicate Data Not Stored:
Auto-copy Mode: Client
Contains Data Deduplicated by Client?: No


To try to solve the problem, I manually increased the storage pools' capacity. That seemed to fix it for a few days, but the pools eventually went back to critical status, and I've been scratching my head ever since trying to find a solution.

Manually setting HI/LO migration thresholds does not appear to do anything.

I have tried manually reclaiming space, but I always get this type of error:
ANR4929E RECLAIM STGPOOL: The storage pool AIXPOOL is not a sequential-access pool.
ANS8001I Return code 3.


Searching the net for info related to this never yields any useful solutions.

Not sure if the issue is configuration or something else; hoping one of you gurus can steer me in the right direction to fix this.

Thanks in advance,

Walter
 

Can you show us what you're putting in for your migration command? Also, how much free space is available on your WINDOWSTAPE and AIXTAPE? Do you have enough scratch volumes to migrate?
 

Hi Mosiac,

Thanks for the quick reply.

When I attempt a manual migration, I use this command:

TSM1>migrate stgpool windowspool lowmig=0 wait=yes

with results similar to this:
ANR4924I MIGRATE STGPOOL: Migration is not needed for the storage pool WINDOWSPOOL.
ANS8001I Return code 11.


The WINDOWSTAPE and AIXTAPE pools show approximately 20 TB of available LTO5 space.

At this time I have about 8 LTO5 scratch tapes available in the system.

thanks!
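The "migration not needed" result on a pool that looks 98% full comes down to which percentage TSM watches: migration is driven by Pct Migr, which excludes cached copies of files already migrated to the next pool, while Pct Util counts them. A minimal sketch of that decision (a simplified model for illustration, not TSM's actual logic):

```python
# Simplified model (illustration only, not TSM source): why
# `migrate stgpool windowspool lowmig=0` can report ANR4924I
# "migration not needed" on a pool showing 98% utilization.

def migration_needed(pct_migr: float, lowmig: float) -> bool:
    """Migration runs only while the migratable percentage exceeds LOwmig.
    Cached copies count toward Pct Util but not toward Pct Migr."""
    return pct_migr > lowmig

# WINDOWSPOOL above: Pct Util 98.0 but Pct Migr 0.0 (everything is a
# cached copy), so even lowmig=0 finds nothing to move.
print(migration_needed(pct_migr=0.0, lowmig=0))   # False -> ANR4924I
# VMWAREPOOL later in the thread has real migratable data:
print(migration_needed(pct_migr=96.9, lowmig=0))  # True
```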
 

Hmm, I'm thinking this is beyond my ability to help. The only thing I do differently when I run manual migrations is to add a duration, because mine never finish when I'd like them to if I don't.

Sorry I can't be of more help.
 

Thanks, Marclant!

Interesting. The caching details make sense for my situation: since both of these storage pools have their caching option set to "Yes", the system is probably just doing its job.

I will try disabling caching temporarily on these pools, see what gives, and report back.

thanks again!
 

Hi Marclant,

Just wanted to let you know that your suggestion was spot on.

I disabled caching on the storage pools and performed a "Move Data"; the errors are now gone.

Thanks again!
 

Hi Guys,

I've been scouring the forum for clues to fix a critically full VMFILE pool (VMWAREPOOL).

At the moment all my VM backups have failed, and I am at a loss for how to free up space and get things going again.

I've tried to "Reclaim" and "Move Data" manually, to no avail.

I have a feeling it's related to the fact that I was low on scratch tapes for a while. At this time I have about 6 scratch tapes available in the library, but the system is at a standstill.

Here are the details of the VMWAREPOOL in question:

Storage Pool Name: VMWAREPOOL
Storage Pool Type: Primary
Device Class Name: VMFILE
Estimated Capacity: 1,741 G
Space Trigger Util: 88.2
Pct Util: 96.9
Pct Migr: 96.9
Pct Logical: 56.5
High Mig Pct: 85
Low Mig Pct: 70
Migration Delay: 0
Migration Continue: Yes
Migration Processes: 2
Reclamation Processes: 1
Next Storage Pool: VMWARETAPE
Reclaim Storage Pool:
Maximum Size Threshold: No Limit
Access: Read/Write
Description: VMWARE disk storage Pool
Overflow Location:
Cache Migrated Files?:
Collocate?: Group
Reclamation Threshold: 60
Offsite Reclamation Limit:
Maximum Scratch Volumes Allowed: 120
Number of Scratch Volumes Used: 120
Delay Period for Volume Reuse: 0 Day(s)
Migration in Progress?: No
Amount Migrated (MB): 0.00
Elapsed Migration Time (seconds): 10
Reclamation in Progress?: No
Last Update by (administrator): ***
Last Update Date/Time: 09/06/2016 21:55:45
Storage Pool Data Format: Native
Copy Storage Pool(s):
Active Data Pool(s):
Continue Copy on Error?: Yes
CRC Data: No
Reclamation Type: Threshold
Overwrite Data when Deleted:
Deduplicate Data?: Yes
Processes For Identifying Duplicates: 1
Duplicate Data Not Stored: 1,910 M ( 0%)
Auto-copy Mode: Client
Contains Data Deduplicated by Client?: Yes

Any suggestions would be greatly appreciated.

thanks!
 

Maximum Scratch Volumes Allowed: 120
Number of Scratch Volumes Used: 120
Increase the number of scratch volumes if there is enough space in device class VMFILE. Look at your VMFILE device class, take the sum of the space of all the filesystems listed under "Directory:", and divide that by the "Est/Max Capacity (MB):". That will give you how many volumes/scratch tapes you can have; then update the MAXSCRATCH of your storage pool to match. If you have already used up all the space, add another filesystem to the device class, then redo the math and update MAXSCRATCH again.
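The arithmetic above can be sketched as follows; the directory sizes and volume capacity here are made-up illustration values, not taken from this system:

```python
# Sketch of the MAXSCRATCH calculation described above for a FILE-type
# device class: total space across the "Directory:" filesystems divided
# by the per-volume "Est/Max Capacity (MB):". All numbers are assumed.

def max_scratch(directory_mb: list, est_capacity_mb: int) -> int:
    """Whole volumes that fit in the device class directories."""
    return sum(directory_mb) // est_capacity_mb

# e.g. two 1 TB filesystems holding 10,240 MB (10 GB) volumes:
dirs = [1_048_576, 1_048_576]       # MB free per filesystem (assumed)
print(max_scratch(dirs, 10_240))    # 204 -> UPDate STGpool ... MAXSCRatch=204
```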
 

Hi Marclant,

I ended up increasing my VMWAREPOOL disk to about 2.7 TB (it was originally ~1.7 TB) and increased my MAXSCRATCH to 175. Backups seemed to run OK for a while, but the system is again complaining about being out of storage, and all migration, reclamation, and backup tasks fail since there is no space on the server.

I have about 12 scratch tapes in the library at the moment.

TSM logs show:

ANR0522W Transaction failed for session 1879 for node VM_DC (TDP VMware) - no space available in storage pool VMWAREPOOL and all successor pools.

ANR1893E Process 5038 for SPACE RECLAMATION completed with a completion state of FAILURE.

etc.etc.

stgpool details:
Storage Pool Name: VMWAREPOOL
Storage Pool Type: Primary
Device Class Name: VMFILE
Estimated Capacity: 2,781 G
Space Trigger Util: 80.2
Pct Util: 80.2
Pct Migr: 80.2
Pct Logical: 54.2
High Mig Pct: 80
Low Mig Pct: 70
Migration Delay: 0
Migration Continue: Yes
Migration Processes: 2
Reclamation Processes: 1
Next Storage Pool: VMWARETAPE
Reclaim Storage Pool:
Maximum Size Threshold: No Limit
Access: Read/Write
Description: VMWARE disk storage Pool
Overflow Location:
Cache Migrated Files?:
Collocate?: Group
Reclamation Threshold: 60
Offsite Reclamation Limit:
Maximum Scratch Volumes Allowed: 175
Number of Scratch Volumes Used: 175
Delay Period for Volume Reuse: 0 Day(s)
Migration in Progress?: No
Amount Migrated (MB): 0.00
Elapsed Migration Time (seconds): 13
Reclamation in Progress?: No
Last Update by (administrator): admin
Last Update Date/Time: 09/13/2016 09:38:35
Storage Pool Data Format: Native
Copy Storage Pool(s):
Active Data Pool(s):
Continue Copy on Error?: Yes
CRC Data: No
Reclamation Type: Threshold
Overwrite Data when Deleted:
Deduplicate Data?: Yes
Processes For Identifying Duplicates: 1
Duplicate Data Not Stored: 1,918 M ( 0%)
Auto-copy Mode: Client
Contains Data Deduplicated by Client?: Yes


Again, any suggestions would be greatly appreciated, as I am stumped.

Thanks in advance!

Edit: attached a storage pool overview screenshot.
 

Attachments:
  • tsm_fail1.png (423.9 KB)

Excellent, thank you again, Marclant, much appreciated. I increased the MAXSCRATCH for the stgpool and started reclamation; will see where it goes.

The dedupe seems to be working normally, but I will verify with the info you provided.

cheers
 

You should check SHOW DEDUPDELETEINFO and see if there is a large number of chunks waiting in the queue.
 

Hi,

Just wanted to report back and let you know that my system seems to be back in business.
I think the initial lack of physical pool disk space on the TSM server (caused by increased backups), coupled with missing scratch tapes, caused a backlog of work for the system.

As of this week, backups seem to be running normally again.

Your suggestions were greatly appreciated, so thanks for that!

Now I just need to see why the system is always eating up scratch tapes: it's always sending tapes to the vault, but vault retrieves are not as regular, so the system is always complaining about not having enough scratch tapes.

Thanks again!
 

Now I just need to see why the system is always eating up scratch tapes: it's always sending tapes to the vault, but vault retrieves are not as regular, so the system is always complaining about not having enough scratch tapes.
Make sure expiration and offsite reclamation run successfully. And depending on your retention settings, it could be a while before things level off and you start getting a decent number of tapes back.
 

Hi Marclant,

Up until today, I was still having issues with tapes not regularly coming back from the vault:
most days no tapes, other days maybe one.

Today I noticed something strange: 48 tapes found in the vault with status pending (0 util / 0 cap).

I found a post about resetting the reuse delay to 0 and then back to my original value (4 days),
and that triggered the vault retrieve on those tapes.

This has been an ongoing issue with this system since the beginning, without a solution. Would you happen to know why tapes regularly go into pending status and are not returned automatically? I'm thinking it's caused by glitches in the version of TSM we are running (TSM 6.4, level 0.1).
I will continue to research this, but an upgrade is coming soon.

In any case, thanks very much for your help with this whole thing. I've learnt a lot!

cheers!
 

Would you happen to know why tapes regularly go into pending status and are not returned automatically?
You said you were using a reuse delay; that's why. If a tape becomes empty, its status changes from full to pending. After the reuse delay has elapsed, it returns to scratch.
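A minimal model of that lifecycle (an illustration of the rule above, not TSM internals):

```python
# Simplified model (assumptions, not TSM internals) of the volume
# lifecycle described above: an emptied tape sits in PENDING for the
# pool's "Delay Period for Volume Reuse" before returning to scratch.

from datetime import date, timedelta

def back_to_scratch(emptied: date, reuse_delay_days: int) -> date:
    """Earliest date a pending volume becomes scratch again."""
    return emptied + timedelta(days=reuse_delay_days)

# With REUsedelay=4 (the value mentioned above), a tape emptied on
# Sept 6 is eligible to return to scratch on Sept 10:
print(back_to_scratch(date(2016, 9, 6), 4))  # 2016-09-10
```

This also shows why resetting the delay to 0 flushed the backlog: with a zero delay, the eligibility date is the day the tape emptied, so all 48 pending tapes qualified immediately.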
 