Probably just bad luck….
When I set up FILE pools for customers, I usually have to tweak them a couple
of times to get the sizing right, depending on the load, the number of
concurrent sessions, etc. Been there, done that, got the scars.
Assumptions you should change:
• Unlike a DISK pool, if there is no space available in a DEVTYPE=FILE pool,
backups don't fail over to the next stgpool. WAD (working as designed). I don't
know why it's that way. I think some RFE pressure is indicated; it causes me
grief.
• In a sequential pool on disk, you need to be much more aggressive about
reclamation. If you have the reclaim threshold set at 59, you are saying you
are willing to live with up to 59% of your disk space dead/expired and
unusable! That means you need to size your disk pool so that 41% of it is big
enough to hold the entire night's backup. I set reclaim on my disk pools to
20%, or 15% if the disk throughput is sufficient to tolerate the extra I/O.
• Migration from a sequential pool may not be working the way you think;
read the HIGHMIG description under DEFINE STGPOOL in the Administrator's
Reference for your version. I always set MAXSCRATCH to 0 for a sequential FILE
pool and use predefined volumes instead of scratch volumes, so I have better
control over what happens.
• You have MOUNTLIMIT set to 40 in the device class; how many concurrent
client sessions do you have writing to that pool?
• Also check the server option NUMOPENVOLSALLOWED to make sure you can have
enough volumes in use at once for concurrent backups plus reclamation plus
BACKUP STGPOOL plus migration, and so on.
• If you are going to fill this pool and empty it out via migration every
night, it's best to force the migration yourself with a MIGRATE STGPOOL command
rather than relying on the threshold. And if reclamation doesn't kick in on its
own regularly, set up a RECLAIM STGPOOL schedule to run daily anyway. It won't
hurt.
• The usual problem I see is that people don't have enough volumes
defined in the pool to cover all the concurrent sessions, plus some empty
volumes to allow for reclamation, plus a high enough NUMOPENVOLSALLOWED.
With volumes defined at 50G, you should have enough of them, so one of these
other issues is probably your problem.
• While things are working well, check daily to see what a "normal" number
of empty volumes in that pool is. Then set yourself an alert for when the
number of empty volumes drops below that normal value, so you can investigate
before disaster sets in.
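To put numbers on the reclaim-threshold point, here is a minimal Python sketch. It assumes, as above, that up to the threshold percentage of the pool can be sitting dead before reclamation picks a volume up, and that the remaining live fraction has to absorb a whole night's backup. The function names are made up for illustration; nothing here talks to the server.

```python
def live_fraction(reclaim_threshold_pct: float) -> float:
    """Worst-case share of a FILE pool still holding live data.

    Assumes a volume is not reclaimed until its reclaimable
    (expired/deleted) space reaches the threshold, so up to that
    percentage of the pool can be dead space.
    """
    if not 0 <= reclaim_threshold_pct < 100:
        raise ValueError("threshold must be in [0, 100)")
    return 1.0 - reclaim_threshold_pct / 100.0


def worst_case_live_gb(pool_gb: float, reclaim_threshold_pct: float) -> float:
    """Space you can count on for a night's backup in the worst case."""
    return pool_gb * live_fraction(reclaim_threshold_pct)


def required_pool_gb(nightly_backup_gb: float, reclaim_threshold_pct: float) -> float:
    """Pool size whose live fraction alone can absorb one night's backup."""
    return nightly_backup_gb / live_fraction(reclaim_threshold_pct)


if __name__ == "__main__":
    # 7,106 G is the Estimated Capacity from the Q STG output below.
    for pct in (59, 20, 15):
        print(f"reclaim {pct}%: {worst_case_live_gb(7106, pct):,.0f} G usable")
```

At the 59% threshold in the Q STG output, that is roughly 2,913 G you can count on out of 7,106 G; at 20% it is about 5,685 G.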
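The MOUNTLIMIT and volume-count points come down to one sum: how many volumes need to be open, or available to write to, at the same moment. A rough Python sketch of that sum follows. The per-process counts are assumptions for illustration (one output volume per client session, one input plus one output volume per reclamation process, one input volume per migration or BACKUP STGPOOL process), the reclaim output is assumed to land in the same pool (no reclaim pool is set in the output below), and NUMOPENVOLSALLOWED is not modeled; compare it against the same in-use figure yourself.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Concurrent activity against one FILE pool (all counts are estimates)."""
    client_sessions: int               # assumed: 1 output volume each
    reclaim_processes: int = 0         # assumed: 1 input + 1 output volume each
    migration_processes: int = 0       # assumed: 1 input volume each
    backup_stgpool_processes: int = 0  # assumed: 1 input volume each


def volumes_in_use(w: Workload) -> int:
    """Estimated number of pool volumes open at the same time."""
    return (w.client_sessions
            + 2 * w.reclaim_processes
            + w.migration_processes
            + w.backup_stgpool_processes)


def volumes_written(w: Workload) -> int:
    """Volumes being written into the pool itself (reclaim output stays here)."""
    return w.client_sessions + w.reclaim_processes


def budget_problems(w: Workload, mount_limit: int,
                    writable_volumes: int, spare_empty: int = 2) -> list[str]:
    """Return warnings; an empty list means the estimate fits.

    writable_volumes: empty/filling volumes plus unused scratch allowance.
    spare_empty: hypothetical safety margin of extra empty volumes.
    """
    problems = []
    in_use = volumes_in_use(w)
    if in_use > mount_limit:
        problems.append(f"{in_use} volumes open at once, MOUNTLIMIT is {mount_limit}")
    needed = volumes_written(w) + spare_empty
    if needed > writable_volumes:
        problems.append(f"need {needed} writable volumes, only {writable_volumes} available")
    return problems


if __name__ == "__main__":
    # Hypothetical night: 40 sessions, 1 reclaim, 1 migration; counting only
    # the 6 scratch volumes left (143 allowed minus 137 used, from the output below).
    for warning in budget_problems(Workload(40, 1, 1), mount_limit=40, writable_volumes=6):
        print(warning)
```

With those hypothetical numbers both checks fail: 43 volumes open against a MOUNTLIMIT of 40, and 43 writable volumes needed against 6 left.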
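The daily empty-volume check can be automated along these lines. This is a minimal Python sketch, assuming you collect the count yourself each day, for example with something like select count(*) from volumes where stgpool_name='BACKUPPOOL' and status='EMPTY' run through dsmadmc (verify the column names and STATUS values on your server level). Taking the median of the healthy days as the "normal" value is an arbitrary choice here, and the function names are hypothetical, not anything built into TSM.

```python
from statistics import median


def normal_empty_count(healthy_history: list[int]) -> float:
    """Baseline empty-volume count, taken as the median of healthy days."""
    if not healthy_history:
        raise ValueError("need at least one day of healthy observations")
    return median(healthy_history)


def should_alert(empty_today: int, healthy_history: list[int]) -> bool:
    """True when today's empty-volume count drops below the normal baseline."""
    return empty_today < normal_empty_count(healthy_history)


if __name__ == "__main__":
    history = [12, 11, 13, 12, 10]  # hypothetical counts from good nights
    today = 4                       # hypothetical count this morning
    if should_alert(today, history):
        print(f"only {today} empty volumes, normal is {normal_empty_count(history)}")
```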
Good luck!
Wanda Prather
TSM Consultant
ICF International Enterprise and Cybersecurity Systems Division
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Zoltan Forray
Sent: Friday, February 13, 2015 12:13 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] DEVCLASS=FILE - what am I missing
Until recently, I had always used DEVCLASS=DISK for disk storage and always
preformatted/allocated the disk volumes in multiple chunks to allow for
multi-I/O benefits.
When I recently stood up a new server, I decided to try DEVCLASS=FILE for
disk-based storage/incoming backups.
I thought I understood that FILE-type storage was basically "tape/sequential
files on disk" and would act accordingly, and that things like reclamation now
applied, so when the file chunks (I defined 50GB file sizes) dropped below the
reclaim value, the server would reclaim those files, create new ones, and
delete the old ones automagically.
Well, last night was a disaster. Backups were failing all over because the
server couldn't allocate any more files, and it also would not automatically
shift to the "nextpool", which is defined as a tape pool.
So, what am I doing wrong? Which assumptions are wrong? Here are the devclass
values, with the empty ones left out:
Device Class Name: TSMFS
Device Access Strategy: Sequential
Storage Pool Count: 1
Device Type: FILE
Format: DRIVE
Est/Max Capacity (MB): 51,200.0
Mount Limit: 40
Directory: /tsmpool
Here is the lone stgpool that used this devclass:
12:06:21 PM GALAXY : q stg backuppool f=d
Storage Pool Name: BACKUPPOOL
Storage Pool Type: Primary
Device Class Name: TSMFS
Estimated Capacity: 7,106 G
Space Trigger Util: 84.5
Pct Util: 80.9
Pct Migr: 80.9
Pct Logical: 99.2
High Mig Pct: 85
Low Mig Pct: 75
Migration Delay: 0
Migration Continue: Yes
Migration Processes: 1
Reclamation Processes: 1
Next Storage Pool: PRIMARY-ONSITE
Reclaim Storage Pool:
Maximum Size Threshold: No Limit
Access: Read/Write
Description:
Overflow Location:
Cache Migrated Files?:
Collocate?: No
Reclamation Threshold: 59
Offsite Reclamation Limit:
Maximum Scratch Volumes Allowed: 143
Number of Scratch Volumes Used: 137
Delay Period for Volume Reuse: 0 Day(s)
Migration in Progress?: No
Amount Migrated (MB): 0.00
Elapsed Migration Time (seconds): 1,009
Reclamation in Progress?: No
Last Update by (administrator): ZFORRAY
Last Update Date/Time: 02/13/2015 11:44:23
Storage Pool Data Format: Native
Copy Storage Pool(s):
Active Data Pool(s):
Continue Copy on Error?: Yes
CRC Data: No
Reclamation Type: Threshold
Overwrite Data when Deleted:
Deduplicate Data?: No
Processes For Identifying Duplicates:
Duplicate Data Not Stored:
Auto-copy Mode: Client
Contains Data Deduplicated by Client?: No
I calculated the "Max Scratch Volumes" value based on having a ~7.6TB
filesystem, so 50GB * 143 = 7.1TB.
This morning when I checked, there were plenty of volumes less than 40%
utilized. So why didn't reclamation kick in? Or am I totally off on this
assumption? I manually ran MOVE DATA on them and it freed things up.
--
*Zoltan Forray*
TSM Software & Hardware Administrator
BigBro / Hobbit / Xymon Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
zforray AT vcu DOT edu - 804-828-4807