My tape space is running out...

EduardoRio

Hello there,
I'm new here and this is my first post, so I don't even know if I'm posting in the right place!

But I'm looking for some help.
The question is: I'm using a TS3200 with 41 LTO4 tapes inside, for a set of backups for 4 Linux, 6 SQLplus and 21 WinNT machines.
There are two pools: one "diskpool" with a mere 360 GB and one "tapepool" with 41 x 1.5 TB.

All had been running smoothly until 4 days ago, when I saw that I have 34 tapes Full and the other ones as much as 80% full.

I've tried running the "move data" command, and also lowering the threshold with "reclaim stg tapepool threshold=30".
"q proc" shows "Reclamation" running but the process doesn't finish and no scratch tape...
After that I tried running expiration and again nothing. The tapes just keep getting full.

So, I need some help!!!
Thank you all in advance.
Eduardo
 
"Reclamation" running but the process doesn't finish and no scratch tape...

In the tape library, are there any scratch tapes?
If not, you need to check in some scratch tapes.

If there are some scratch tapes:

It needs to finish.
Why doesn't it finish?
Does it fail with an error?
Does it get cancelled?
Does it hang?

Check the actlog ( q act begint=HH:MM endt=HH:MM or q act endt=now-01:00) around the time frame of the space reclamation process to see if there are any hints as to why the process does not finish.

I suspect that the reclamation process may have been preempted by a higher-priority process.

Good Luck,
Sias
 
That's the issue. It needs to finish. I do think so!!
Why doesn't it finish? I don't know!!
Does it fail with an error? No, the process takes a long time, most of it "waiting for mounting point..." and... nothing!!
Does it get cancelled? Yes, I can cancel it!!
Does it hang?

Thanks!
 
Led888,
I've run "q act" and the response is quite long, with all the backup procedures running smoothly.
As for the reclamation process, "q proc" shows me that it had been running for at least 100000 seconds, or about 15h. Since my diskpool looks almost full, I decided to cancel the reclamation process and run a second migration process to keep it from getting full and create some space on disk to restart the reclamation.

BTW, the "q act" response for it was:
...
11/16/2018 10:56:17 ANR2017I Administrator ADMIN1 issued command: QUERY PROCESS (SESSION: 5032)
11/16/2018 10:56:31 ANR2017I Administrator ADMIN1 issued command: CANCEL PROCESS 59 (SESSION: 5032)
11/16/2018 10:56:31 ANR0940I Cancel request accepted for process 59. (SESSION: 5032)
11/16/2018 10:56:31 ANR1080W Space reclamation is ended for volume 524AGJL4. The process is canceled. (PROCESS: 59)
11/16/2018 10:56:31 ANR0985I Process 59 for SPACE RECLAMATION running in the BACKGROUND completed with completion state FAILURE at 10:56:31. (PROCESS: 59)
11/16/2018 10:56:31 ANR1893E Process 59 for SPACE RECLAMATION completed with a completion state of FAILURE. (PROCESS: 59)
11/16/2018 10:56:31 ANR4936I The reclamation of the TAPEPOOL storage pool is complete. Number of files reclaimed: 0. Number of reclaimed bytes: 0. Number of reclaimed deduplicated bytes: 0. Number of reconstructed files: 0. Number of unreadable files: 0.
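
If the q act output is long, one way to get to the interesting part is to save it to a text file and keep only the warning and error lines. Below is a minimal Python sketch of that idea, not a TSM feature: problem_lines is a hypothetical helper, and it relies on the trailing letter of each ANR message number (I for information, W for warning, E for error, S for severe). The sample lines are copied from the excerpt above.

```python
import re

# Matches server message IDs such as ANR1080W; the trailing letter is the
# severity (I = information, W = warning, E = error, S = severe).
MSG_ID = re.compile(r"\bANR\d{4}([IWES])\b")


def problem_lines(actlog_text, severities="WES"):
    """Return the activity log lines whose message severity is in `severities`.

    Hypothetical helper for illustration; it only filters saved text.
    """
    hits = []
    for line in actlog_text.splitlines():
        match = MSG_ID.search(line)
        if match and match.group(1) in severities:
            hits.append(line.strip())
    return hits


# Three lines taken from the activity log excerpt in this thread.
sample = """\
11/16/2018 10:56:31 ANR0940I Cancel request accepted for process 59. (SESSION: 5032)
11/16/2018 10:56:31 ANR1080W Space reclamation is ended for volume 524AGJL4. The process is canceled. (PROCESS: 59)
11/16/2018 10:56:31 ANR1893E Process 59 for SPACE RECLAMATION completed with a completion state of FAILURE. (PROCESS: 59)
"""

# Prints the ANR1080W and ANR1893E lines and skips the informational one.
for line in problem_lines(sample):
    print(line)
```

On this excerpt the filter only surfaces the cancellation itself, so the reason the process was stuck has to be looked for earlier in the log.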

Thanks!!!!!!!!!!!!!!!!!!!
 
In the tape library, are there any scratch tapes? No! Not one!!

For space reclamation to occur, you do need a scratch tape in the tape library.
You're trading in 1 scratch tape for possibly 2 scratch tapes.
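
To make the trade concrete, here is a rough sketch of the arithmetic in Python. It assumes every tape has the same capacity, and the utilization figures are made up for illustration; net_scratch_gain is a hypothetical helper, not a TSM command.

```python
import math


def net_scratch_gain(pct_utils):
    """Tapes freed minus scratch tapes consumed when the given volumes
    are consolidated. Assumes all tapes have equal capacity."""
    # Scratch tapes needed to hold the data that is still valid.
    tapes_written = math.ceil(sum(pct_utils) / 100.0)
    # Every source volume comes back as scratch once it is emptied.
    return len(pct_utils) - tapes_written


# Hypothetical figures: percent of each tape still holding valid data.
print(net_scratch_gain([25.0, 30.0]))        # 2 tapes freed, 1 used: prints 1
print(net_scratch_gain([25.0, 30.0, 40.0]))  # 3 tapes freed, 1 used: prints 2
```

The gain only shows up after the first scratch tape has been written, which is why the process cannot start with zero scratch tapes.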

..."q proc" shows me that it had been running for at least 100000 seconds, or about 15h. Since my diskpool looks almost full, I decided to cancel the reclamation process and run a second migration process to keep it from getting full and create some space on disk to restart the reclamation.

If there are no scratch tape(s) in the library, the space reclamation process will be waiting for a mount of a scratch tape.

Do check in some scratch tape(s). If there is no space in the library, you need to check out some tapes to make room for a scratch tape.
NOTE: If you are going to be checking in some new scratch tape(s), you need to run label libv on them first. If not, the TSM Server will not be able to use the tape and will mark it as private.

Good Luck,
Sias
 
First, THANKS LED888!!!

When you said that there should be at least one scratch tape in the library, does that mean an empty tape? Or a tape on which I will be able to do a "run scratch" command?

For the first, I really don't have empty tapes; for the second option, I think that all tapes could be scratched.
Eduardo
 
And yes, the output of "q libv" shows all tapes as "private".
 
When you said that there should be at least one scratch tape in the library, does that mean an empty tape?
Yes, but more importantly a status of scratch.

You can use this to check how many you have per library:
select library_name,count(*) AS "Number of scratch" from libvolumes where status='SCRATCH' group by library_name

Or if you just have one library, you can use a shorter version of the same command:
select count(*) AS "Number of scratch" from libvolumes where status='SCRATCH'
 
And yes, the output of "q libv" shows all tapes as "private".
Do a move data on a tape that is nearly empty, to the same pool. That will copy its data onto a filling tape and return the emptied tape to scratch. Then start the reclamation right away, before a process or a backup picks it up.
 
MARCLANT Great!!
As I suspected, the response for the select is: "zero"
I'll do a move data and start a reclamation right after.
Thanks!!
 
Hello!
Another question!!
My TS3220 has 44 slots, all of them already loaded. But it also has 3 I/O stations. Should I load 2 extra scratch tapes into them?
Thanks
Eduardo
 
Not versed with the TS3220 (looks like it's close to my 3310's), but if it's anything like the other libraries I've used, trying to use the IO slots as storage cells is an exercise in frustration.
 
Oh sorry! That's a TS3200 indeed.
Ok! Thanks!

I won't give it a try, for sure...
 
Hello!!!
Each day of work with TSM is full of news and challenges !!! ;)

Since no move data or anything else is working for me and my tapes are still filling up, I decided to take some radical action to try to solve it.
But in the meantime I noticed something strange (strange at least for me).
In the "q vol" output, the first strange thing was tapes with an "Estimated capacity" of 900GB even though the tapes are all 1.5TB. Also, some show as full with a "Pct util" of just 22%, others 60%.

For one of these, the "q vol f=d" looks like

Volume Name: 502AGJL4
Storage Pool Name: TAPEPOOL
Device Class Name: DEVCLASS_LTO4
Estimated Capacity: 1.5 T
Scaled Capacity Applied:
Pct Util: 22.2
Volume Status: Full
Access: Read/Write
Pct. Reclaimable Space: 77.8
Scratch Volume?: Yes
In Error State?: No
Number of Writable Sides: 1
Number of Times Mounted: 108
Write Pass Number: 1
Approx. Date Last Written: 11/15/2018 09:54:16
Approx. Date Last Read: 11/22/2018 10:35:59

--------------------------------------------
So, I keep learning (or trying!!).
Now I'm looking for how to restore the real tape capacity and turn at least 3 of them into scratch.
Eduardo
 
So a thought occurred to me, can you get more disk attached to your tsm server? If so, you could follow along with this document and likely free up a tape. https://www.ibm.com/support/knowled...com.ibm.itsm.srv.doc/t_reclaim_one_drive.html

Estimated capacity is based on what TSM thinks it can write to tapes. LTO4 is 800 GB native, nominally 1.6 TB compressed (TSM shows it as 1.5 T). You may see tapes that report more capacity. You may see tapes that report slightly less.

Let us talk about how your tsm storage pools are setup:
Your post above mentioned a 360gb disk pool, and then a tape pool.
I take it the tape pool isn't a copy pool, but a next storage pool where data gets migrated from disk to tape? You can do q stgp <name of tape pool> f=d and the 2nd line or so should say: Storage Pool Type: Primary.
How many tape drives do you have? Are they both online? (q drive, q path).
What is the reuse delay of your tape pool? If greater than 0, I suggest setting it to 0 for this exercise so the tapes we free up return to scratch as soon as they can.

Was thinking, and I tested this out with my non-prod box which worked for me, and this is ONLY IF your tape storage pool is marked as Primary.
Your results may vary, especially as time goes on and you are likely running out of disk space. This is at your own risk.
  1. Find volumes that are very low. We want a volume that's 10% or less I'd say. Less even better.
    Code:
    select stgpool_name,volume_name,pct_reclaim from volumes where scratch='YES' and volume_name like '%L4' and pct_reclaim>90 order by stgpool_name
    The code should work for you since the tape you posted above is L4 at the end. Replace >90 with number of your choosing, but finding a tape that's 90% reclaimable or higher is my recommendation. Your output should be something like this:
    Code:
    STGPOOL_NAME: RANDOM_TPOOL VOLUME_NAME: 060661L4
    PCT_RECLAIM: 93.9
    pct_reclaim is what we are looking for. This tape in my example needs to be reclaimed and is less than 10% used.
  2. Verify you have enough free space in your disk pool. 360 GB isn't a lot to work with. How full is it? If it's greater than 50% full, do you have any tapes in a 'filling' state you could migrate data from disk to tape to empty it out (if your tape pool is primary that is)? I suggest disabling client sessions so we don't fill right up again. Rough numbers: If tape compression is working you will need between 80 GB and 150 GB free for 10%. Since I do not know your environment, I've erred on the side of caution. You may need to adjust your migration settings on your disk storage pool.
  3. Once there's enough disk space free, and you have selected your lowest utilized tape, issue move data <tape volume> stgpool=<disk storage pool name>
  4. Once move data is complete, and you have at least 2 drives and both are online/enabled, then try to kick off a reclamation.
  5. Once you have enough tape space freed up, adjust your migration parameters back to where they work, and enable client sessions along with setting the reuse delay if any. And then anything else you might have changed :)
I've made a lot of assumptions about your environment in the above post. If you have any questions or need more of a guided walk through, post back and I'll lay it out in more detail. I shall try to check back when I can, just the day after holiday here for myself and may be tied up. If you are unsure of what I posted, please ask.
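
As a rough check on the disk space step, the space a move data needs is about the volume's estimated capacity times its Pct Util. Here is a minimal Python sketch of that arithmetic. The helper names are hypothetical, 1 T is treated as 1000 GB for simplicity, and the 50% disk pool utilization is an assumed value, not something reported in this thread.

```python
def data_on_volume_gb(est_capacity_gb, pct_util):
    """Approximate data still held on a volume, in GB."""
    return est_capacity_gb * pct_util / 100.0


def fits_in_disk_pool(needed_gb, pool_size_gb, pool_pct_util):
    """True if the free part of the disk pool can take `needed_gb`."""
    free_gb = pool_size_gb * (100.0 - pool_pct_util) / 100.0
    return needed_gb <= free_gb


# A tape that is 10% utilized, at 800 GB and at 1.5 TB estimated capacity.
low = data_on_volume_gb(800, 10)
high = data_on_volume_gb(1500, 10)
print(low, high)  # 80.0 150.0

# Volume 502AGJL4 from the q vol output above: 1.5 T estimated, 22.2% utilized.
needed = data_on_volume_gb(1500, 22.2)
print(round(needed))  # 333

# 360 GB disk pool; the 50% utilization is an assumption for illustration.
print(fits_in_disk_pool(needed, 360, 50))  # False
```

With these numbers, the tape posted earlier would need roughly 333 GB, which is nearly the whole 360 GB disk pool, so either a less utilized tape or an almost empty disk pool would be needed for the move data to fit.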

**Edit - I know I was jumping about all over the place, just wanted to get something out that may be of help. And I've yet to have my coffee after getting paged at 7am :)
 
I'm doing the procedures and as soon as possible I'll post the results.
In the meantime - THANKS A LOT!!!
 
Like I said, I made a lot of assumptions.
Read each step a few times and gather all the info before you start. If you have questions, post.
The main points: there is enough disk space, the tape pool is a primary pool, and you have at least two online and functional tape drives.
 
RecoveryOne
First, thanks again. :)
A brief history...
All this came to me after I discovered a TSM server and a TS3200 robot with 44 tapes, unused since 2013. And, more critically, our datacenter (I work at a public environmental agency in Rio) was running with no backup devices or automated procedures. We have some 50 servers, ~10 SQLServer machines and another ~8 Oracle databases, full of documents and environmental data such as air and water quality, monitoring data from sensor stations around the State, weather data and so on.
Since I've been working on the IT management team, yes, I became very concerned and upset about this situation.
So, I decided to get hands-on. In the first month I was able to hire an IBM TSM consultant, who did the setup and deployed the first set of backup routines. BTW, his work did not achieve its goal, because (Murphy's Law) in the very first work week the TS3200 picker just stopped working... So, he didn't have enough time to do everything, including some training for our tech staff.

So, after this first effort and without a TSM specialist, I have been monitoring it myself (and learning). Everything began working smoothly, until a few weeks ago when the tapes started filling up. At the beginning move data worked great, but nowadays it does not.

So, let me answer all your questions!

So a thought occurred to me, can you get more disk attached to your tsm server?
At this time, I don't think so. We are using the server's internal disk. The solution would be to use a storage system. But in order to do that we'll have to: 1 change the way the fibers are connected (serverTSM/TS3200/storage); 2 reconfigure TSM; and 3 redirect the stgpool. And, sincerely, I have no idea how to do steps 2 and 3.

If so, you could follow along with this document and likely free up a tape. https://www.ibm.com/support/knowled...com.ibm.itsm.srv.doc/t_reclaim_one_drive.html

Estimated capacity is based on what TSM thinks it can write to tapes. LTO4 is 800 GB native, nominally 1.6 TB compressed (TSM shows it as 1.5 T). You may see tapes that report more capacity. You may see tapes that report slightly less.

Let us talk about how your tsm storage pools are setup:
Your post above mentioned a 360gb disk pool, and then a tape pool.
I take it the tape pool isn't a copy pool, but a next storage pool where data gets migrated from disk to tape?
Yes, I think so. The backups are written to the diskpool and then copied to tape.

You can do q stgp <name of tape pool> f=d and the 2nd line or so should say: Storage Pool Type: Primary.
This is the summarized output:
q stg tapepool f=d


Storage Pool Name: TAPEPOOL
Storage Pool Type: Primary
Device Class Name: DEVCLASS_LTO4
Estimated Capacity: 11,097,877 G
Pct Util: 0.3
Pct Migr: 0.4
Pct Logical: 100.0
High Mig Pct: 90
Low Mig Pct: 70
Migration Delay: 0
Migration Continue: Yes
Migration Processes: 1
Reclamation Processes: 1
Collocate?: Group
Reclamation Threshold: 70
Offsite Reclamation Limit:
Maximum Scratch Volumes Allowed: 9,999
Number of Scratch Volumes Used: 41
Delay Period for Volume Reuse: 0 Day(s)
Migration in Progress?: No
Amount Migrated (MB): 0.00
Elapsed Migration Time (seconds): 0
Reclamation in Progress?: No
Storage Pool Data Format: Native
Copy Storage Pool(s):
Active Data Pool(s):
Continue Copy on Error?: Yes
Reclamation Type: Threshold
Overwrite Data when Deleted:
Deduplicate Data?: No
Auto-copy Mode: Client


How many tape drives do you have?
tsm: JACARANDA>q drive

Library Name     Drive Name       Device Type     On-Line
------------     ------------     -----------     -------------------
TS3200           DRIVE0           LTO             Yes
TS3200           DRIVE1           LTO             Yes


Are they both online? (q drive, q path).
Code:
tsm: JACARANDA>q path

Source Name     Source Type     Destinatio-     Destinatio-     On-Line
                                n Name          n Type
-----------     -----------     -----------     -----------     ----------
JACARANDA       SERVER          TS3200          LIBRARY         Yes
JACARANDA       SERVER          DRIVE0          DRIVE           Yes
JACARANDA       SERVER          DRIVE1          DRIVE           Yes

What is the reuse delay of your tape pool? If greater than 0, I suggest setting it to 0 for this exercise so the tapes we free up return to scratch as soon as they can.

I'm not sure; this could be the "Migration Delay". If so, it is 0.


Was thinking, and I tested this out with my non-prod box which worked for me, and this is ONLY IF your tape storage pool is marked as Primary.

Yes they are -
"Storage Pool Name: TAPEPOOL
Storage Pool Type: Primary
"


Your results may vary, especially as time goes on and you are likely running out of disk space. This is at your own risk.

Ok, Understood, don't worry about it
  1. Find volumes that are very low. We want a volume that's 10% or less I'd say. Less even better.
    Code:
    select stgpool_name,volume_name,pct_reclaim from volumes where scratch='YES' and volume_name like '%L4' and pct_reclaim>90 order by stgpool_name
    The code should work for you since the tape you posted above is L4 at the end. Replace >90 with number of your choosing, but finding a tape that's 90% reclaimable or higher is my recommendation. Your output should be something like this:
    Code:
    STGPOOL_NAME: RANDOM_TPOOL VOLUME_NAME: 060661L4
    PCT_RECLAIM: 93.9
    pct_reclaim is what we are looking for. This tape in my example needs to be reclaimed and is less than 10% used.

    My output for 90% and 80% was: no match.
    For 70% it was:
    STGPOOL_NAME: TAPEPOOL VOLUME_NAME: 502AGJL4
    PCT_RECLAIM: 77.8
    For 60%, also this one:
    VOLUME_NAME: 524AGJL4
    PCT_RECLAIM: 69.5
  2. Verify you have enough free space in your disk pool. 360 GB isn't a lot to work with. How full is it? If it's greater than 50% full, do you have any tapes in a 'filling' state you could migrate data from disk to tape to empty it out (if your tape pool is primary that is)?

    No, the diskpool shows 97% and I don't understand why the migration process starts and stops without success, even though there is one tape that is just 30% used.

    I suggest disabling client sessions so we don't fill right up again. Rough numbers: If tape compression is working you will need between 80 GB and 150 GB free for 10%. Since I do not know your environment, I've erred on the side of caution. You may need to adjust your migration settings on your disk storage pool.

    I think I have done it by using "upd stg diskpool migpro=2", putting the two drives to work on it, and "upd stg diskpool hi=50". Is that right?
  3. Once there's enough disk space free, and you have selected your lowest utilized tape, issue move data <tape volume> stgpool=<disk storage pool name>

    Sure, I've tried several times to move data.
    OK, I'll try again.
  4. Once move data is complete, and you have at least 2 drives and both are online/enabled, then try to kick off a reclamation.
  5. Once you have enough tape space freed up, adjust your migration parameters back to where they work, and enable client sessions along with setting the reuse delay if any. And then anything else you might have changed :)
I've made a lot of assumptions about your environment in the above post. If you have any questions or need more of a guided walk through, post back and I'll lay it out in more detail. I shall try to check back when I can, just the day after holiday here for myself and may be tied up. If you are unsure of what I posted, please ask.

Absolutely, your assumptions are quite right and the details are great and well explained. I'm new to the TSM world and sometimes ask what are, I know, beginner's questions.

AND YES HAPPY THANKSGIVING HOLIDAY!!!

Sincerely yours,
Eduardo
 