Please help with TSM recovery log

dreamz

Hello All,

I just escaped from a TSM crash caused by the recovery log getting full. All transactions were running very slowly, and when I looked at the activity log it said "ANR2997W The server log is 82 percent full. The server will delay transactions by 3 milliseconds".

I then ran a DB backup. The process completed, but it is not going away from Q PR. I ran show logpin, and it displayed the last DB backup and a node that was running a backup at that time.

When I did a q sess, I could see that all the sessions were hanging. I tried cancelling a few of them, but they would not cancel.

I then rebooted the TSM server to cancel all the sessions and prevent a total server crash. I was also referring to a few of the posts here and learned that a reboot is not a good choice.

My db and log details are as follows:

q db f=d
Available Space (MB) : 50,000
Assigned Capacity (MB) : 47,000
Maximum Extension (MB) : 3,000
Maximum Reduction (MB) : 25,044
Page Size (bytes) : 4,096
Total Usable Pages : 12,032,000
Used Pages : 5,446,761
Pct Util : 45.3
Max. Pct Util : 45.3
Physical Volumes : 10
Buffer Pool Pages : 196,608
Total Buffer Requests : 13,665,016
Cache Hit Pct. : 98.26
Cache Wait Pct. : 0
Backup in Progress? : No
Type of Backup In Progress :
Incrementals Since Last Full : 0
Changed Since Last Backup (MB) : 480.86
Percentage Changed : 2.26
Last Complete Backup Date/Time : 10/1/09 12:13:41 AM PDT
Estimate of Recoverable Space (MB) : 2,468
Last Estimate of Recoverable Space (MB) : 8/27/08 9:00:29 PM PDT

q log f=d
Available Space (MB) : 9,000
Assigned Capacity (MB) : 8,008
Maximum Extension (MB) : 992
Maximum Reduction (MB) : 7,992
Page Size (bytes) : 4,096
Total Usable Pages : 2,049,536
Used Pages : 3,297
Pct Util : 0.2
Max. Pct Util : 83.1
Physical Volumes : 6
Log Pool Pages : 128
Log Pool Pct. Util : 10.86
Log Pool Pct. Wait : 0
Cumulative Consumption (MB) : 1,111,833.25
Consumption Reset Date/Time : 1/22/08 9:59:17 AM PST
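
As a quick sanity check on the figures above, here is a minimal Python sketch that re-derives the usable size and Pct Util from the page counts. The function names are made up for illustration; the only inputs are the 4,096-byte page size and the page counts quoted in the two outputs.

```python
PAGE_SIZE_BYTES = 4096          # "Page Size (bytes)" from both outputs
BYTES_PER_MB = 1024 * 1024


def pages_to_mb(pages: int) -> float:
    """Convert a page count into megabytes."""
    return pages * PAGE_SIZE_BYTES / BYTES_PER_MB


def pct_util(used_pages: int, total_pages: int) -> float:
    """Utilization as a percentage, rounded to one decimal like the q output."""
    return round(100.0 * used_pages / total_pages, 1)


# Values copied from the q db f=d and q log f=d outputs above.
db_total_pages, db_used_pages = 12_032_000, 5_446_761
log_total_pages, log_used_pages = 2_049_536, 3_297

print("DB usable MB :", pages_to_mb(db_total_pages))
print("DB Pct Util  :", pct_util(db_used_pages, db_total_pages))
print("Log usable MB:", pages_to_mb(log_total_pages))
print("Log Pct Util :", pct_util(log_used_pages, log_total_pages))
```

The DB figure comes out at exactly the 47,000 MB assigned capacity, and the log is back near empty, so the number to watch is the 83.1 Max. Pct Util rather than the current utilization.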


I would like to know whether we should treat this as a TSM server issue or as a problem with client files that were blocking (as discussed in the other posts).

Do we need to extend the log? Or is there any other precautionary measure so that it doesn't happen again?

Thanks for your patience in reading this far. It would be a great help if someone could take the trouble to explain, as my brain is still in a blank state :mad:
 
Hi,

If a hung session cannot be cancelled (and killing it from the client side, by stopping the scheduler or killing the process, has no effect), then cycling the TSM server (just the service, not the machine) may be the only thing left.
It is good to cancel the other sessions and dismount tapes first, just to have TSM as quiesced as possible.

Your RLog is large enough. Do not forget that transaction delay starts when the RLog is at 80% (or 85?), so you should have enough time to solve the problem.
I see you do not have all the available space assigned to your RLog; consider the "extend log 992" command.

The maximum RLog size is 13GB (in TSM 5.x), so you may increase it to 12GB, leaving 1GB for an emergency log extension in case of a crash.
Setting a DBBackupTrigger is also recommended.
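
To put numbers on that sizing advice, here is a rough Python sketch of the arithmetic. It assumes "13GB" means 13 x 1024 MB (a later post puts the real limit at around 13,300 MB) and ignores any per-volume overhead; the function and constant names are only illustrative.

```python
LOG_MAX_MB = 13 * 1024   # assumption: the 13GB limit read as 13,312 MB
RESERVE_MB = 1024        # 1GB held back for an emergency extension


def log_extension_plan(available_mb: int, assigned_mb: int) -> dict:
    """Work out how far the log can grow while keeping the reserve."""
    target_mb = LOG_MAX_MB - RESERVE_MB
    # Space already defined to the log but not yet assigned ("Maximum Extension").
    extend_now_mb = available_mb - assigned_mb
    # Extra volume space needed to reach the target once that is used up.
    extra_volume_mb = max(0, target_mb - available_mb)
    return {
        "target_mb": target_mb,
        "extend_now_mb": extend_now_mb,
        "extra_volume_mb": extra_volume_mb,
    }


# Available Space and Assigned Capacity from the q log f=d output.
plan = log_extension_plan(available_mb=9_000, assigned_mb=8_008)
print(plan)
```

With the posted values this gives the 992 MB that can be assigned right away, plus roughly 3,288 MB of new log volume space to reach a 12GB log.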

Harry
 
It would be a good idea to determine why your log filled up.
Clients with lots of files?

Is the log mode roll-forward or normal? (Hint: 'Log Mode' in q status.)
How often is the DB backed up?

Rudy
 
Dreamz,

The reason you got the log lock is not a session; it is a process. The common misconception is that it is a session. The show logpin command only works for sessions and does not display the process that is holding the log. This usually happens when you run an NDMP backup through your TSM server and it takes a long time. These are the steps you should take if you are worried at 84%, which is not very scary. Now 98% is really scary.

1. Cancel Processes
2. Disable sessions
3. Cancel all sessions (because if you fill your log to 100% you can't get any of those backups back anyway)
4. Back up the DB with Incremental or Full.
5. Say Thank God.
6. Enable sessions and restart any processes that were cancelled.
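
The steps above can be sketched as an ordered list of administrative commands. This Python snippet only builds the command strings and does not talk to a server; the process numbers and the device class name LTOCLASS are hypothetical placeholders, and step 5 is left to the operator.

```python
def log_relief_commands(process_numbers, devclass, backup_type="incremental"):
    """Return the admin commands for the steps above, in order."""
    if backup_type not in ("incremental", "full"):
        raise ValueError("backup_type must be 'incremental' or 'full'")
    # Step 1: cancel the running processes (numbers come from q pr).
    commands = [f"cancel process {n}" for n in process_numbers]
    # Step 2: stop new client sessions from starting.
    commands.append("disable sessions")
    # Step 3: cancel every client session that is still connected.
    commands.append("cancel session all")
    # Step 4: back up the database to the given device class.
    commands.append(f"backup db devclass={devclass} type={backup_type}")
    # Step 6: let clients back in; do this only after the backup has finished.
    commands.append("enable sessions")
    return commands


# Hypothetical process numbers and device class, for illustration only.
for command in log_relief_commands([12, 15], "LTOCLASS"):
    print(command)
```

The DB backup runs as a background process, so check q pr and wait for it to complete before running the final enable sessions.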
 
Another note: your log size is 9GB. Remember you can add more space to your log, up to 13GB. I would make it about 9GB usable with the ability to grow to 13GB. You currently have 9GB total, with 8GB assigned as the current log size. You also need to set your log space trigger to grow at 80% or 90%, depending on your DB size and how long it takes to back it up.
 
I wouldn't take the log all the way to 13GB; you're just asking for trouble if you do that. Leave it at around 12GB so that you have some room if you need to do an emergency extend at some point. (I have run into this problem recently =) ).
 
I suggest that you do as Harry_Redl suggests and define a DB backup trigger.
This will reduce your log usage if you run in roll-forward mode.

/regards Daniel
 
Thank you for your responses, everyone!!!

I fell sick all of a sudden and had to take a slightly longer vacation.

Now, just to answer some of your questions,

- I believe we do have a space trigger defined for the DB, as I see the following when I run q spacetrigger f=d

DB Full Percentage : 80
DB Space Expansion Percentage : 20
DB Expansion prefix :
DB Maximum Size (Megabytes) : 0
Mirror Prefix 1 :
Mirror Prefix 2 :
Last Update by (administrator) : ADMIN
Last Update Date/Time : 3/15/08 8:31:43 AM PDT

@rore
Our log mode is set to Normal and we run a DB backup twice a day.

- The most interesting part is that the DB utilization was around 46% before my vacation and now it's 70%!

We did add a few nodes in between, but I am not sure how the DB shot up to 70%... almost double??? :confused:

Available Space (MB) : 50,000
Assigned Capacity (MB) : 47,000
Maximum Extension (MB) : 3,000
Maximum Reduction (MB) : 13,984
Page Size (bytes) : 4,096
Total Usable Pages : 12,032,000
Used Pages : 8,441,980
Pct Util : 70.2
Max. Pct Util : 70.3
Physical Volumes : 10
Buffer Pool Pages : 196,608
Total Buffer Requests : 2,137,562,112
Cache Hit Pct. : 99.03
Cache Wait Pct. : 0
Backup in Progress? : No
Type of Backup In Progress :
Incrementals Since Last Full : 0
Changed Since Last Backup (MB) : 493.97
Percentage Changed : 1.5
Last Complete Backup Date/Time : 10/20/09 7:22:44 AM PDT
Estimate of Recoverable Space (MB) : 2,468
Last Estimate of Recoverable Space (MB) : 8/27/08 9:00:29 PM PDT

My q log f=d output now looks like this:
Available Space (MB) : 9,000
Assigned Capacity (MB) : 8,008
Maximum Extension (MB) : 992
Maximum Reduction (MB) : 8,004
Page Size (bytes) : 4,096
Total Usable Pages : 2,049,536
Used Pages : 295
Pct Util : 0
Max. Pct Util : 83.1
Physical Volumes : 6
Log Pool Pages : 128
Log Pool Pct. Util : 9.79
Log Pool Pct. Wait : 0
Cumulative Consumption (MB) : 1,199,158.625
Consumption Reset Date/Time : 1/22/08 9:59:17 AM PST

Do I need to start looking at extending the DB and log immediately? By the way, we are doing mock restore tests, as we have a DR test scheduled for the end of this month. Could that have caused such a high DB utilization? If so, would the DB go back to normal once we stop these?
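
To get a feel for how urgent this is, here is a rough Python estimate built from the two sets of output in this thread. It assumes the outputs were captured about 19 days apart (using the Last Complete Backup dates of 10/1/09 and 10/20/09 as a proxy) and that the growth is linear, which may well not hold if the mock restores are behind it.

```python
from datetime import date

PAGES_PER_MB = 256  # 1 MB divided by the 4,096-byte page size


def daily_rate(before: float, after: float, days: int) -> float:
    """Average change per day between two readings."""
    return (after - before) / days


# Assumption: capture dates approximated by the Last Complete Backup dates.
days = (date(2009, 10, 20) - date(2009, 10, 1)).days

# "Used Pages" and "Total Usable Pages" from the two q db f=d outputs.
db_growth_mb_per_day = daily_rate(5_446_761, 8_441_980, days) / PAGES_PER_MB
free_mb = (12_032_000 - 8_441_980) / PAGES_PER_MB
days_until_full = free_mb / db_growth_mb_per_day

# "Cumulative Consumption (MB)" from the two q log f=d outputs.
log_mb_per_day = daily_rate(1_111_833.25, 1_199_158.625, days)

print(f"DB growth       : {db_growth_mb_per_day:.0f} MB/day")
print(f"Assigned DB full: in about {days_until_full:.0f} days at this rate")
print(f"Log consumption : {log_mb_per_day:.0f} MB/day")
```

If the growth really stayed linear, the assigned 47,000 MB would be gone in a little over three weeks, before counting the 3,000 MB still available under Maximum Extension.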

Thank you so much guys, you guys are gems and this forum rocks!!!
 
Okay a few things.

Harry is talking about your dbbackuptrigger, not your space trigger. It applies when your log is in roll-forward mode, so that is not going to be an issue.

Regarding extending your DB:
Your DB is at 70% of 47GB utilized. I would think you are quite fine; monitor for growth and add space as required.

Regarding extending your log:
The ability to define a log volume but not extend into it is one of the stupidest features in TSM.
Extend your log as Harry has suggested. I agree with going to 12GB and having 1GB free.

Now... do NOT make the mistake of defining that logvol to TSM but not extending into it. When your logs fill and you need to extend them, you need to create a volume outside of TSM and then add it with the dsmserv extend log command. If you already have 13000 MB (it is actually around 13300 MB) defined but have not extended into it, do not make the mistake of thinking that you can just bring the server up and extend the logs.
DO NOT EXTEND IT TO 13GB


Regarding a log space trigger:
I strongly advise against it; it is another useless (IMO) command. If your logs are filling to the point that they are crashing your server, then you need to look at why that is happening.


Just my 20c worth
 
Now... do NOT make the mistake of defining that logvol to TSM but not extending into it. When your logs fill and you need to extend them, you need to create a volume outside of TSM and then add it with the dsmserv extend log command. If you already have 13000 MB (it is actually around 13300 MB) defined but have not extended into it, do not make the mistake of thinking that you can just bring the server up and extend the logs.
DO NOT EXTEND IT TO 13GB

I have just finished recovering from that very thing. The instructor in my TSM class recommended defining 13GB of log space and extending to only 12GB to make it easy to recover. M/UX should review their instructions. Thousands of volumes audited, 200-something tapes recalled, hundreds of thousands of HSM-migrated files found and restored with the BA client, and 3 weeks of experience I wish I could have just read out of a book.
 
Thanks mateinone!
Do you mean I should just run extend log 992 as suggested by Harry and see how it goes from there? Maybe monitor for a few more days?

The TSM DB is growing rapidly; it's almost 72% today. Is it better to extend the DB right now?
 
Hello, this is junaid. I'm new to TSM, and I have one question as well.
 