Re: [ADSM-L] TSM server appears to hang

2014-07-16 13:43:35
Subject: Re: [ADSM-L] TSM server appears to hang
From: "Rhodes, Richard L." <rrhodes AT FIRSTENERGYCORP DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 16 Jul 2014 17:41:03 +0000
We will be looking at all of these things.  For now we've done the normal 
shutdown/reboot just to be sure (not sure of what!).

Rick



-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Mitchell, Ruth Slovik
Sent: Wednesday, July 16, 2014 11:34 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: TSM server appears to hang

Hi Rick,

Have you tried pinpointing the problem using some of the show commands? I'd 
take a look at:

show resq  (to see if there are any waiters)

Try to correlate a particular session with the problem using the following:
show threads
show locks
show txnt

(I believe those are fairly well discussed in past threads on this list.)
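
If it helps, here is a rough sketch of grabbing all four show commands in one pass, so the output can be lined up afterwards. It assumes dsmadmc is on the PATH and that TSM_ID and TSM_PW are environment variables you set yourself (those names are made up for this example). By default it only prints the commands it would run; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: snapshot the TSM "show" commands mentioned above.
# ASSUMPTIONS: dsmadmc is on PATH; TSM_ID and TSM_PW are hypothetical
# environment variables holding an admin id and password.
# DRY_RUN defaults to 1, which prints the commands instead of running them.

run_show() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print the command with placeholders so no password is echoed.
        printf 'dsmadmc -id=$TSM_ID -password=$TSM_PW -dataonly=yes "%s"\n' "$1"
    else
        echo "=== $1 ($(date)) ==="
        dsmadmc -id="$TSM_ID" -password="$TSM_PW" -dataonly=yes "$1"
    fi
}

collect_show() {
    run_show "show resq"     # any waiters?
    run_show "show threads"
    run_show "show locks"
    run_show "show txnt"
}

collect_show
```

Redirecting the output to a timestamped file each time the server looks stuck would give you something to compare between hangs.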

Check what's going on on the DB2 side. In most cases you'll need to be the 
instance owner. Here are a few things to check, but I'm sure it's not 
exhaustive:

-Look at the db2diag.log for errors and warnings.
-Use 'db2pd -d yourdbname -utilities' and 'db2pd -d yourdbname -reorg' to see 
if you have any runstats or reorgs running; the output will show the state of 
anything out there.
-Use 'db2top -d yourdbname', particularly the 'B' option to view 
bottlenecks and the 'U' option to see locks.
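
The two db2pd checks above could be wrapped up something like this. It is only a sketch: it assumes you are logged in as the instance owner with db2pd on the PATH, and "yourdbname" is a placeholder you need to replace with your own database name. db2top is interactive, so it is left out. As written it just prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch only: run the db2pd checks listed above for one database.
# ASSUMPTIONS: run as the DB2 instance owner; db2pd is on PATH;
# the database name is passed as the first argument ("yourdbname" is a
# placeholder). DRY_RUN defaults to 1, which only prints the commands.

DB_NAME="${1:-yourdbname}"

run_db2pd() {
    # $1 = database name, $2 = db2pd option (-utilities or -reorg)
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "db2pd -d $1 $2"
    else
        echo "=== db2pd -d $1 $2 ==="
        db2pd -d "$1" "$2"
    fi
}

run_db2_checks() {
    run_db2pd "$1" -utilities   # runstats and other utilities in progress
    run_db2pd "$1" -reorg       # table reorgs and their state
}

run_db2_checks "$DB_NAME"
```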

There's some reasonable documentation on the db2 commands in the Knowledge 
Center:

http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.8.0/com.ibm.db2.luw.admin.trb.doc/doc/c0054595.html?lang=en

http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.8.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0025222.html?lang=en


HTH,
Ruth Mitchell
U of I, Urbana

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Rhodes, Richard L.
Sent: Wednesday, July 16, 2014 10:08 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] TSM server appears to hang

Hi Everyone,

The past couple of days we've had a strange problem with one of our TSM 
instances (v6.2.5).  At times it appears to hang.

Last night (and the previous night) many client servers each got a dozen or 
more sessions.  This is really strange!  This morning, as I was looking at this, 
cmds like "q vol" and "q stgpool" hung - no response!  Commands like "q node" 
and "q proc" worked.  The server was doing very little I/O.  All of a sudden the 
hung cmds all ran through and the server I/O jumped to 200-400MB/s.  Something 
was blocking I/O.  I think the many sessions are clients retrying because the 
server was not responding.

In the TSM actlog there are no unusual messages around the time it un-stuck.  
The only strange entry in the actlog is an ANR9999D with a lockwait error early 
the previous evening.  There are no AIX errors.

Any thoughts?

Rick






-----------------------------------------

The information contained in this message is intended only for the personal and 
confidential use of the recipient(s) named above. If the reader of this message 
is not the intended recipient or an agent responsible for delivering it to the 
intended recipient, you are hereby notified that you have received this 
document in error and that any review, dissemination, distribution, or copying 
of this message is strictly prohibited. If you have received this communication 
in error, please notify us immediately, and delete the original message.

