Subject: Re: [ADSM-L] FW: v6.3.5 hung db2??
From: "Rhodes, Richard L." <rrhodes AT FIRSTENERGYCORP DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 16 Feb 2015 13:00:51 +0000
Well, I thought we had this resolved.

Yesterday (Sunday) we had another crash of this TSM instance.  I've opened 
another PMR.  We had 2 more instances scheduled to be upgraded to v6.3.5 today 
that are now postponed indefinitely.


Rick





-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Mitchell, Ruth Slovik
Sent: Friday, February 13, 2015 3:13 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: FW: v6.3.5 hung db2??

Rick,

Thank you for letting us know about this. It would be interesting to know if 
related messages were captured in the db2diag.log when this started to manifest 
itself.

Best,

Ruth
U of I, Urbana, IL

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Rhodes, Richard L.
Sent: Friday, February 13, 2015 1:38 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] FW: v6.3.5 hung db2??

Working with some good support folks!   

Looks like we hit this: 
http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1IT06126

v6.3.5 and v7.1.0 introduced a bug in the rc.dsmserv startup script.  The 
result is that db2 was running on limited memory - 32MB in our case.  This was 
the default value in /etc/security/limits.  Lvl 2 had me change the 
/etc/security/limits default to unlimited memory.  Lvl 1 had the above APAR 
and I fixed the rc.dsmserv script per its instructions.  

So it looks like our problems were caused by very low db2 memory.  I believe it 
was restricted to 32MB!  
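
For anyone wanting to check what limits a process would inherit from its 
startup environment, here is a minimal sketch in Python.  It is not from the 
APAR or from IBM support.  It only reports the limits of the shell it is run 
from, so it would need to be run as the instance owner in the same environment 
that starts the server.  The 1 GB "low" threshold and the helper names 
(format_limit, is_too_low, report) are my own assumptions, not TSM or DB2 
recommendations.

```python
import resource

# Limits that matter for a memory-hungry process. RLIMIT_RSS is not defined
# on every platform, so only the names this Python build provides are used.
CANDIDATES = ("RLIMIT_DATA", "RLIMIT_STACK", "RLIMIT_RSS")
LIMITS = {n: getattr(resource, n) for n in CANDIDATES if hasattr(resource, n)}


def format_limit(value):
    """Render a soft limit in MB, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / (1024 * 1024):.0f} MB"


def is_too_low(value, minimum_bytes):
    """True when a finite limit falls below the chosen minimum."""
    return value != resource.RLIM_INFINITY and value < minimum_bytes


def report(minimum_bytes=1024 ** 3):
    """Return one line per limit; the 1 GB default is an arbitrary assumption."""
    lines = []
    for name, res in LIMITS.items():
        soft, _hard = resource.getrlimit(res)
        flag = "  <-- low" if is_too_low(soft, minimum_bytes) else ""
        lines.append(f"{name:13s} {format_limit(soft)}{flag}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A limit of 32MB would be flagged as low by this check, while "unlimited" 
would not.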
  

Rick




-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Rainer Tammer
Sent: Friday, February 13, 2015 11:53 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: FW: v6.3.5 hung db2??

Hello,
please keep us posted.

I will have to go from 6.3.4-300 to a higher version because of the NDMP 
dump > 2TB overwrite problem...

Bye
  Rainer

On 13.02.2015 17:05, Rhodes, Richard L. wrote:
> Yea.  I opened a Sev 1.
>
> Thanks!
>
> Rick
>
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf 
> Of Andrew Raibeck
> Sent: Friday, February 13, 2015 10:57 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: FW: v6.3.5 hung db2??
>
> Hi Rick,
>
> Off-hand I am not sure what the problem is. I think it would be a good 
> idea to open a PMR if you have not already done so.
>
> Best regards,
>
> - Andy
>
> ____________________________________________________________________________
>
> Andrew Raibeck | Tivoli Storage Manager Level 3 Technical Lead | 
> storman AT us.ibm DOT com
>
> IBM Tivoli Storage Manager links:
> Product support:
> http://www.ibm.com/support/entry/portal/Overview/Software/Tivoli/Tivoli_Storage_Manager
>
> Online documentation:
> http://www.ibm.com/support/knowledgecenter/SSGSG7/welcome
> Product Wiki:
> https://www.ibm.com/developerworks/community/wikis/home/wiki/Tivoli%20Storage%20Manager
>
> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2015-02-13
> 10:41:55:
>
>> From: "Rhodes, Richard L." <rrhodes AT FIRSTENERGYCORP DOT COM>
>> To: ADSM-L AT VM.MARIST DOT EDU
>> Date: 2015-02-13 10:44
>> Subject: FW: v6.3.5 hung db2??
>> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
>>
>> Now this is really weird.
>>
>> TSM came up after we rebooted.  But it threw a bunch of ANR9999 msgs, 
>> then QUIT LOGGING.  It seems to be running - I got onto a server and 
>> did an incr bkup, but nothing is logging in the actlog.
>>
>> 02/13/15   10:00:22     ANR9999D_2891663292 GetDomainByNodeId
>> (pmcache.c:2645) Thread<280>: Node id 626 not found in table 
>> Policy.Domain.Members. (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280> issued message 9999
>> from: (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010001ca7c
>> StdPutText  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010001d514
>> OutDiagToCons  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001000090bc
>> outDiagfExt  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001004bf254
>> GetDomainByNodeId  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001004beeec
>> pmOpenDomain  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001006ac78c
>> BeginVbTxn  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001006a4068
>> SmNodeSession  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010053ca64
>> SmSchedSession  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001005525d8
>> HandleNodeSession  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x0000000100549c54
>> DoNodeSched  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x0000000100544900
>> smExecuteSession  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x0000000100078a7c
>> psSessionThread  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010000c264
>> StartThread  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D_3095886799 HandleShortCircuitCodes
>> (dbieval.c:1072) Thread<280>: Invalid handle used from tbtbl.c 
>> (10153). (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280> issued message 9999
>> from: (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010001ca7c
>> StdPutText  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010001d514
>> OutDiagToCons  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001000090bc
>> outDiagfExt  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001000cbb28
>> HandleShortCircuitCodes  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001000cb0a0
>> DbiEvalSQLOutcomeX  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001000a0a18
>> TblClose  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010019b13c
>> FreeTxnDesc  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010019af14
>> dbiEndTxn  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001000458bc
>> DoEndFuncCallbacks  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x0000000100045d70
>> tmAbortX  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001004bef60
>> pmOpenDomain  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001006ac78c
>> BeginVbTxn  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001006a4068
>> SmNodeSession  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010053ca64
>> SmSchedSession  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x00000001005525d8
>> HandleNodeSession  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x0000000100549c54
>> DoNodeSched  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x0000000100544900
>> smExecuteSession  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x0000000100078a7c
>> psSessionThread  (SESSION: 125)
>> 02/13/15   10:00:22     ANR9999D Thread<280>  0x000000010000c264
>> StartThread  (SESSION: 125)
>>
>> It then threw this error and STOPPED LOGGING into actlog.
>>
>> 02/13/15   10:03:24     ANR0103E admattrm.c(806): Error 2332
>> updating row in table "Global.Attributes".
>>
>>
>>
>>
>> From: Rhodes, Richard L.
>> Sent: Friday, February 13, 2015 9:49 AM
>> To: adsm-l mailing list (ADSM-L AT VM.MARIST DOT EDU)
>> Subject: v6.3.5 hung db2??
>>
>> Two days ago we upgraded one of our TSM instances to v6.3.5 (from v6.3.4).
>> This is our first v6.3.5 instance.   It runs on an AIX server.
>>
>> Last night at 19:32 it looks like DB2 went into some kind of a loop.
>> The instance became unresponsive.  Dsmadmc cmds hung (didn't error, 
>> just hung).
>> Dsmserv process was getting almost no cpu, while db2sysc was running the box
>> at 65-70% but had no disk I/O.  I killed dsmserv, but db2 didn't go down.
>> I tried db2stop but it did nothing.  Finally rebooted to get everything up.
>> The actlog shows no nasty errors.
>>
>> Just wondering if anyone else has had a runaway db2.
>>
>> Thanks
>>
>> Rick
>>
>>
>>
>>
>>
>>
>> -----------------------------------------
>>
>> The information contained in this message is intended only for the 
>> personal and confidential use of the recipient(s) named above. If the 
>> reader of this message is not the intended recipient or an agent 
>> responsible for delivering it to the intended recipient, you are 
>> hereby notified that you have received this document in error and 
>> that any review, dissemination, distribution, or copying of this 
>> message is strictly prohibited. If you have received this 
>> communication in error, please notify us immediately, and delete the 
>> original message.
>>
>