If you are curious, here are the first of the nasty msgs that started the march
toward the crash yesterday.
I really like the msgs " . . . database in evaluation mode".
02/15/15 12:45:35 ANR0106E cscmdsch.c(555): Unexpected error 4522
fetching row in table "Schedule.Pending".
02/15/15 12:45:35 ANR9999D_4073936223
CsCmdSchedulerThread(cscmdsch.c:317) Thread<88>: Invalid recovery criteria
(9999) in the central scheduler - the task is terminating.
02/15/15 12:45:35 ANR9999D Thread<88> issued message 9999 from:
02/15/15 12:45:35 ANR9999D Thread<88> 0x000000010001ca7c StdPutText
02/15/15 12:45:35 ANR9999D Thread<88> 0x000000010001d514 OutDiagToCons
02/15/15 12:45:35 ANR9999D Thread<88> 0x00000001000090bc outDiagfExt
02/15/15 12:45:35 ANR9999D Thread<88> 0x0000000100bceb7c
CsCmdSchedulerThread
02/15/15 12:45:35 ANR9999D Thread<88> 0x000000010000c264 StartThread
02/15/15 12:45:43 ANR0171I csprompt.c(561): Error detected on 32:3,
database in evaluation mode. (SESSION: 823)
02/15/15 12:45:43 ANR2183W csprompt.c(664): Transaction 0:49249782 was
aborted. (SESSION: 823)
02/15/15 12:45:43 ANR2703E Schedule prompter aborted. (SESSION: 823)
02/15/15 12:46:12 ANR0171I tbrsql.c(2805): Error detected on 3:4,
database in evaluation mode.
02/15/15 12:46:12 ANR0171I dbitxn.c(734): Error detected on 0:3, database
in evaluation mode.
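
For anyone who wants to pull messages like these out of a larger actlog dump, here is a rough Python sketch. It assumes each entry starts with a "MM/DD/YY HH:MM:SS ANRnnnnX" stamp and that any line without a stamp is a wrapped continuation of the previous entry, which is how the excerpt above looks. The names parse_actlog and count_by_message are made up for illustration and are not part of any TSM tooling.

```python
import re
from collections import Counter

# Assumed entry layout: "MM/DD/YY HH:MM:SS ANRnnnnX[_suffix] text..."
STAMP = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (ANR\d{4}[A-Z])\S*\s*(.*)$"
)

# Sample lines copied from the excerpt above.
SAMPLE = """\
02/15/15 12:45:43 ANR0171I csprompt.c(561): Error detected on 32:3,
database in evaluation mode. (SESSION: 823)
02/15/15 12:45:43 ANR2183W csprompt.c(664): Transaction 0:49249782 was
aborted. (SESSION: 823)
02/15/15 12:45:43 ANR2703E Schedule prompter aborted. (SESSION: 823)
02/15/15 12:46:12 ANR0171I tbrsql.c(2805): Error detected on 3:4,
database in evaluation mode.
02/15/15 12:46:12 ANR0171I dbitxn.c(734): Error detected on 0:3, database
in evaluation mode.
"""


def parse_actlog(text):
    """Return a list of entries, joining wrapped continuation lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STAMP.match(line)
        if match:
            entries.append(
                {"when": match.group(1), "msg": match.group(2), "text": match.group(3)}
            )
        elif entries:
            # No timestamp: treat as the wrapped tail of the previous entry.
            entries[-1]["text"] = (entries[-1]["text"] + " " + line).strip()
    return entries


def count_by_message(entries):
    """Count entries per ANR message number."""
    return Counter(entry["msg"] for entry in entries)


if __name__ == "__main__":
    parsed = parse_actlog(SAMPLE)
    for msg, count in count_by_message(parsed).most_common():
        print(msg, count)
```

Filtering the parsed entries on the text "evaluation mode" is then a one-line list comprehension.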
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
DeGroat, Steve
Sent: Monday, February 16, 2015 8:15 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: FW: v6.3.5 hung db2??
Thanks for these updates. We were looking to upgrade from v6.3.4 to v6.3.5
shortly, but will hold off for now. Please keep us posted with your progress.
Steve DeGroat
Sr Solution Architect for Storage
Design Services and Quality Assurance
Yale University
203.436.4540
"If you build it, they will come."
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT vm.marist DOT edu] On Behalf Of
Rhodes, Richard L.
Sent: Monday, February 16, 2015 8:01 AM
To: ADSM-L AT vm.marist DOT edu
Subject: Re: [ADSM-L] FW: v6.3.5 hung db2??
Well, I thought we had this resolved.
Yesterday (Sunday) we had another crash of this TSM instance. I've opened
another PMR. We had 2 more instances scheduled to be upgraded to v6.3.5 today
that are now postponed indefinitely.
Rick
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Mitchell, Ruth Slovik
Sent: Friday, February 13, 2015 3:13 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: FW: v6.3.5 hung db2??
Rick,
Thank you for letting us know about this. It would be interesting to know if
related messages were captured in the db2diag.log when this started to manifest
itself.
Best,
Ruth
U of I, Urbana, IL
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Rhodes, Richard L.
Sent: Friday, February 13, 2015 1:38 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] FW: v6.3.5 hung db2??
Working with some good support folks!
Looks like we hit this:
http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1IT06126
v6.3.5 and v7.1.0 introduced a bug in the rc.dsmserv startup script. The
result is that db2 was running on limited memory - 32MB in our case. This was
the default value in /etc/security/limits. Lvl 2 had me change the
/etc/security/limits default to unlimited memory. Lvl 1 had the above APAR,
and I fixed the rc.dsmserv script per the instructions.
So it looks like our problems were caused by very low db2 memory. I believe it
was restricted to 32MB!
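
To show where a figure like 32MB can come from, here is a small Python sketch that reads a limits-style stanza and converts the values to MB. It rests on assumptions I have not confirmed against this server: that the memory attributes in /etc/security/limits are counted in 512-byte blocks, that -1 means unlimited, and that lines starting with "*" are comments. The sample stanza and the names parse_limits and blocks_to_mb are hypothetical, and the thread does not say which attribute was the one holding db2 back.

```python
# Hypothetical stanza for illustration only, not copied from our server.
SAMPLE_LIMITS = """\
* comment line
default:
        data = -1
        rss = 65536
        stack = 65536
"""

BLOCK_SIZE = 512  # assumed unit for memory attributes in the limits file


def parse_limits(text):
    """Parse 'name:' stanzas followed by 'attr = value' lines into dicts."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        if line.endswith(":") and "=" not in line:
            current = stanzas.setdefault(line[:-1], {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            try:
                current[key.strip()] = int(value.strip())
            except ValueError:
                # Skip anything that is not a plain integer.
                continue
    return stanzas


def blocks_to_mb(blocks):
    """Convert a block count to MB; None stands for unlimited (-1)."""
    if blocks == -1:
        return None
    return blocks * BLOCK_SIZE / (1024 * 1024)


if __name__ == "__main__":
    for attr, value in parse_limits(SAMPLE_LIMITS)["default"].items():
        mb = blocks_to_mb(value)
        print(attr, "unlimited" if mb is None else f"{mb:g} MB")
```

Under those assumptions a value of 65536 blocks works out to 32 MB, which matches the number we saw.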
Rick
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Rainer Tammer
Sent: Friday, February 13, 2015 11:53 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: FW: v6.3.5 hung db2??
Hello,
please keep us posted.
I will have to go from 6.3.4-300 to a higher version because of the NDMP dump >
2TB overwrite problem...
Bye
Rainer
On 13.02.2015 17:05, Rhodes, Richard L. wrote:
> Yea. I opened a Sev 1.
>
> Thanks!
>
> Rick
>
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf
> Of Andrew Raibeck
> Sent: Friday, February 13, 2015 10:57 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: FW: v6.3.5 hung db2??
>
> Hi Rick,
>
> Off-hand I am not sure what the problem is, I think it would be a good
> idea to open a PMR if you have not already done so.
>
> Best regards,
>
> - Andy
>
> ______________________________________________________________________
> ______
>
> Andrew Raibeck | Tivoli Storage Manager Level 3 Technical Lead |
> storman AT us.ibm DOT com
>
> IBM Tivoli Storage Manager links:
> Product support:
> http://www.ibm.com/support/entry/portal/Overview/Software/Tivoli/Tivoli_Storage_Manager
>
> Online documentation:
> http://www.ibm.com/support/knowledgecenter/SSGSG7/welcome
> Product Wiki:
> https://www.ibm.com/developerworks/community/wikis/home/wiki/Tivoli%20Storage%20Manager
>
> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2015-02-13
> 10:41:55:
>
>> From: "Rhodes, Richard L." <rrhodes AT FIRSTENERGYCORP DOT COM>
>> To: ADSM-L AT VM.MARIST DOT EDU
>> Date: 2015-02-13 10:44
>> Subject: FW: v6.3.5 hung db2??
>> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
>>
>> Now this is really weird.
>>
>> TSM came up after we rebooted. But it threw a bunch of ANR9999 msgs,
>> then QUIT LOGGING. It seems to be running - I went onto a server and
>> did an incr bkup, but nothing is logging in the actlog.
>>
>> 02/13/15 10:00:22 ANR9999D_2891663292 GetDomainByNodeId
>> (pmcache.c:2645) Thread<280>: Node id 626 not found in table
>> Policy.Domain.Members. (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> issued message 9999
>> from: (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010001ca7c
>> StdPutText (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010001d514
>> OutDiagToCons (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001000090bc
>> outDiagfExt (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001004bf254
>> GetDomainByNodeId (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001004beeec
>> pmOpenDomain (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001006ac78c
>> BeginVbTxn (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001006a4068
>> SmNodeSession (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010053ca64
>> SmSchedSession (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001005525d8
>> HandleNodeSession (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x0000000100549c54
>> DoNodeSched (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x0000000100544900
>> smExecuteSession (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x0000000100078a7c
>> psSessionThread (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010000c264
>> StartThread (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D_3095886799 HandleShortCircuitCodes
>> (dbieval.c:1072) Thread<280>: Invalid handle used from tbtbl.c
>> (10153). (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> issued message 9999
>> from: (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010001ca7c
>> StdPutText (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010001d514
>> OutDiagToCons (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001000090bc
>> outDiagfExt (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001000cbb28
>> HandleShortCircuitCodes (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001000cb0a0
>> DbiEvalSQLOutcomeX (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001000a0a18
>> TblClose (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010019b13c
>> FreeTxnDesc (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010019af14
>> dbiEndTxn (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001000458bc
>> DoEndFuncCallbacks (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x0000000100045d70
>> tmAbortX (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001004bef60
>> pmOpenDomain (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001006ac78c
>> BeginVbTxn (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001006a4068
>> SmNodeSession (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010053ca64
>> SmSchedSession (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x00000001005525d8
>> HandleNodeSession (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x0000000100549c54
>> DoNodeSched (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x0000000100544900
>> smExecuteSession (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x0000000100078a7c
>> psSessionThread (SESSION: 125)
>> 02/13/15 10:00:22 ANR9999D Thread<280> 0x000000010000c264
>> StartThread (SESSION: 125)
>>
>> It then threw this error and STOPPED LOGGING into actlog.
>>
>> 02/13/15 10:03:24 ANR0103E admattrm.c(806): Error 2332
>> updating row in table "Global.Attributes".
>>
>>
>>
>>
>> From: Rhodes, Richard L.
>> Sent: Friday, February 13, 2015 9:49 AM
>> To: adsm-l mailing list (ADSM-L AT VM.MARIST DOT EDU)
>> Subject: v6.3.5 hung db2??
>>
>> Two days ago we upgraded one of our TSM instances to v6.3.5 (from v6.3.4).
>> This is our first v6.3.5 instance. It runs on an AIX server.
>>
>> Last night at 19:32 it looks like DB2 went into some kind of a loop.
>> The instance became unresponsive. Dsmadmc cmds hung (didn't error,
>> just hung).
>> Dsmserv process was getting almost no cpu, while ds2sync was running
>> the box at 65-70% but had no disk I/O. I killed dsmserv, but db2
>> didn't go down. I tried db2stop but it did nothing. Finally rebooted
>> to get everything up.
>> The actlog shows no nasty errors.
>>
>> Just wondering if anyone else has had a runaway db2.
>>
>> Thanks
>>
>> Rick
>>
>>
>>
>>
>>
>>
>> -----------------------------------------
>>
>> The information contained in this message is intended only for the
>> personal and confidential use of the recipient(s) named above. If the
>> reader of this message is not the intended recipient or an agent
>> responsible for delivering it to the intended recipient, you are
>> hereby notified that you have received this document in error and
>> that any review, dissemination, distribution, or copying of this
>> message is strictly prohibited. If you have received this
>> communication in error, please notify us immediately, and delete the
>> original message.
>>