Error starting TSM server after moving to a new server

c.j.hund

Hi all,

One of my TSM servers was recently moved to a new LPAR. This is a TSM 5.5 server (out of support, yes), and it is running on AIX 6.1. The O/S was flashed over to the new instance, so it should have been exactly the same, and the disks housing the TSM DB were swung over to the new LPAR. The TSM server was not running when this work took place. It should be an exact duplicate of the system it came from right now. However, when trying to start the TSM server interactively, I bump into the following error:

ANR0172I rdbdb.c(1889): Error encountered performing action ActivateDatabase.
ANR0162W Supplemental database diagnostic information: -1042:SQLSTATE 58004: A
system error (that does not necessarily preclude the successful execution of
subsequent SQL statements) occurred.
:-1042 (SQL1042C An unexpected system
error occurred. SQLSTATE=58004


It looks like something with DB2 got hosed up in this process, but I'm not finding a smoking gun anywhere under the ../sqllib directory.

Any ideas on where to look for an answer would be greatly appreciated.

Sincere thanks,
C.J.
 
Apologies. Not sure why I thought this was a TSM 5.5 server. It is version 6.2. Also old, but not as old as 5.5.

C.J.
 
I am able to start DB2 manually:

su - tsminst1
[YOU HAVE NEW MAIL]
$ db2start
09/12/2016 15:05:37 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.

Thank you,
Chris
 
Have you checked the errors in the db2diag log?
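
A minimal sketch for narrowing a large db2diag.log down to the records that matter. It assumes records are separated by blank lines and carry a "LEVEL:" tag on their first line, as in the excerpts posted in this thread. The helper name filter_db2diag_levels is made up for illustration, and the log path in the usage comment is an assumption based on the db2dump directory named in the FODC message. DB2 also ships its own db2diag analysis tool, which may be the better choice where it is available.

```shell
#!/bin/sh
# Print only the db2diag records whose header line carries one of the given
# severity levels. awk paragraph mode (RS="") treats each blank-line-separated
# record as one unit; FS="\n" makes $1 the record's header line.
# filter_db2diag_levels is a hypothetical helper, not a DB2 command.
filter_db2diag_levels() {
    levels="$1"   # alternation of level names, e.g. "Severe|Critical"
    awk -v lv="$levels" '
        BEGIN { RS = ""; ORS = "\n\n"; FS = "\n" }
        $1 ~ ("LEVEL: (" lv ")") { print }
    '
}

# Usage (path is an assumption; adjust to the instance's diagnostic directory):
#   filter_db2diag_levels 'Severe|Critical' < /home/tsminst1/sqllib/db2dump/db2diag.log
```

Filtering on Severe and Critical first tends to surface the record that actually blocks activation, with the Error and Warning records kept for a second pass.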
 
Yes, there are some errors and critical warnings in the db2diag log. I'll share:


2016-09-13-06.33.34.300593-420 E14012667A1112 LEVEL: Error (OS)
PID : 10158266 TID : 258 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000
EDUID : 258 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqloAIXLoadModuleTryShr, probe:130
CALLED : OS, -, dlopen
OSERR : ENOEXEC (8) "Exec format error"
MESSAGE : Attempt to load specified library failed.
DATA #1 : Library name or path, 40 bytes
/home/tsminst1/sqllib/lib64/libdb2iocp.a
DATA #2 : shared library load flags, PD_TYPE_LOAD_FLAGS, 4 bytes
2
DATA #3 : String, 513 bytes
Symbol resolution failed for /home/tsminst1/sqllib/lib64/libdb2iocp.a because:
Symbol CreateIoCompletionPort (number 0) is not exported from dependent
module /unix.
Symbol GetQueuedCompletionStatus (number 1) is not exported from dependent
module /unix.
Symbol GetMultipleCompletionStatus (number 2) is not exported from dependent
module /unix.
Could not load module /home/tsminst1/sqllib/lib64/libdb2iocp.a.
System error: Exec format error
Examine .loader section symbols with the 'dump -Tv' command.

2016-09-13-06.33.34.301117-420 E14013780A854 LEVEL: Error (OS)
PID : 10158266 TID : 258 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000
EDUID : 258 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqloAIXLoadModuleTryShr, probe:140
CALLED : OS, -, dlopen
OSERR : ENOEXEC (8) "Exec format error"
MESSAGE : Attempt to load specified library augmented with object name failed.
DATA #1 : Library name or path, 50 bytes
/home/tsminst1/sqllib/lib64/libdb2iocp.a(shr_64.o)
DATA #2 : shared library load flags, PD_TYPE_LOAD_FLAGS, 4 bytes
262146
DATA #3 : String, 213 bytes
Could not load module /home/tsminst1/sqllib/lib64/libdb2iocp.a(shr_64.o).
File /home/tsminst1/sqllib/lib64/libdb2iocp.a is not an
archive or the file could not be read properly.
System error: Exec format error

2016-09-13-06.33.34.301455-420 I14014635A568 LEVEL: Severe
PID : 10158266 TID : 258 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000
EDUID : 258 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqloLioInitIocp, probe:200
CALLED : DB2 UDB, oper system services, sqloLoadModule
RETCODE : ZRC=0x870F009B=-2029059941=SQLO_MOD_LOAD_FAILED
"Dynamic library load failed."
DATA #1 : Library name or path, 12 bytes
libdb2iocp.a
DATA #2 : Library Search Path, 27 bytes
/home/tsminst1/sqllib/lib64

2016-09-13-06.33.34.301869-420 E14015204A406 LEVEL: Warning
PID : 10158266 TID : 258 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000
EDUID : 258 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, oper system services, sqloStartAIOCollectorEDUs, probe:30
MESSAGE : ADM0513W db2start succeeded. However, no I/O completion port (IOCP)
is available.

2016-09-13-06.33.35.591487-420 I14016287A546 LEVEL: Severe
PID : 10158266 TID : 3857 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.33435.160913133335
AUTHID : TSMINST1
EDUID : 3857 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, data protection services, sqlpgint, probe:450
RETCODE : ZRC=0x8710001D=-2028994531=SQLP_LERR "Fatal Logic Error"
DIA8526C A fatal error occurred in data protection services.

2016-09-13-06.33.35.592006-420 I14016834A498 LEVEL: Severe
PID : 10158266 TID : 3857 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.33435.160913133335
AUTHID : TSMINST1
EDUID : 3857 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:120
DATA #1 : Hexdump, 4 bytes
0x070000000C7EED64 : 8710 001D ....

2016-09-13-06.33.35.592314-420 I14017333A497 LEVEL: Error
PID : 10158266 TID : 3857 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.33435.160913133335
AUTHID : TSMINST1
EDUID : 3857 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:120
DATA #2 : Hexdump, 4 bytes
0x070000000C7EED64 : 8710 001D ....

2016-09-13-06.33.35.599356-420 E14017831A972 LEVEL: Critical
PID : 10158266 TID : 3857 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.33435.160913133335
AUTHID : TSMINST1
EDUID : 3857 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred:
"DBMarkedBad". The instance may have been shutdown as a result.
"Automatic" FODC (First Occurrence Data Capture) has been invoked and
diagnostic information has been recorded in directory
"/home/tsminst1/sqllib/db2dump/FODC_DBMarkedBad_2016-09-13-06.33.35.5
92629_0000/". Please look in this directory for detailed evidence
about what happened and contact IBM support if necessary to diagnose
the problem.

2016-09-13-06.33.35.600076-420 E14018804A463 LEVEL: Severe
PID : 10158266 TID : 3857 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.33435.160913133335
AUTHID : TSMINST1
EDUID : 3857 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM7518C "TSMDB1 " marked bad.

2016-09-13-06.33.35.600738-420 I14019268A476 LEVEL: Severe
PID : 10158266 TID : 3857 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.33435.160913133335
AUTHID : TSMINST1
EDUID : 3857 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:210
MESSAGE : Database logging stopped due to mark db bad.

2016-09-13-06.33.35.621713-420 I14019745A496 LEVEL: Severe
PID : 10158266 TID : 3857 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.33435.160913133335
AUTHID : TSMINST1
EDUID : 3857 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, DRDA Application Server, sqljsSignalHandler, probe:10
MESSAGE : DIA0505I Execution of a component signal handling function has begun.

2016-09-13-06.33.35.622811-420 I14020242A509 LEVEL: Severe
PID : 10158266 TID : 3857 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.33435.160913133335
AUTHID : TSMINST1
EDUID : 3857 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, DRDA Application Server, sqljsSignalHandler, probe:20
MESSAGE : DIA0506I Execution of a component signal handling function is
complete.

2016-09-13-06.33.35.623895-420 I14021378A520 LEVEL: Severe
PID : 10158266 TID : 3857 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.33435.160913133335
AUTHID : TSMINST1
EDUID : 3857 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FirstConnect, probe:125
DATA #1 : Hexdump, 4 bytes
0x0780000000B2BC3C : FFFF FBEE
 
Have a look at http://www-01.ibm.com/support/docview.wss?uid=swg21430458
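
The libdb2iocp.a load failures and the ADM0513W warning in the log above both point at the AIX I/O completion port (IOCP) device. As a rough sketch, its state can be read from the output of lsdev. The sample output layout in the comment is an assumption, and iocp_state is a made-up helper name; the parsing is kept separate from the AIX command so it can be tried anywhere.

```shell
#!/bin/sh
# Report the state column from `lsdev -Cc iocp` output read on stdin.
# On AIX that output is expected to look roughly like (layout assumed):
#   iocp0 Defined  I/O Completion Ports
# iocp_state is a hypothetical helper, not an AIX command.
iocp_state() {
    awk '
        $1 ~ /^iocp[0-9]+$/ { print $2; found = 1 }
        END { if (!found) print "missing" }
    '
}

# On the AIX host itself:
#   lsdev -Cc iocp | iocp_state   # DB2 wants "Available", not "Defined"
#   smitty iocp                   # change the state configured at restart
```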
 
Yeah, I found that link just a moment ago, too ... and it does appear that the I/O completion ports (IOCP) were set to "Defined" instead of "Available". I set them to "Available" in smitty, rebooted, and attempted to start the DB again. The IOCP errors have disappeared from the db2diag.log file, but the DB is still not starting. The errors now look a little worse:



2016-09-13-07.17.19.784774-420 I14029184A927 LEVEL: Error
PID : 6160600 TID : 1 PROC : db2fm
INSTANCE: tsminst1 NODE : 000
EDUID : 1
FUNCTION: DB2 Common, Generic Control Facility, gcf_stop, probe:60
MESSAGE : ECF=0x90000390=-1879047280=ECF_FM_INVALID_PROCESS_ID
Invalid process id

CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x09000000028CD874 pdOSSeLoggingCallback + 0x34
[1] 0x09000000008E06C4 oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1C4
[2] 0x09000000008E0468 ossLog + 0x88
[3] 0x0900000004AA2214 gcf_stop + 0x754
[4] 0x0900000000CD504C stop__9GcfCallerFP12GCF_PartInfoUlP11GCF_RetInfo + 0x1CC
[5] 0x0000000100003B58 main + 0x25F8
[6] 0x0000000100000290 __start + 0x98
[7] 0x0000000000000000 ?unknown + 0x0
[8] 0x0000000000000000 ?unknown + 0x0
[9] 0x0000000000000000 ?unknown + 0x0

2016-09-13-07.17.19.824993-420 I14030112A1034 LEVEL: Error
PID : 6160600 TID : 1 PROC : db2fm
INSTANCE: tsminst1 NODE : 000
EDUID : 1
FUNCTION: DB2 Common, Fault Monitor Facility, db2fm, probe:170
MESSAGE : ECF=0x90000349=-1879047351=ECF_FM_FAIL_TO_STOP_GCF_FM
Failed to stop the GCF fm module
CALLED : DB2 Common, Generic Control Facility, GcfCaller::stop
DATA #1 : signed integer, 8 bytes
1
DATA #2 : unsigned integer, 8 bytes
0
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x09000000028CD874 pdOSSeLoggingCallback + 0x34
[1] 0x09000000008E06C4 oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1C4
[2] 0x09000000008E0A80 ossLogRC + 0xA0
[3] 0x0000000100003BDC main + 0x267C
[4] 0x0000000100000290 __start + 0x98
[5] 0x0000000000000000 ?unknown + 0x0
[6] 0x0000000000000000 ?unknown + 0x0
[7] 0x0000000000000000 ?unknown + 0x0
[8] 0x0000000000000000 ?unknown + 0x0
[9] 0x0000000000000000 ?unknown + 0x0

2016-09-13-07.20.12.470432-420 I14039062A546 LEVEL: Severe
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, data protection services, sqlpgint, probe:450
RETCODE : ZRC=0x8710001D=-2028994531=SQLP_LERR "Fatal Logic Error"
DIA8526C A fatal error occurred in data protection services.

2016-09-13-07.20.12.487496-420 I14039609A498 LEVEL: Severe
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:120
DATA #1 : Hexdump, 4 bytes
0x070000000CBEED64 : 8710 001D ....

2016-09-13-07.20.12.488934-420 I14040108A497 LEVEL: Error
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:120
DATA #2 : Hexdump, 4 bytes
0x070000000CBEED64 : 8710 001D ....

2016-09-13-07.20.12.494644-420 E14040606A972 LEVEL: Critical
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred:
"DBMarkedBad". The instance may have been shutdown as a result.
"Automatic" FODC (First Occurrence Data Capture) has been invoked and
diagnostic information has been recorded in directory
"/home/tsminst1/sqllib/db2dump/FODC_DBMarkedBad_2016-09-13-07.20.12.4
91612_0000/". Please look in this directory for detailed evidence
about what happened and contact IBM support if necessary to diagnose
the problem.

2016-09-13-07.20.12.495479-420 E14041579A463 LEVEL: Severe
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM7518C "TSMDB1 " marked bad.

2016-09-13-07.20.12.496168-420 I14042043A476 LEVEL: Severe
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:210
MESSAGE : Database logging stopped due to mark db bad.

2016-09-13-07.20.12.529146-420 I14042520A496 LEVEL: Severe
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, DRDA Application Server, sqljsSignalHandler, probe:10
MESSAGE : DIA0505I Execution of a component signal handling function has begun.

2016-09-13-07.20.12.532866-420 I14043017A509 LEVEL: Severe
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, DRDA Application Server, sqljsSignalHandler, probe:20
MESSAGE : DIA0506I Execution of a component signal handling function is
complete.

2016-09-13-07.20.12.536264-420 I14044153A520 LEVEL: Severe
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FirstConnect, probe:125
DATA #1 : Hexdump, 4 bytes
0x0780000000B2BC3C : FFFF FBEE
 
Although those first couple of errors look bad, it appears to me that what's stopping the DB from starting is this error:

2016-09-13-07.20.12.470432-420 I14039062A546 LEVEL: Severe
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, data protection services, sqlpgint, probe:450
RETCODE : ZRC=0x8710001D=-2028994531=SQLP_LERR "Fatal Logic Error"
DIA8526C A fatal error occurred in data protection services.

That issue is covered in this link:

http://www-01.ibm.com/support/docview.wss?uid=swg1IT00608

The crux of which states:


To hit this issue following conditions must be true:

1. database configured for infinite logging (LOGSECOND -1)

2. slow or not working log file archiving

3. force down the database / instance

4. attempt database restart while log archiving is still not
working correctly


I am willing to bet the issue is #2.
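
Conditions 1 and 2 in the list above can be checked against the database configuration. The sketch below pulls a single parameter out of "db2 get db cfg" output; cfg_value is a made-up helper name, and the line layout shown in the comment is what that command normally prints, so treat it as an assumption for this DB2 level.

```shell
#!/bin/sh
# Extract one parameter value from `db2 get db cfg` output read on stdin.
# Lines there are expected to look like:
#   Number of secondary log files                (LOGSECOND) = -1
# cfg_value is a hypothetical helper, not a DB2 command.
cfg_value() {
    awk -v key="($1)" '
        index($0, key) {
            sub(/^[^=]*= */, "")   # drop everything up to and including "= "
            print
            exit
        }
    '
}

# As the instance owner (tsminst1 in this thread):
#   db2 get db cfg for tsmdb1 | cfg_value LOGSECOND      # -1 means infinite logging
#   db2 get db cfg for tsmdb1 | cfg_value LOGARCHMETH1   # where logs get archived
```

If LOGSECOND is -1 and the archive target is missing or unwritable on the new LPAR, that would line up with the first two conditions.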
 
My two cents: I don't trust image copies.

If I were moving from one server to another, I would install the TSM server from scratch and do a DB restore.
 
I agree. This is not how I would have preferred to move the DB to a new server instance.

$ db2start
09/13/2016 08:32:16 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
$ db2 connect to tsmdb1
SQL1042C An unexpected system error occurred. SQLSTATE=58004
 
2016-09-13-07.20.12.494644-420 E14040606A972 LEVEL: Critical
PID : 8257734 TID : 3600 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: 127.0.0.1.32796.160913142012
AUTHID : TSMINST1
EDUID : 3600 EDUNAME: db2agent (TSMDB1_L) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred:
"DBMarkedBad". The instance may have been shutdown as a result.
"Automatic" FODC (First Occurrence Data Capture) has been invoked and
diagnostic information has been recorded in directory
"/home/tsminst1/sqllib/db2dump/FODC_DBMarkedBad_2016-09-13-07.20.12.4
91612_0000/". Please look in this directory for detailed evidence
about what happened and contact IBM support if necessary to diagnose
the problem.
There's your issue. Follow the instructions at the end of the message.
 
Happy to report this issue has been resolved. It turns out that when the O/S was flashed and set up on a new LPAR, DB2 was still running. The TSM server was down, but DB2 was still running, and that was causing all these problems. So, attempting the entire process again with DB2 down eventually resulted in the TSM server coming up on the new system, but it took the work of a very experienced DB2 admin to make that happen. There were still several "gotchas" involved.
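
For anyone repeating this, a pre-flight check along these lines might have caught the problem. It is only a sketch: it counts db2sysc engine processes (the process name shown in the db2diag records in this thread) in ps output, and count_db2sysc is a made-up helper name.

```shell
#!/bin/sh
# Count DB2 engine processes (db2sysc) in `ps -ef` output read on stdin.
# Lines mentioning awk are skipped so the filter never counts itself.
# count_db2sysc is a hypothetical helper, not a DB2 or AIX command.
count_db2sysc() {
    awk '/db2sysc/ && !/awk/ { n++ } END { print n + 0 }'
}

# Before flashing the O/S or swinging the disks (as the instance owner):
#   db2stop                    # stop the DB2 instance cleanly
#   ps -ef | count_db2sysc     # expect 0 before proceeding
```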

If you're going to move to a new TSM server, it's always going to be easier to reinstall TSM on the new server and do a traditional TSM DB restore.

Thanks,
C.J.
 