tkarampilas
ADSM.ORG Member
Hello all,
TL;DR: My AIX server hung due to full transaction logs, taking TSM and DB2 with it.
One of my associates deleted the transaction log files, and now I can't get into either TSM or the TSM DB2 database.
I have a mess on my hands, and am looking for any advice/guidance you can offer.
(Important note: I don't have IBM Support. My boss made me make a choice between hardware and software a couple years ago, and I figured, rightly so, that we'd have more hardware issues than software.)
Anyway, last week I came in and our tape management person told me we were having issues with TSM.
So, when I began looking, I found that TSM won't start, and it appears there are database logging issues.
I suspect that the tape manager deleted the transaction logs, as the drive had filled up and was throwing errors.
Neither of us is particularly familiar with AIX or DB2, and he is even less so than I am.
TSM doesn't start, and dsmadmc fails with the following:
# dsmadmc
IBM Tivoli Storage Manager
Command Line Administrative Interface - Version 6, Release 1, Level 0.0
(c) Copyright by IBM Corporation and other(s) 1990, 2009. All Rights Reserved.
Enter your user id: admin
ANS1017E Session rejected: TCP/IP connection failure
ANS8023E Unable to establish session with server.
ANS8002I Highest return code was -50.
#
I can connect to DB2, but not to the TSM DB; it gives me the following:
$ db2
(c) Copyright IBM Corporation 1993,2007
Command Line Processor for DB2 Client 9.5.5
You can issue database manager commands and SQL statements from the command
prompt. For example:
db2 => connect to sample
db2 => bind sample.bnd
For general help, type: ?.
For command help, type: ? command, where command can be
the first few keywords of a database manager command. For example:
? CATALOG DATABASE for help on the CATALOG DATABASE command
? CATALOG for help on all of the CATALOG commands.
To exit db2 interactive mode, type QUIT at the command prompt. Outside
interactive mode, all commands must be prefixed with 'db2'.
To list the current command option settings, type LIST COMMAND OPTIONS.
For more detailed help, refer to the Online Reference Manual.
db2 => connect to tsmdb1
SQL1032N No start database manager command was issued. SQLSTATE=57019
db2 => start database manager
DB20000I The START DATABASE MANAGER command completed successfully.
db2 => connect to tsmdb1
SQL1042C An unexpected system error occurred. SQLSTATE=58004
db2 =>
My db2dump.log gives me:
2015-08-17-14.11.08.767059-240 I148873019A373 LEVEL: Severe
PID : 151756 TID : 2572 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
EDUID : 2572 EDUNAME: db2loggr (TSMDB1) 0
FUNCTION: DB2 UDB, data protection services, sqlpgasn, probe:4000
MESSAGE : Logging can not continue due to an error.
2015-08-17-14.11.08.767212-240 I148873393A543 LEVEL: Severe
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, data protection services, sqlpgint, probe:9030
RETCODE : ZRC=0x8610000D=-2045771763=SQLP_BADLOG "Log File cannot be used"
DIA8414C Logging can not continue due to an error.
2015-08-17-14.11.08.767406-240 I148873937A543 LEVEL: Severe
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, data protection services, sqlpgint, probe:3600
RETCODE : ZRC=0x8610000D=-2045771763=SQLP_BADLOG "Log File cannot be used"
DIA8414C Logging can not continue due to an error.
2015-08-17-14.11.08.767631-240 I148874481A496 LEVEL: Severe
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:120
DATA #1 : Hexdump, 4 bytes
0x070000000E7EC8A0 : 8610 000D ....
2015-08-17-14.11.08.767786-240 I148874978A495 LEVEL: Error
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:120
DATA #2 : Hexdump, 4 bytes
0x070000000E7EC8A0 : 8610 000D ....
2015-08-17-14.11.08.781825-240 E148875474A965 LEVEL: Critical
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred:
"DBMarkedBad". The instance may have been shutdown as a result.
"Automatic" FODC (First Occurrence Data Capture) has been invoked and
diagnostic information has been recorded in directory
"/home/tsminst1/sqllib/db2dump/FODC_DBMarkedBad_2015-08-17-14.11.08.7
67898/". Please look in this directory for detailed evidence about
what happened and contact IBM support if necessary to diagnose the
problem.
2015-08-17-14.11.08.782237-240 E148876440A461 LEVEL: Severe
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM7518C "TSMDB1 " marked bad.
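As a sanity check on the RETCODE lines above, the hex and decimal forms of the ZRC appear to be the same 32-bit value, read as unsigned versus signed. A minimal Python sketch of that conversion follows; the helper name `zrc_to_signed` is made up for illustration and is not a DB2 tool.

```python
def zrc_to_signed(hex_code: str) -> int:
    """Interpret a 32-bit hex return code as a signed two's-complement integer."""
    value = int(hex_code, 16)  # accepts an optional "0x" prefix
    if value >= 0x80000000:    # high bit set means the signed value is negative
        value -= 0x100000000
    return value


print(zrc_to_signed("0x8610000D"))
```

This prints -2045771763, which matches the decimal value shown next to SQLP_BADLOG in the log.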
And my dsmerror.log is as follows:
08/17/15 11:46:29 ANS5216E Could not establish a TCP/IP connection with address '10.1.98.205:1500'. The TCP/IP error is 'A remote host refused an attempted connect operation.' (errno = 79).
08/17/15 11:46:29 ANS9020E Could not establish a session with a TSM server or client agent. The TSM return code is -50.
08/17/15 11:46:29 ANS1017E Session rejected: TCP/IP connection failure
08/17/15 11:46:29 ANS1570E Registering this instance of the Cad with the server failed. Cad process continues.
08/17/15 11:56:28 ANS5216E Could not establish a TCP/IP connection with address '10.1.98.205:1500'. The TCP/IP error is 'A remote host refused an attempted connect operation.' (errno = 79).
08/17/15 11:56:28 ANS9020E Could not establish a session with a TSM server or client agent. The TSM return code is -50.
08/17/15 11:56:28 ANS1017E Session rejected: TCP/IP connection failure
08/17/15 11:56:28 ANS8023E Unable to establish session with server.
#
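The "remote host refused" message in these entries suggests that nothing is listening on port 1500, rather than a network problem between client and server. A quick way to confirm that is sketched below, assuming Python is available on some machine that can reach the server; the function name `port_is_open` and the default timeout are illustrative choices, not part of TSM.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False
```

Calling `port_is_open("10.1.98.205", 1500)` with the address from the log should return False for as long as the TSM server process is down, and True once it is accepting sessions again.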
I have a 3310 Tape Library that we use for our actual tapes and rotations; there is no tape drive in the server itself, so I'd have to mount the library to AIX.
I do have a db backup from before this started, but it's on a tape, not disk.
I'm not particularly concerned about the data loss at this point; I just need to get TSM back up.
As I said, any advice or guidance would be greatly appreciated.
Thanks in advance,
Ted