TSM Down, DB inaccessible

tkarampilas (ADSM.ORG Member, joined Sep 21, 2011, 11 messages, Scranton, PA, USA)
Hello all,

TL;DR: My AIX server hung due to full transaction logs, taking TSM and DB2 with it.
One of my associates deleted the transaction log files, and now I can't get into either TSM or the TSM DB2 database.

I have a mess on my hands, and am looking for any advice/guidance you can offer.
(Important note: I don't have IBM support. A couple of years ago my boss made me choose between hardware and software support, and I figured, rightly so, that we'd have more hardware issues than software issues.)

Anyway, last week I came in and our tape management person told me we were having issues with TSM.
When I began looking, I found that TSM wouldn't start, and there appeared to be database logging issues.
I suspect that the tape manager deleted the transaction logs, as the drive had filled up and was throwing errors.
Neither of us is particularly familiar with AIX or DB2, and he's even less so than I am.

Trying to connect to TSM fails with the following:

# dsmadmc
IBM Tivoli Storage Manager
Command Line Administrative Interface - Version 6, Release 1, Level 0.0
(c) Copyright by IBM Corporation and other(s) 1990, 2009. All Rights Reserved.

Enter your user id: admin

ANS1017E Session rejected: TCP/IP connection failure
ANS8023E Unable to establish session with server.

ANS8002I Highest return code was -50.

#

I can get into DB2, but not into the TSM database; it gives me the following:

$ db2
(c) Copyright IBM Corporation 1993,2007
Command Line Processor for DB2 Client 9.5.5

You can issue database manager commands and SQL statements from the command
prompt. For example:
db2 => connect to sample
db2 => bind sample.bnd

For general help, type: ?.
For command help, type: ? command, where command can be
the first few keywords of a database manager command. For example:
? CATALOG DATABASE for help on the CATALOG DATABASE command
? CATALOG for help on all of the CATALOG commands.

To exit db2 interactive mode, type QUIT at the command prompt. Outside
interactive mode, all commands must be prefixed with 'db2'.
To list the current command option settings, type LIST COMMAND OPTIONS.

For more detailed help, refer to the Online Reference Manual.

db2 => connect to tsmdb1
SQL1032N No start database manager command was issued. SQLSTATE=57019
db2 => start database manager
DB20000I The START DATABASE MANAGER command completed successfully.
db2 => connect to tsmdb1
SQL1042C An unexpected system error occurred. SQLSTATE=58004
db2 =>

My db2diag.log (in the db2dump directory) gives me:
2015-08-17-14.11.08.767059-240 I148873019A373 LEVEL: Severe
PID : 151756 TID : 2572 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
EDUID : 2572 EDUNAME: db2loggr (TSMDB1) 0
FUNCTION: DB2 UDB, data protection services, sqlpgasn, probe:4000
MESSAGE : Logging can not continue due to an error.

2015-08-17-14.11.08.767212-240 I148873393A543 LEVEL: Severe
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, data protection services, sqlpgint, probe:9030
RETCODE : ZRC=0x8610000D=-2045771763=SQLP_BADLOG "Log File cannot be used"
DIA8414C Logging can not continue due to an error.

2015-08-17-14.11.08.767406-240 I148873937A543 LEVEL: Severe
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, data protection services, sqlpgint, probe:3600
RETCODE : ZRC=0x8610000D=-2045771763=SQLP_BADLOG "Log File cannot be used"
DIA8414C Logging can not continue due to an error.

2015-08-17-14.11.08.767631-240 I148874481A496 LEVEL: Severe
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:120
DATA #1 : Hexdump, 4 bytes
0x070000000E7EC8A0 : 8610 000D ....

2015-08-17-14.11.08.767786-240 I148874978A495 LEVEL: Error
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:120
DATA #2 : Hexdump, 4 bytes
0x070000000E7EC8A0 : 8610 000D ....

2015-08-17-14.11.08.781825-240 E148875474A965 LEVEL: Critical
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred:
"DBMarkedBad". The instance may have been shutdown as a result.
"Automatic" FODC (First Occurrence Data Capture) has been invoked and
diagnostic information has been recorded in directory
"/home/tsminst1/sqllib/db2dump/FODC_DBMarkedBad_2015-08-17-14.11.08.767898/". Please look in this directory for detailed evidence about
what happened and contact IBM support if necessary to diagnose the
problem.

2015-08-17-14.11.08.782237-240 E148876440A461 LEVEL: Severe
PID : 151756 TID : 1801 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-7 APPID: *LOCAL.tsminst1.150817181108
AUTHID : TSMINST1
EDUID : 1801 EDUNAME: db2agent (TSMDB1) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM7518C "TSMDB1 " marked bad.

And my dsmerror.log is as follows:
08/17/15 11:46:29 ANS5216E Could not establish a TCP/IP connection with address '10.1.98.205:1500'. The TCP/IP error is 'A remote host refused an attempted connect operation.' (errno = 79).
08/17/15 11:46:29 ANS9020E Could not establish a session with a TSM server or client agent. The TSM return code is -50.
08/17/15 11:46:29 ANS1017E Session rejected: TCP/IP connection failure
08/17/15 11:46:29 ANS1570E Registering this instance of the Cad with the server failed. Cad process continues.
08/17/15 11:56:28 ANS5216E Could not establish a TCP/IP connection with address '10.1.98.205:1500'. The TCP/IP error is 'A remote host refused an attempted connect operation.' (errno = 79).
08/17/15 11:56:28 ANS9020E Could not establish a session with a TSM server or client agent. The TSM return code is -50.
08/17/15 11:56:28 ANS1017E Session rejected: TCP/IP connection failure
08/17/15 11:56:28 ANS8023E Unable to establish session with server.
#
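The ANS5216E entries say the remote host refused the connection on 10.1.98.205:1500, which points at the TSM server process not running (nothing listening on that port) rather than at a network problem. A minimal Python sketch of that check is below; `server_listening` is a hypothetical helper, and it only tests whether a TCP connection opens, not whether TSM itself is healthy.

```python
import socket


def server_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (the ANS5216E "refused" case)
        # as well as timeouts and unreachable hosts.
        return False
```

For example, `server_listening("10.1.98.205", 1500)` should return False while dsmserv is down and True once it is back up and listening on port 1500.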

I have a 3310 tape library that we use for our actual tapes and rotations; there is no tape drive in the server itself, so I'd have to mount the library to AIX.

I do have a DB backup from before this started, but it's on tape, not disk.

I'm not particularly concerned about the data loss at this point, I just need to get TSM back up.

As I said, any advice or guidance would be greatly appreciated.

Thanks in advance,

Ted
 
You said this is a DR server? If so, just restore the DB from the source (PROD) server. The TSM server will 'resume' from the last DB backup point.
 
I tried that earlier and got this error:

$ dsmserv restore db
ANR7800I DSMSERV generated at 14:39:26 on Jan 19 2011.

Tivoli Storage Manager for AIX
Version 6, Release 1, Level 4.5

Licensed Materials - Property of IBM

(C) Copyright IBM Corporation 1990, 2009.
All rights reserved.
U.S. Government Users Restricted Rights - Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corporation.

ANR7801I Subsystem process ID is 458858.
ANR0900I Processing options file /home/tsminst1/dsmserv.opt.
ANR7811I Using instance directory /home/tsminst1.
ANR4726I The ICC support module has been loaded.
ANR1636W The server machine GUID changed: old value (), new value (00.00.00.00-
.61.ac.11.de.bb.70.08.63.0a.01.62.cd).
ANR8200I TCP/IP Version 4 driver ready for connection with clients on port
1500.
ANR0152I Database manager successfully started.
ANR0498W Session 1 refused for COLFAX because restore DB is in progress.
ANR4636I Starting roll-forward database restore.
ANR8496E Device class 3310CLASS not defined in device configuration information
file.
 
Ok, so here's an update.

I managed to resolve the "device class not defined" error. Somehow I had both a devconf.dat and a devconfig.dat.

The devconfig.dat was referenced in the dsmserv.opt file, but it had none of my configuration data in it.
When I changed the opt file to point to devconf.dat, which has the correct configuration data, the library was recognized and I made some progress.
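A mismatch like this (the options file naming one device configuration file while the real definitions live in another) can be checked before attempting a restore. The Python sketch below assumes that dsmserv.opt holds one option per line with `*` starting a comment, and that the device configuration file records device classes as `DEFINE DEVCLASS <name> ...` lines; both helper names are hypothetical.

```python
def devconfig_files(opt_text):
    """Return the file names given on DEVCONFIG lines of dsmserv.opt text."""
    files = []
    for raw in opt_text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with "*" are not options.
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        # Option names are compared case-insensitively.
        if parts[0].upper() == "DEVCONFIG" and len(parts) == 2:
            files.append(parts[1].strip())
    return files


def defines_devclass(devconfig_text, devclass):
    """True if a DEFINE DEVCLASS line in the text names the given class."""
    wanted = devclass.upper()
    for line in devconfig_text.splitlines():
        words = line.upper().split()
        if len(words) > 2 and words[:2] == ["DEFINE", "DEVCLASS"] and words[2] == wanted:
            return True
    return False
```

For example, read /home/tsminst1/dsmserv.opt, then for each file returned by `devconfig_files`, check `defines_devclass(text, "3310CLASS")`; a file that comes back False is presumably the wrong one. Relative names are assumed to resolve against the instance directory, /home/tsminst1.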

However, now I'm encountering a new issue:
ANR8312E Volume 000371L4 could not be located in library 3310_TAPE.
ANR1402W Mount request denied for volume 000371L4 - volume unavailable.
ANR4578E Database backup/restore terminated - required volume was not mounted.

The volume is in the library, and I can move it through the library's web UI.

Any advice or guidance would be greatly appreciated.

Ted
 
Try moving tape 000371L4 manually to the first tape drive (the one with the lowest element number) and try the restore again.
 
Check the volume history and find volume 000371L4 and see what element number it is in. Then you can do one of two things:
1 - update the element number of volume 000371L4 in the volume history file to match the slot it is currently in
2 - move 000371L4 in the library to the slot element currently listed in the volume history
 

A small correction regarding option 1: don't update the volume history file; do not edit it!
I'm sure marclant meant to write:
1 - update the element number of volume 000371L4 in the device configuration file to match the slot it is currently in
 
Thanks for the catch.
 
Smajl,

I tried putting the volume directly in the drive as suggested, but when I started the restore, TSM removed the volume from the drive, put it back in the library, and then told me it couldn't find the volume:

ANR8772I Moving volume 000371L4 from drive DRIVE1 to slot 4101 in library
3310_TAPE.
ANR8312E Volume 000371L4 could not be located in library 3310_TAPE.
ANR1402W Mount request denied for volume 000371L4 - volume unavailable.
ANR4578E Database backup/restore terminated - required volume was not mounted.

Erwanns/Marclant,
My device configuration file doesn't have this volume listed.
I assume you're talking about the list of volumes that is commented out like this:

/* LIBRARYINVENTORY SCSI 3310_TAPE 000304L4 4208 101*/
/* LIBRARYINVENTORY SCSI 3310_TAPE 000305L4 4211 101*/
/* LIBRARYINVENTORY SCSI 3310_TAPE 000306L4 4250 101*/

Those are the last three volumes listed in the file. However, we have about 450 individual tape volumes, while the entries from our original install stop at 306.

Am I correct in figuring that I need to add:

/* LIBRARYINVENTORY SCSI 3310_TAPE 000371L4 4101 101*/

Also, is it necessary to add each volume between 306 and 371?

I'm going to try just adding this one volume now, and see what happens.

I'll let you know either way.

Thanks.

Ted
 
Only the volume(s) needed for the DB restore need to be in the library and in the devconfig file, with matching element numbers. Otherwise, TSM has no way to find the tapes.
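That rule can be checked mechanically: list the volumes the restore needs, compare them against the LIBRARYINVENTORY comment lines, and build entries for any that are missing. The Python sketch below copies the field layout from the entries quoted above (library type, library name, volume, element number, and a trailing `101` whose meaning isn't explained in this thread, so it is simply reused). The element numbers still have to come from the library itself, for example its web UI or a message such as ANR8772I; all helper names are hypothetical.

```python
import re

# Matches lines like "/* LIBRARYINVENTORY SCSI 3310_TAPE 000304L4 4208 101*/"
INVENTORY = re.compile(
    r"^/\*\s*LIBRARYINVENTORY\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s*\*/\s*$"
)


def parse_inventory(devconfig_text):
    """Map volume name -> element number from LIBRARYINVENTORY lines."""
    inventory = {}
    for line in devconfig_text.splitlines():
        match = INVENTORY.match(line.strip())
        if match:
            inventory[match.group(3)] = int(match.group(4))
    return inventory


def missing_volumes(needed, inventory):
    """Volumes needed for the restore that have no inventory entry."""
    return [volume for volume in needed if volume not in inventory]


def inventory_line(volume, element, library="3310_TAPE", libtype="SCSI", last="101"):
    """Build an entry shaped like the existing ones (the "101" is copied, not explained)."""
    return f"/* LIBRARYINVENTORY {libtype} {library} {volume} {element} {last}*/"
```

For instance, `inventory_line("000371L4", 4101)` reproduces the line proposed earlier in the thread, using the slot 4101 reported in the ANR8772I message.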
 
So, in the interest of preventing this kind of missing-tape issue again, should I add all my tape volumes to the devconfig file?

Is there a way to automate this?
I don't mind adding them all by hand, but we have offsite vault tapes that I may not have the element number for until they rotate back.

Also, for everyone following along: the restore is running now.
 
All,

I just wanted to let you know that the restore has completed and TSM is back online.

I can't thank you all enough for your help in this matter.
 