ADSM-L

Re: [ADSM-L] Recovering Linux TSM server from partial filesystem failure

2014-03-11 12:21:18
From: Chavdar Cholev <chavdar.cholev AT GMAIL DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 11 Mar 2014 18:18:45 +0200
Zoltan,
you can check the db2diag log for more info; I would start from there.
I am not sure about going from TSM 6.1 to TSM 7.1; my main concern here is
the different DB2 versions.
If you are thinking of rebuilding TSM 6.1 on new HW by mounting the LUNs
from the "old crashed" TSM, it may work, and
I think it is possible to upgrade to TSM 7.1 after that.
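
For anyone following the db2diag suggestion, here is a minimal Python sketch that pulls only the serious records out of a db2diag.log. It assumes each record begins with a header line of the form "<timestamp> <record id> LEVEL: <level>"; the function names and the list of levels are illustrative choices, not part of TSM or DB2. The log location is not given in this thread, so the path is left to the caller.

```python
import re
from typing import Iterator, List, Sequence

# Assumed header shape, e.g.:
#   2014-03-06-13.58.51.123456-300 I12345E456   LEVEL: Severe
HEADER = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d+\S*\s+\S+\s+LEVEL:\s*(\w+)"
)


def split_records(text: str) -> Iterator[str]:
    """Yield one string per record; a record runs from one header to the next."""
    current: List[str] = []
    for line in text.splitlines():
        if HEADER.match(line) and current:
            yield "\n".join(current).rstrip()
            current = []
        # Skip blank lines that appear before the first record.
        if line.strip() or current:
            current.append(line)
    if current:
        yield "\n".join(current).rstrip()


def serious_records(
    text: str, levels: Sequence[str] = ("Severe", "Error", "Critical")
) -> List[str]:
    """Return the records whose header LEVEL is one of `levels`, in file order."""
    selected = []
    for record in split_records(text):
        match = HEADER.match(record)
        if match and match.group(1) in levels:
            selected.append(record)
    return selected
```

Reading the file into a string and passing it to serious_records() returns the matching records in the order they were written, which makes it easier to see what DB2 logged around the time of the first ANR errors.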
On 3/11/2014 17:17, Zoltan Forray wrote:
With the lack of replies, I am guessing I can't recover this server from
what is left behind.  I do have old DB backups, but for what this server
does, it isn't worth bothering.  I can rebuild it faster.

I do have additional questions that somebody might have an answer to.

1.  Any reason NOT to install 7.1 on this box?  My only hesitation is that
my last 6.1 server (being upgraded to 6.3.4 in 2 weeks) has to communicate
with it to perform DBSNAPSHOT backups.

2.  When doing the postmortem on this failed server (still waiting for
results from hardware diagnostics; my OS guy is headed to the offsite
location to check on the results and to start reinstalling the OS), I
noticed this message from my monitoring system:

3/6/2014 8:00:11 PM ANR2971E Database backup/restore/rollforward terminated
- DB2 sqlcode -980 error.

Unfortunately, everywhere I Google sqlcodes, there is no *-980*.  Anybody
have a better magic decoder ring to tell me what this is saying?
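
On the decoder-ring question: DB2 message references are indexed by message identifier (SQL plus the zero-padded absolute value of the SQLCODE), not by the signed number, so a search for "-980" tends to come up empty while "SQL0980" does not. The Python sketch below shows that mapping and builds the DB2 command-line help invocation for it. The helper names are hypothetical; the only real command assumed is the DB2 CLP help form `db2 ? SQLnnnn`, run as the instance owner. In IBM's message reference this code is listed as SQL0980C, which reports a disk error; that would fit the filesystem trouble described in this thread, though the db2diag log remains the place to confirm it.

```python
import re
from typing import List, Optional

# Matches the "sqlcode -980" fragment inside a TSM message such as ANR2971E.
SQLCODE_IN_TEXT = re.compile(r"sqlcode\s+(-?\d+)", re.IGNORECASE)


def extract_sqlcode(message: str) -> Optional[int]:
    """Return the SQLCODE mentioned in a message, or None if there is none."""
    match = SQLCODE_IN_TEXT.search(message)
    return int(match.group(1)) if match else None


def sqlcode_to_message_id(sqlcode: int) -> str:
    """Map a numeric SQLCODE to its message identifier, e.g. -980 -> SQL0980.

    The trailing severity letter (N, W or C) cannot be derived from the
    number, so it is left off; the help lookup does not need it.
    """
    return f"SQL{abs(sqlcode):04d}"


def help_command(sqlcode: int) -> List[str]:
    """Build the argument list for the DB2 CLP help lookup of this code."""
    return ["db2", "?", sqlcode_to_message_id(sqlcode)]
```

For the ANR2971E line above, extract_sqlcode() yields -980 and help_command(-980) gives the arguments for `db2 ? SQL0980`, which can be passed to subprocess.run() on a host where the DB2 environment is set up.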


On Mon, Mar 10, 2014 at 11:57 AM, Zoltan Forray <zforray AT vcu DOT edu> wrote:

As soon as I know more, I will post here.  My OS guy (offsite with the
box) just reported/confirmed the root filesystem is a loss and will have to
rebuild/reinstall.  He is running Dell  hardware diagnostics right now.

Going back through what logs/reports I have available, I found that there
was some kind of hiccup on 03/06/2014, which seems to be the start of its
downfall.

3/6/2014 1:58:51 PM ANR0106E admnode.c(23257): Unexpected error 4505
fetching row in table "Nodes".
3/6/2014 1:58:51 PM ANR9999D_2821097399 imInsertArchive(imarins.c:858) Thread<124724>: Error 9999 setting anyV2Client=yes for nodeId=9, will continue
3/6/2014 1:58:51 PM ANR9999D Thread<124724> issued message 9999 from:
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000dc6503 OutDiagToCons
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000dc9305 outDiagfExt
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x000000007ec0a6 imInsertArchive
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000863a41 imUpdateInventory
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x0000000088a38b imPrepareTxn
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000d64435 tmEndX
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000b8078d SmEndVbTxn
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000ba6230 SmNodeSession
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000b69c23 smExecuteSession
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000e7119d psSessionThread
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000e5e01a StartThread
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00003e6be079d1 *UNKNOWN*
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00003e6b6e8b6d *UNKNOWN*
3/6/2014 1:58:51 PM ANR3491E No sender email address found - unable to
send email for alert, ANR9999D.
3/6/2014 1:58:51 PM ANR0157W Database operation INSERT for table
DF.Segments failed with result code 4505 and tracking ID: 0x7fff6c07dd28.
3/6/2014 1:58:51 PM ANR0158W Database operation INSERT for table
DF.Segments failed with operation code 4505 and tracking id 0x7fff6c07dd28.
The data for column 0 is: (int32)2.
3/6/2014 1:58:51 PM ANR0158W Database operation INSERT for table
DF.Segments failed with operation code 4505 and tracking id 0x7fff6c07dd28.
The data for column 1 is: (int32)0.
3/6/2014 1:58:51 PM ANR0158W Database operation INSERT for table
DF.Segments failed with operation code 4505 and tracking id 0x7fff6c07dd28.
The data for column 2 is: (int64)7348.
3/6/2014 1:58:51 PM ANR0158W Database operation INSERT for table
DF.Segments failed with operation code 4505 and tracking id 0x7fff6c07dd28.
The data for column 3 is: (int32)0.
3/6/2014 1:58:51 PM ANR0102E dfcreate.c(1959): Error 4505 inserting row in
table "DF.Segments".
3/6/2014 1:58:51 PM ANR1181E dftxn.c(216): Data storage transaction
0:2952042 was aborted.
3/6/2014 1:58:51 PM ANR0532W smnode.c(4155): Transaction 0:2952042 was
aborted for session 38312 for node FIREBALL (Linux/x86_64).
3/6/2014 1:58:51 PM ANR3491E No sender email address found - unable to
send email for alert, ANR1181E.

Note that "FIREBALL" is one of the production servers that does a
DBSNAPSHOT to this server.

Then nothing until it tried to back up its own database later that day.

3/6/2014 8:00:11 PM ANR2971E Database backup/restore/rollforward
terminated - DB2 sqlcode -980 error.
3/6/2014 8:00:11 PM ANR1893E Process 209 for Database Backup completed
with a completion state of FAILURE.
3/6/2014 8:00:11 PM ANR3491E No sender email address found - unable to
send email for alert, ANR1893E.

Then again the following day and that was all she wrote.  Seized up later
that day/night.....

3/7/2014 8:00:10 PM ANR2971E Database backup/restore/rollforward
terminated - DB2 sqlcode -980 error.
3/7/2014 8:00:10 PM ANR1893E Process 213 for Database Backup completed
with a completion state of FAILURE.
3/7/2014 8:00:10 PM ANR3491E No sender email address found - unable to
send email for alert, ANR1893E.
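
The log excerpts above are hard-wrapped by email, which makes them awkward to search. As a rough aid, the Python sketch below rejoins wrapped lines into whole messages and counts message IDs per day. It assumes every message starts with "M/D/YYYY H:MM:SS AM|PM ANRnnnnX" as in the excerpts, and that only log lines are fed in (any other text would be glued onto the preceding message). The function names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

# A new message starts with a timestamp followed by an ANR message ID;
# any other non-blank line is treated as a wrapped continuation.
START = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (ANR\d{4}[A-Z])\S*\s*(.*)$"
)

Event = Tuple[datetime, str, str]  # (timestamp, message ID, message text)


def parse_events(lines: Iterable[str]) -> List[Event]:
    """Rejoin wrapped lines and return one (time, id, text) tuple per message."""
    events: List[Event] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = START.match(line)
        if match:
            # %p relies on the default C/English locale for "AM"/"PM".
            when = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
            events.append((when, match.group(2), match.group(3)))
        elif events:
            when, msg_id, text = events[-1]
            events[-1] = (when, msg_id, f"{text} {line}".strip())
    return events


def count_by_day(events: Iterable[Event]) -> Counter:
    """Count messages per (ISO date, message ID) pair."""
    return Counter((when.date().isoformat(), msg_id) for when, msg_id, _ in events)
```

Run over the excerpts in this message, count_by_day() would show the ANR2971E backup failure once on each of the two days, alongside the burst of 4505 errors on 3/6.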







On Mon, Mar 10, 2014 at 11:09 AM, Arbogast, Warren K <warbogas AT iu DOT edu> wrote:

Zoltan,
We are all eager to know if the something that happened had anything to
do with TSM 6.3.2 or DB2. Since they seem to be fine and the OS needs to be
rebuilt, presumably not. Sometimes i's and t's beg to be dotted and crossed.

Best wishes,
Keith Arbogast
Indiana University


On Mar 10, 2014, at 10:55 AM, Zoltan Forray wrote:

We recently had our offsite/recovery TSM server (RH Linux 6.4, TSM
6.3.4.200) go south.  Something happened that caused DB2 to start
crashing/dumping, which subsequently completely filled the filesystem
containing the /home/tsminst1 directory.  Since this was the root
filesystem, the system tanked and is now unrecoverable.  My OS guy says the
root filesystem seems to be corrupted and will probably require a complete
OS reinstall.

However, the filesystems containing the TSM DB, LOG and ARCHLOG files all
seem to be OK.

Since this is an offsite, non-critical server that simply stored DB
Snapshots of my other production TSM servers, nuking and rebuilding is not
a big deal, mostly lots of busy-work.  This could also give me the
opportunity to install and play with 7.1.

I would like to make this a "DR recovery" scenario/test.  Since the DB is
still there, can it be recovered from what remains, i.e. the /TSMDB,
/TSMLOG, /TSMARCHLOG filesystems?

--
*Zoltan Forray*
TSM Software & Hardware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
zforray AT vcu DOT edu - 804-828-4807
Don't be a phishing victim - VCU and other reputable organizations will
never use email to request that you reply with your password, social
security number or confidential personal information. For more details
visit http://infosecurity.vcu.edu/phishing.html

