ADSM-L

Re: [ADSM-L] Recovering Linux TSM server from partial filesystem failure

2014-03-10 12:00:03
Subject: Re: [ADSM-L] Recovering Linux TSM server from partial filesystem failure
From: Zoltan Forray <zforray AT VCU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 10 Mar 2014 11:57:43 -0400
As soon as I know more, I will post here. My OS guy (offsite with the box)
just confirmed that the root filesystem is a loss and the OS will have to be
rebuilt/reinstalled. He is running Dell hardware diagnostics right now.

Going back through the logs/reports I have available, I found some kind of
hiccup on 03/06/2014, which seems to be the start of its downfall.

3/6/2014 1:58:51 PM ANR0106E admnode.c(23257): Unexpected error 4505 fetching row in table "Nodes".
3/6/2014 1:58:51 PM ANR9999D_2821097399 imInsertArchive(imarins.c:858) Thread<124724>: Error 9999 setting anyV2Client=yes for nodeId=9, will continue
3/6/2014 1:58:51 PM ANR9999D Thread<124724> issued message 9999 from:
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000dc6503 OutDiagToCons
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000dc9305 outDiagfExt
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x000000007ec0a6 imInsertArchive
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000863a41 imUpdateInventory
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x0000000088a38b imPrepareTxn
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000d64435 tmEndX
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000b8078d SmEndVbTxn
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000ba6230 SmNodeSession
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000b69c23 smExecuteSession
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000e7119d psSessionThread
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00000000e5e01a StartThread
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00003e6be079d1 *UNKNOWN*
3/6/2014 1:58:51 PM ANR9999D Thread<124724>  0x00003e6b6e8b6d *UNKNOWN*
3/6/2014 1:58:51 PM ANR3491E No sender email address found - unable to send email for alert, ANR9999D.
3/6/2014 1:58:51 PM ANR0157W Database operation INSERT for table DF.Segments failed with result code 4505 and tracking ID: 0x7fff6c07dd28.
3/6/2014 1:58:51 PM ANR0158W Database operation INSERT for table DF.Segments failed with operation code 4505 and tracking id 0x7fff6c07dd28. The data for column 0 is: (int32)2.
3/6/2014 1:58:51 PM ANR0158W Database operation INSERT for table DF.Segments failed with operation code 4505 and tracking id 0x7fff6c07dd28. The data for column 1 is: (int32)0.
3/6/2014 1:58:51 PM ANR0158W Database operation INSERT for table DF.Segments failed with operation code 4505 and tracking id 0x7fff6c07dd28. The data for column 2 is: (int64)7348.
3/6/2014 1:58:51 PM ANR0158W Database operation INSERT for table DF.Segments failed with operation code 4505 and tracking id 0x7fff6c07dd28. The data for column 3 is: (int32)0.
3/6/2014 1:58:51 PM ANR0102E dfcreate.c(1959): Error 4505 inserting row in table "DF.Segments".
3/6/2014 1:58:51 PM ANR1181E dftxn.c(216): Data storage transaction 0:2952042 was aborted.
3/6/2014 1:58:51 PM ANR0532W smnode.c(4155): Transaction 0:2952042 was aborted for session 38312 for node FIREBALL (Linux/x86_64).
3/6/2014 1:58:51 PM ANR3491E No sender email address found - unable to send email for alert, ANR1181E.

Note: FIREBALL is one of the production servers that sends a DBSNAPSHOT to
this server.

Then nothing until it tried to back up its own database later that day.

3/6/2014 8:00:11 PM ANR2971E Database backup/restore/rollforward terminated - DB2 sqlcode -980 error.
3/6/2014 8:00:11 PM ANR1893E Process 209 for Database Backup completed with a completion state of FAILURE.
3/6/2014 8:00:11 PM ANR3491E No sender email address found - unable to send email for alert, ANR1893E.

The same failure happened again the following day, and that was all she
wrote. The server seized up later that day or night.

3/7/2014 8:00:10 PM ANR2971E Database backup/restore/rollforward terminated - DB2 sqlcode -980 error.
3/7/2014 8:00:10 PM ANR1893E Process 213 for Database Backup completed with a completion state of FAILURE.
3/7/2014 8:00:10 PM ANR3491E No sender email address found - unable to send email for alert, ANR1893E.
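
Walking back through the logs to find where things first went wrong, as
described above, can be sketched as a small scan. This is a minimal Python
sketch, not a TSM tool: it relies on the convention that the last letter of an
ANR message ID gives its severity (I informational, W warning, E error,
S severe, D diagnostic), and it assumes each log line looks like the excerpts
in this message. The function name first_problem and the line pattern are
hypothetical; the thread does not say how these lines were exported.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpts in this thread:
#   "<M/D/YYYY> <h:mm:ss> <AM|PM> ANR<nnnn><severity letter> <message text>"
# Real QUERY ACTLOG output may be formatted differently; adjust LINE_RE to match.
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<msgid>ANR\d{4}(?P<sev>[IWESD]))"
)


def first_problem(lines, severities=("E", "S", "D")):
    """Return (timestamp, message ID) of the earliest line whose severity
    letter is in `severities`, or None if no line qualifies.

    Ties on timestamp keep the line that appears first in the input.
    """
    earliest = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("sev") not in severities:
            continue  # not a message line, or a severity we are ignoring
        ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %I:%M:%S %p")
        if earliest is None or ts < earliest[0]:
            earliest = (ts, m.group("msgid"))
    return earliest


if __name__ == "__main__":
    sample = [
        '3/6/2014 1:58:51 PM ANR0106E admnode.c(23257): Unexpected error 4505 '
        'fetching row in table "Nodes".',
        "3/6/2014 1:58:51 PM ANR0157W Database operation INSERT for table "
        "DF.Segments failed with result code 4505",
        "3/6/2014 8:00:11 PM ANR2971E Database backup/restore/rollforward "
        "terminated - DB2 sqlcode -980 error.",
    ]
    print(first_problem(sample))
```

Fed those three sample lines, it reports the 3/6/2014 1:58:51 PM ANR0106E
entry, the same starting point identified above; the ANR0157W line is skipped
because W is not in the default severity set.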

On Mon, Mar 10, 2014 at 11:09 AM, Arbogast, Warren K <warbogas AT iu DOT edu> wrote:

> Zoltan,
> We are all eager to know whether the "something" that happened had anything
> to do with TSM 6.3.2 or DB2. Since they seem to be fine and the OS needs to
> be rebuilt, presumably not. Sometimes i's and t's beg to be dotted and crossed.
>
> Best wishes,
> Keith Arbogast
> Indiana University
>
>
> On Mar 10, 2014, at 10:55 AM, Zoltan Forray wrote:
>
> > We recently had our offsite/recovery TSM server (RH Linux 6.4, TSM
> > 6.3.4.200) go south. Something caused DB2 to start crashing/dumping,
> > which completely filled the filesystem containing the /home/tsminst1
> > directory. Since that was the root filesystem, the system tanked and is
> > now unrecoverable. My OS guy says the root filesystem seems to be
> > corrupted and will probably require a complete OS reinstall.
> >
> > However, the filesystems containing the TSM DB, LOG and ARCHLOG files all
> > seem to be OK.
> >
> > Since this is an offsite, non-critical server that simply stores DB
> > snapshots of my other production TSM servers, nuking and rebuilding is not
> > a big deal, mostly lots of busy-work. This could also give me the
> > opportunity to install and play with 7.1.
> >
> > I would like to make this a "DR recovery" scenario/test.  Since the DB is
> > still there, can it be recovered from what remains, i.e. the /TSMDB,
> > /TSMLOG, /TSMARCHLOG filesystems?
> >
> > --
> > *Zoltan Forray*
> > TSM Software & Hardware Administrator
> > Virginia Commonwealth University
> > UCC/Office of Technology Services
> > zforray AT vcu DOT edu - 804-828-4807
> > Don't be a phishing victim - VCU and other reputable organizations will
> > never use email to request that you reply with your password, social
> > security number or confidential personal information. For more details
> > visit http://infosecurity.vcu.edu/phishing.html
>
