Hi all,
This morning I noticed that our TSM server failed to back up one of our
nodes. A look at the activity log revealed the following errors:
ANR9999D imgroup.c(1180): ThreadId<52> Error 8 retrieving Backup Objects
row for object 0.11583818 (SESSION: 2736) Oct 25, 2005 11:05:09 PM
ANR9999D ThreadId<52> issued message 9999 from: (SESSION: 2736) Oct 25,
2005 11:05:10 PM
ANR9999D smnode.c(7343): ThreadId<52> Session 2736: Invalid Group Id
0,11583818 for ADD function (SESSION: 2736) Oct 25, 2005 11:05:10 PM
ANR9999D ThreadId<52> issued message 9999 from: (SESSION: 2736) Oct 25,
2005 11:05:10 PM
ANR0403I Session 2736 ended for node ***** (WinNT). (SESSION: 2736)
Also, the activity log shows that the node sent objects to the server, but
the server reports that 0 objects were backed up (probably as a result
of the above errors).
We have never had a node backup fail with this error before. All other
backups that ran at the same time finished just fine.
While searching the adsm.org list for information on this error, I came
across this discussion.
Has anyone discovered what this error actually means?
We are using Tivoli Storage Manager 5.3.2 on a Windows 2003 server.
Thank you very much.
______________________________________
Branko Stanic
System Administrator
Ured registrara za ratne zlocine i organizovani kriminal
Sarajevo, Bosna i Hercegovina
+387 33 707 111
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Andrew Raibeck
Sent: 4 October 2005 15:28
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Fw: HELP!!!!
Joni,
Not to be patronizing... but take a deep breath, then another, then a
third, and relax. :-)
It is very difficult to diagnose any problems when all one has is a
vague (at best) problem description and a handful of various error
messages. In identifying the source of the trouble, context is nearly
(if not completely) everything.
Without knowing anything else about this problem, I would recommend
that you start by reviewing your activity log. That would be the ENTIRE
log, not just certain messages. Start with the first of the ANR9999D
error messages being issued, and work your way backward, trying to get a
picture of what sessions, processes, and other events were running on
the server at the time the problem started. Also try searching the IBM
web site for instances of ANR9999D plus other keywords that appear in
the message text (don't search on numbers that might be
instance-specific, just search on non-numeric strings). If you search
only on ANR9999D, you'll get way too many hits.
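As a rough illustration of the search advice above, here is a minimal Python sketch that reduces a message to its message ID plus its non-numeric words. The function name search_keywords is hypothetical, and the rule "drop any token containing a digit" is an assumption about what counts as instance-specific; adjust it to taste.

```python
import re

def search_keywords(message: str) -> str:
    """Reduce a server message to its message ID plus non-numeric words."""
    # Keep the message ID (for example ANR9999D) as the first search term.
    match = re.search(r"\bANR\d{4}[A-Z]\b", message)
    msg_id = match.group(0) if match else ""
    rest = message.replace(msg_id, " ") if msg_id else message
    # Remove parenthesised tags such as "(1180)" or "(SESSION: 2736)".
    rest = re.sub(r"\([^)]*\)", " ", rest)
    words = []
    for token in rest.split():
        token = token.strip(":;,.")
        # Assumption: any token containing a digit (thread IDs, object IDs,
        # return codes) is instance-specific and should not be searched on.
        if token and not any(ch.isdigit() for ch in token):
            words.append(token)
    return " ".join(([msg_id] if msg_id else []) + words)

if __name__ == "__main__":
    line = ("ANR9999D smnode.c(7343): ThreadId<52> Session 2736: Invalid "
            "Group Id 0,11583818 for ADD function (SESSION: 2736)")
    print(search_keywords(line))
```

The resulting string keeps the source file name and the descriptive words, which is usually enough to narrow a support-site search.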
If you can figure out what clients were running at the time this
occurred, check their error and schedule logs to see what errors they
received. What activities were they doing?
I see you have a script running called NAS_2-DIFFERENTIAL. That is
another event you can examine. Do this for all running sessions and
processes. You might have to go several hours back in the activity log
from the first of the ANR9999Ds, but this is a start.
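The "work backward from the first ANR9999D" step could be sketched as below. This assumes the activity log has already been read into a list of (datetime, message) pairs; the helper names lead_up and sessions_and_processes and the three-hour default are hypothetical, not anything TSM provides.

```python
import re
from datetime import datetime, timedelta

# Matches the "(SESSION: n)" or "(SESSION: n, PROCESS: m)" tag on a message.
TAG = re.compile(r"\(SESSION:\s*(\d+)(?:,\s*PROCESS:\s*(\d+))?\)")

def lead_up(entries, marker="ANR9999D", hours=3):
    """Return entries from `hours` before the first `marker` message up to it."""
    hits = [when for when, msg in entries if marker in msg]
    if not hits:
        return []
    first = min(hits)
    start = first - timedelta(hours=hours)
    window = [e for e in entries if start <= e[0] <= first]
    return sorted(window, key=lambda e: e[0])

def sessions_and_processes(entries):
    """Collect the session and process numbers mentioned in the entries."""
    sessions, processes = set(), set()
    for _, msg in entries:
        for session, process in TAG.findall(msg):
            sessions.add(int(session))
            if process:
                processes.add(int(process))
    return sessions, processes
```

Calling sessions_and_processes(lead_up(entries)) gives a list of session and process numbers to look up individually in the log.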
I'm not sure where that list of messages below came from, but those are
shown in columns that are far too narrow. Consider querying the activity
log from an Admin CLI started with the -commadelimited option and
redirect the output to a file. You can then view the messages directly
from the file or load them into a spreadsheet or database for easier
reading.
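For example, once the comma-delimited output has been redirected to a file, something like the following Python sketch could load it. It assumes three columns (DATE_TIME, MSGNO, MESSAGE, as in the listing below) and a timestamp format of YYYY-MM-DD HH:MM:SS.ffffff; both are assumptions about what your query returns, and the file name actlog.csv is hypothetical.

```python
import csv
from datetime import datetime

def load_actlog(lines):
    """Parse comma-delimited rows of DATE_TIME, MSGNO, MESSAGE."""
    rows = []
    for record in csv.reader(lines):
        if len(record) < 3:
            continue  # skip banners, blank lines, and other non-data output
        try:
            when = datetime.strptime(record[0].strip(), "%Y-%m-%d %H:%M:%S.%f")
            msgno = int(record[1])
        except ValueError:
            continue  # skip header rows or anything else that is not data
        # An unquoted message may itself contain commas; rejoin extra fields.
        rows.append((when, msgno, ",".join(record[2:]).strip()))
    return rows

if __name__ == "__main__":
    with open("actlog.csv", newline="") as handle:  # hypothetical file name
        for when, msgno, message in load_actlog(handle):
            print(when, msgno, message)
```

Each message then sits on a single line, which is much easier to read or filter than the narrow wrapped columns.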
Regards,
Andy
Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.ibm DOT com
IBM Tivoli Storage Manager support web page:
http://www-306.ibm.com/software/sysmgmt/products/support/IBMTivoliStorageManager.html
The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.
"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2005-10-04
05:13:03:
> I found some other errors as well, do any of these look familiar?
> I've placed a call with IBM, but I'm really at a loss here. For our NDMP
> backups it stated an issue with malloc, but I don't know where this
> problem is beginning...
>
> Could I have an issue with the size of the bufferpool on this server?
> Here are the options for this server and it has 4 processors and 8GB of
> memory.
>
> Server Option            Option Setting
> -----------------------  ------------------------------------
> CommTimeOut              7,200
> IdleTimeOut              360
> BufPoolSize              419432
> LogPoolSize              1024
> MessageFormat            1
> Language                 en_US
> Alias Halt               HALT
> MaxSessions              500
> ExpInterval              0
> ExpQuiet                 No
> EventServer              Yes
> MirrorRead DB            Normal
> MirrorRead LOG           Normal
> MirrorWrite DB           Sequential
> MirrorWrite LOG          Parallel
> VolumeHistory            /t01/volhist/hist01
> VolumeHistory            /usr/tivoli/tsm/server/bin/volhist
> Devconfig                /t01/devconfig/dev01
> Devconfig                /usr/tivoli/tsm/server/bin/devconfig
> TxnGroupMax              256
> MoveBatchSize            1000
> MoveSizeThresh           2048
> StatusMsgCnt             10
> RestoreInterval          1,440
> UseLargeBuffers          Yes
> DisableScheds            No
> NOBUFPREfetch            No
> AuditStorage             Yes
> REQSYSauthoutfile        Yes
> SELFTUNEBUFpoolsize      Yes
> SELFTUNETXNsize          Yes
> DBPAGEShadow             No
> DBPAGESHADOWFile         dbpgshdw.bdt
> MsgStackTrace            On
> QueryAuth                None
> LogWarnFullPerCent       75
> ThroughPutDataThreshold  0
> ThroughPutTimeThreshold  0
> NOPREEMPT                ( No )
> Resource Timeout         60
> TEC UTF8 Events          No
> NORETRIEVEDATE           No
> DNSLOOKUP                Yes
> TCPPort                  1500
> TcpAdminport             1500
> HTTPPort                 1580
> TCPWindowsize            65536
> TCPBufsize               16384
> TCPNoDelay               Yes
> CommMethod               TCPIP
> CommMethod               ShMem
> CommMethod               HTTP
> MsgInterval              1
> ShmPort                  1510
> FileExit                 /t01/log/eventserver(APPEND)
> UserExit
> FileTextExit
> AssistVCRRecovery        Yes
> AcsAccessId              chrs144
> AcsTimeoutX              1
> AcsLockDrive             No
> AcsQuickInit             No
> SNMPSubagentPort         1521
> SNMPSubagentHost         127.0.0.1
> SNMPHeartBeatInt         5
> TECHost
> TECPort                  0
> UNIQUETECevents          No
> UNIQUETDPTECevents       No
> Async I/O                No
> Direct I/O               Yes
> SHAREDLIBIDLE            No
> 3494Shared               No
>
>
>
> DATE_TIME MSGNO MESSAGE
> ------------------ ----------- ------------------
> 2005-10-03 9999 ANR9999D
> 22:02:43.000000 imgroup.c(1180):
> ThreadId<511>
> Error 8
> retrieving Backup
> Objects row for
> object
> 0.295703482
> Callchain of
> previous message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001003dea-
> d4 imIsGroupLead-
> er <- 0x00000001-
> 00385564
> SmNodeSession <-
> 0x000000010043bb-
> 38 HandleNodeSes-
> sion <-
> 0x00000001004419-
> 64 smExecuteSess-
> ion <-
> 0x00000001004344-
> 78 SessionThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 40004)
> 2005-10-03 9999 ANR9999D
> 22:02:43.000000 smnode.c(7056):
> ThreadId<511>
> Session 40004:
> Invalid Group Id
> 0,295703482 for
> ADD function
> Callchain of
> previous message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001003855-
> 8c SmNodeSession
> <- 0x00000001004-
> 3bb38 HandleNode-
> Session <-
> 0x00000001004419-
> 64 smExecuteSess-
> ion <-
> 0x00000001004344-
> 78 SessionThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 2005-10-03 8311 ANR8311E An I/O
> 22:37:12.000000 error occurred
> while accessing
> drive SL8500
> (/dev/rmt7) for
> LOCATE operation,
> errno = 78.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 1165 ANR1165E Error
> 22:37:13.000000 detected for file
> in storage pool
> TAPE_ORACLE: Node
> FJSU102, Type
> Backup, File
> space /p01, fsId
> 18, File name
> /app/cyb/esp/MED-
> -ESPSystemAgent/-
> spool/CM_DEMO/MA-
> IN/MEDAXBP.1366/
> VMSDBSAP.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 3523 ANR3523W GENERATE
> 22:37:13.000000 BACKUPSET:
> Retrieve failed
> - error on input
> storage device.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 3503 ANR3503E
> 22:37:13.000000 Generation of
> backup set for
> FJSU102 as
> FJSU102_BACKUPSE-
> T.295467854
> failed. (SESSION:
> 29020, PROCESS:
> 680)
> 2005-10-03 2032 ANR2032E GENERATE
> 22:37:14.000000 BACKUPSET:
> Command failed -
> internal server
> error detected.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 9999 ANR9999D
> 22:50:53.000000 imgroup.c(1180):
> ThreadId<437>
> Error 8
> retrieving Backup
> Objects row for
> object
> 0.295537065
> Callchain of
> previous message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001003dea-
> d4 imIsGroupLead-
> er <- 0x00000001-
> 00385564
> SmNodeSession <-
> 0x000000010043bb-
> 38 HandleNodeSes-
> sion <-
> 0x00000001004419-
> 64 smExecuteSess-
> ion <-
> 0x00000001004344-
> 78 SessionThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 39214)
> 2005-10-03 9999 ANR9999D
> 22:50:53.000000 smnode.c(7056):
> ThreadId<437>
> Session 39214:
> Invalid Group Id
> 0,295537065 for
> ADD function
> Callchain of
> previous message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001003855-
> 8c SmNodeSession
> <- 0x00000001004-
> 3bb38 HandleNode-
> Session <-
> 0x00000001004419-
> 64 smExecuteSess-
> ion <-
> 0x00000001004344-
> 78 SessionThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 39214)
> 2005-10-03 423 ANR0423W Session
> 22:51:42.000000 41306 for
> administrator (
> ) refused -
> administrator
> name not
> registered.
> (SESSION: 41306)
> 2005-10-03 8311 ANR8311E An I/O
> 22:52:14.000000 error occurred
> while accessing
> drive SL8500
> (/dev/rmt7) for
> OFFL operation,
> errno = 78.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 8769 ANR8769E External
> 23:34:45.000000 media management
> function DISMOUNT
> returned
> result=LIBRARY_E-
> RROR. (SESSION:
> 29020, PROCESS:
> 680)
> 2005-10-03 8469 ANR8469E Dismount
> 23:34:45.000000 of LTO volume
> T00897 from drive
> SL8500
> (/dev/rmt7) in
> library SL8500
> failed. (SESSION:
> 29020, PROCESS:
> 680)
> 2005-10-03 1410 ANR1410W Access
> 23:34:46.000000 mode for volume
> T00897 now set to
> "unavailable".
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 9999 ANR9999D
> 23:46:25.000000 ssremote.c(503):
> ThreadId<136>
> Unable to open
> remote session of
> type 1. Callchain
> of previous
> message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001004a41-
> 78 ssInitStoreRe-
> mote <-
> 0x000000010066ad-
> 10 AfInitStoreRe-
> mote <-
> 0x00000001006670-
> 24 bfInitStoreRe-
> mote <-
> 0x00000001006a05-
> c4 DoBackup <-
> 0x00000001006a3e-
> 5c AdmBackupNode
> <- 0x00000001001-
> 63168 AdmCommand-
> Local <-
> 0x00000001001642-
> ac admCommand <-
> 0x000000010015b1-
> 80 RunScript <-
> 0x000000010015cd-
> 30 DoRunScript <-
> 0x00000001001631-
> 68 AdmCommandLoc-
> al <- 0x00000001-
> 001642ac
> admCommand <-
> 0x000000010064ec-
> 58 SmExecSchedul-
> edCommand <-
> 0x000000010064ee-
> 54 smScheduledCo-
> nsoleSession <-
> 0x000000010064c8-
> 60 CsRunCmdThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 38192, PROCESS:
> 963)
> 2005-10-03 2032 ANR2032E BACKUP
> 23:46:25.000000 NODE: Command
> failed - internal
> server error
> detected.
> (SESSION: 38192,
> PROCESS: 963)
> 2005-10-03 1463 ANR1463E RUN:
> 23:46:25.000000 Command script
> NAS_2-DIFFERENTI-
> AL completed in
> error. (SESSION:
> 38192, PROCESS:
> 963)
> 2005-10-03 2752 ANR2752E Scheduled
> 23:46:25.000000 command
> NAS_2-DIFFERENTI-
> AL failed.
> (SESSION: 38192,
> PROCESS: 963)
>
> ********************************
> Joni Moyer
> Highmark
> Storage Systems
> Work:(717)302-6603
> Fax:(717)302-5974
> joni.moyer AT highmark DOT com
> ********************************
>
>
>
> From: "Richard Sims" <rbs AT bu DOT edu>
> Date: 10/04/2005 07:56 AM
> To: "Joni Moyer" <joni.moyer AT highmark DOT com>
> cc:
> Subject: Re: HELP!!!!
>
> Joni - That's an error we haven't seen before.
>
> Your best course of action is to call TSM Support.
>
> Richard Sims
>
> On Oct 4, 2005, at 7:32 AM, Joni Moyer wrote:
>
> > Has anyone ever seen this message before? I have TSM 5.2.4 running
> > on AIX 5.2 and it seems like this error message occurred and then all
> > processing stopped and it almost looks like TSM stopped & restarted
> > itself. Any suggestions are appreciated!!!! I'm completely lost in
> > this situation.
> > Thank you in advance!
> >
> > 10/03/05 23:46:25 ANR9999D ssremote.c(503): ThreadId<136> Unable to open
> >   remote session of type 1. Callchain of previous message:
> >   0x0000000100017d94 outDiagf <-
> >   0x00000001004a4178 ssInitStoreRemote <-
> >   0x000000010066ad10 AfInitStoreRemote <-
> >   0x0000000100667024 bfInitStoreRemote <-
> >   0x00000001006a05c4 DoBackup <-
> >   0x00000001006a3e5c AdmBackupNode <-
> >   0x0000000100163168 AdmCommandLocal <-
> >   0x00000001001642ac admCommand <-
> >   0x000000010015b180 RunScript <-
> >   0x000000010015cd30 DoRunScript <-
> >   0x0000000100163168 AdmCommandLocal <-
> >   0x00000001001642ac admCommand <-
> >   0x000000010064ec58 SmExecScheduledCommand <-
> >   0x000000010064ee54 smScheduledConsoleSession <-
> >   0x000000010064c860 CsRunCmdThread <-
> >   0x0000000100008078 StartThread <-
> >   0x09000000002f4460 _pthread_body <- (SESSION: 38192, PROCESS: 963)
> > 10/03/05 23:46:25 ANR2032E BACKUP NODE: Command failed - internal server
> >   error detected. (SESSION: 38192, PROCESS: 963)
> > 10/03/05 23:46:25 ANR2753I (NAS_2-DIFFERENTIAL):ANR2032E BACKUP NODE:
> >   Command failed - (SESSION: 38192)
> > 10/03/05 23:46:25 ANR2753I (NAS_2-DIFFERENTIAL):internal server error
> >   detected. (SESSION: 38192)
> > 10/03/05 23:46:25 ANR1463E RUN: Command script NAS_2-DIFFERENTIAL completed
> >   in error. (SESSION: 38192, PROCESS: 963)
> >
> >
> >
> > ********************************
> > Joni Moyer
> > Highmark
> > Storage Systems
> > Work:(717)302-6603
> > Fax:(717)302-5974
> > joni.moyer AT highmark DOT com
> > ********************************
> >
>
>