Joni,
Not to be patronizing... but take a deep breath, then another, then a
third, and relax. :-)
It is very difficult to diagnose any problems when all one has is a vague
(at best) problem description and a handful of various error messages. In
identifying the source of the trouble, context is nearly (if not
completely) everything.
Without knowing anything else about this problem, I would recommend that
you start by reviewing your activity log. That would be the ENTIRE log,
not just certain messages. Start with the first of the ANR9999D error
messages being issued, and work your way backward, trying to get a picture
of what sessions, processes, and other events were running on the server
at the time the problem started. Also try searching the IBM web site for
instances of ANR9999D plus other keywords that appear in the message text
(don't search on numbers that might be instance-specific, just search on
non-numeric strings). If you search only on ANR9999D, you'll get way too
many hits.
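To illustrate the "search on non-numeric strings" idea, here is a minimal
Python sketch, not an IBM tool. The function name search_terms and the
rule it applies (keep the message ID, drop any token that still contains a
digit after trimming "(1180):" or "<511>" style suffixes) are my own
assumptions about what counts as instance-specific.

```python
import re

# TSM server message IDs look like ANR9999D: "ANR", four digits, one letter.
MSG_ID = re.compile(r"^ANR\d{4}[A-Z]$")

def search_terms(message):
    """Return the words of a message that are worth searching on.

    Hypothetical helper: keeps the message ID and alphabetic words,
    drops numbers, addresses, and other instance-specific tokens.
    """
    terms = []
    for token in message.split():
        if MSG_ID.match(token):
            terms.append(token)
            continue
        # Trim "(1180):" / "<511>" style suffixes, then edge punctuation.
        word = re.sub(r"[(<].*$", "", token).strip(".,:;")
        # Anything that still contains a digit is treated as instance-specific.
        if word and not any(ch.isdigit() for ch in word):
            terms.append(word)
    return terms
```

Feeding it the first line of one of your ANR9999D messages gives you the
message ID plus words such as imgroup.c and "retrieving Backup Objects
row", without the thread and object numbers.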
If you can figure out what clients were running at the time this occurred,
check their error and schedule logs to see what errors they received. What
activities were they doing?
I see you have a script running called NAS_2-DIFFERENTIAL. That is another
event you can examine. Do this for all running sessions and processes. You
might have to go several hours back in the activity log from the first of
the ANR9999Ds, but this is a start.
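If it helps, the "work backward from the first ANR9999D" step could be
sketched in Python roughly as follows. This assumes you already have the
activity log as a list of (timestamp, message) pairs, oldest first, and
that each message starts with its message ID. The function names and the
three-hour default are illustrative only.

```python
import re
from datetime import datetime, timedelta

def window_before_first(entries, msg_id="ANR9999D", hours=3):
    """Return the entries from `hours` before the first msg_id up to it.

    entries: list of (datetime, message) tuples sorted oldest first
    (an assumed layout, not something the server produces directly).
    """
    first = next((ts for ts, msg in entries if msg.startswith(msg_id)), None)
    if first is None:
        return []
    start = first - timedelta(hours=hours)
    return [(ts, msg) for ts, msg in entries if start <= ts <= first]

def sessions_and_processes(entries):
    """Collect the SESSION and PROCESS numbers mentioned in the messages."""
    sessions, processes = set(), set()
    for _, msg in entries:
        sessions.update(int(n) for n in re.findall(r"SESSION: (\d+)", msg))
        processes.update(int(n) for n in re.findall(r"PROCESS: (\d+)", msg))
    return sessions, processes
```

The session and process numbers it returns are the ones to follow up on,
both in the rest of the activity log and in the matching client logs.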
I'm not sure where that list of messages below came from, but those are
shown in columns that are far too narrow. Consider querying the activity
log from an Admin CLI started with the -commadelimited option and redirecting
the output to a file. You can then view the messages directly from the
file or load them into a spreadsheet or database for easier reading.
Regards,
Andy
Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.ibm DOT com
IBM Tivoli Storage Manager support web page:
http://www-306.ibm.com/software/sysmgmt/products/support/IBMTivoliStorageManager.html
The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.
"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2005-10-04
05:13:03:
> I found some other errors as well, do any of these look familiar? I've
> placed a call with IBM, but I'm really at a loss here. For our NDMP backups
> it stated an issue with malloc, but I don't know where this problem is
> beginning...
>
> Could I have an issue with the size of the bufferpool on this server? Here
> are the options for this server and it has 4 processors and 8GB of memory.
>
> Server Option             Option Setting
> ------------------------  ------------------------------------
> CommTimeOut               7,200
> IdleTimeOut               360
> BufPoolSize               419432
> LogPoolSize               1024
> MessageFormat             1
> Language                  en_US
> Alias Halt                HALT
> MaxSessions               500
> ExpInterval               0
> ExpQuiet                  No
> EventServer               Yes
> MirrorRead DB             Normal
> MirrorRead LOG            Normal
> MirrorWrite DB            Sequential
> MirrorWrite LOG           Parallel
> VolumeHistory             /t01/volhist/hist01
> VolumeHistory             /usr/tivoli/tsm/server/bin/volhist
> Devconfig                 /t01/devconfig/dev01
> Devconfig                 /usr/tivoli/tsm/server/bin/devconfig
> TxnGroupMax               256
> MoveBatchSize             1000
> MoveSizeThresh            2048
> StatusMsgCnt              10
> RestoreInterval           1,440
> UseLargeBuffers           Yes
> DisableScheds             No
> NOBUFPREfetch             No
> AuditStorage              Yes
> REQSYSauthoutfile         Yes
> SELFTUNEBUFpoolsize       Yes
> SELFTUNETXNsize           Yes
> DBPAGEShadow              No
> DBPAGESHADOWFile          dbpgshdw.bdt
> MsgStackTrace             On
> QueryAuth                 None
> LogWarnFullPerCent        75
> ThroughPutDataThreshold   0
> ThroughPutTimeThreshold   0
> NOPREEMPT                 ( No )
> Resource Timeout          60
> TEC UTF8 Events           No
> NORETRIEVEDATE            No
> DNSLOOKUP                 Yes
> TCPPort                   1500
> TcpAdminport              1500
> HTTPPort                  1580
> TCPWindowsize             65536
> TCPBufsize                16384
> TCPNoDelay                Yes
> CommMethod                TCPIP
> CommMethod                ShMem
> CommMethod                HTTP
> MsgInterval               1
> ShmPort                   1510
> FileExit                  /t01/log/eventserver(APPEND)
> UserExit
> FileTextExit
> AssistVCRRecovery         Yes
> AcsAccessId               chrs144
> AcsTimeoutX               1
> AcsLockDrive              No
> AcsQuickInit              No
> SNMPSubagentPort          1521
> SNMPSubagentHost          127.0.0.1
> SNMPHeartBeatInt          5
> TECHost
> TECPort                   0
> UNIQUETECevents           No
> UNIQUETDPTECevents        No
> Async I/O                 No
> Direct I/O                Yes
> SHAREDLIBIDLE             No
> 3494Shared                No
>
>
>
>
> DATE_TIME MSGNO MESSAGE
> ------------------ ----------- ------------------
> 2005-10-03 9999 ANR9999D
> 22:02:43.000000 imgroup.c(1180):
> ThreadId<511>
> Error 8
> retrieving Backup
> Objects row for
> object
> 0.295703482
> Callchain of
> previous message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001003dea-
> d4 imIsGroupLead-
> er <- 0x00000001-
> 00385564
> SmNodeSession <-
> 0x000000010043bb-
> 38 HandleNodeSes-
> sion <-
> 0x00000001004419-
> 64 smExecuteSess-
> ion <-
> 0x00000001004344-
> 78 SessionThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 40004)
> 2005-10-03 9999 ANR9999D
> 22:02:43.000000 smnode.c(7056):
> ThreadId<511>
> Session 40004:
> Invalid Group Id
> 0,295703482 for
> ADD function
> Callchain of
> previous message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001003855-
> 8c SmNodeSession
> <- 0x00000001004-
> 3bb38 HandleNode-
> Session <-
> 0x00000001004419-
> 64 smExecuteSess-
> ion <-
> 0x00000001004344-
> 78 SessionThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 2005-10-03 8311 ANR8311E An I/O
> 22:37:12.000000 error occurred
> while accessing
> drive SL8500
> (/dev/rmt7) for
> LOCATE operation,
> errno = 78.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 1165 ANR1165E Error
> 22:37:13.000000 detected for file
> in storage pool
> TAPE_ORACLE: Node
> FJSU102, Type
> Backup, File
> space /p01, fsId
> 18, File name
> /app/cyb/esp/MED-
> -ESPSystemAgent/-
> spool/CM_DEMO/MA-
> IN/MEDAXBP.1366/
> VMSDBSAP.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 3523 ANR3523W GENERATE
> 22:37:13.000000 BACKUPSET:
> Retrieve failed
> - error on input
> storage device.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 3503 ANR3503E
> 22:37:13.000000 Generation of
> backup set for
> FJSU102 as
> FJSU102_BACKUPSE-
> T.295467854
> failed. (SESSION:
> 29020, PROCESS:
> 680)
> 2005-10-03 2032 ANR2032E GENERATE
> 22:37:14.000000 BACKUPSET:
> Command failed -
> internal server
> error detected.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 9999 ANR9999D
> 22:50:53.000000 imgroup.c(1180):
> ThreadId<437>
> Error 8
> retrieving Backup
> Objects row for
> object
> 0.295537065
> Callchain of
> previous message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001003dea-
> d4 imIsGroupLead-
> er <- 0x00000001-
> 00385564
> SmNodeSession <-
> 0x000000010043bb-
> 38 HandleNodeSes-
> sion <-
> 0x00000001004419-
> 64 smExecuteSess-
> ion <-
> 0x00000001004344-
> 78 SessionThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 39214)
> 2005-10-03 9999 ANR9999D
> 22:50:53.000000 smnode.c(7056):
> ThreadId<437>
> Session 39214:
> Invalid Group Id
> 0,295537065 for
> ADD function
> Callchain of
> previous message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001003855-
> 8c SmNodeSession
> <- 0x00000001004-
> 3bb38 HandleNode-
> Session <-
> 0x00000001004419-
> 64 smExecuteSess-
> ion <-
> 0x00000001004344-
> 78 SessionThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 39214)
> 2005-10-03 423 ANR0423W Session
> 22:51:42.000000 41306 for
> administrator (
> ) refused -
> administrator
> name not
> registered.
> (SESSION: 41306)
> 2005-10-03 8311 ANR8311E An I/O
> 22:52:14.000000 error occurred
> while accessing
> drive SL8500
> (/dev/rmt7) for
> OFFL operation,
> errno = 78.
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 8769 ANR8769E External
> 23:34:45.000000 media management
> function DISMOUNT
> returned
> result=LIBRARY_E-
> RROR. (SESSION:
> 29020, PROCESS:
> 680)
> 2005-10-03 8469 ANR8469E Dismount
> 23:34:45.000000 of LTO volume
> T00897 from drive
> SL8500
> (/dev/rmt7) in
> library SL8500
> failed. (SESSION:
> 29020, PROCESS:
> 680)
> 2005-10-03 1410 ANR1410W Access
> 23:34:46.000000 mode for volume
> T00897 now set to
> "unavailable".
> (SESSION: 29020,
> PROCESS: 680)
> 2005-10-03 9999 ANR9999D
> 23:46:25.000000 ssremote.c(503):
> ThreadId<136>
> Unable to open
> remote session of
> type 1. Callchain
> of previous
> message:
> 0x0000000100017d-
> 94 outDiagf <-
> 0x00000001004a41-
> 78 ssInitStoreRe-
> mote <-
> 0x000000010066ad-
> 10 AfInitStoreRe-
> mote <-
> 0x00000001006670-
> 24 bfInitStoreRe-
> mote <-
> 0x00000001006a05-
> c4 DoBackup <-
> 0x00000001006a3e-
> 5c AdmBackupNode
> <- 0x00000001001-
> 63168 AdmCommand-
> Local <-
> 0x00000001001642-
> ac admCommand <-
> 0x000000010015b1-
> 80 RunScript <-
> 0x000000010015cd-
> 30 DoRunScript <-
> 0x00000001001631-
> 68 AdmCommandLoc-
> al <- 0x00000001-
> 001642ac
> admCommand <-
> 0x000000010064ec-
> 58 SmExecSchedul-
> edCommand <-
> 0x000000010064ee-
> 54 smScheduledCo-
> nsoleSession <-
> 0x000000010064c8-
> 60 CsRunCmdThread
> <- 0x00000001000-
> 08078 StartThread
> <- 0x09000000002-
> f4460 _pthread_b-
> ody <- (SESSION:
> 38192, PROCESS:
> 963)
> 2005-10-03 2032 ANR2032E BACKUP
> 23:46:25.000000 NODE: Command
> failed - internal
> server error
> detected.
> (SESSION: 38192,
> PROCESS: 963)
> 2005-10-03 1463 ANR1463E RUN:
> 23:46:25.000000 Command script
> NAS_2-DIFFERENTI-
> AL completed in
> error. (SESSION:
> 38192, PROCESS:
> 963)
> 2005-10-03 2752 ANR2752E Scheduled
> 23:46:25.000000 command
> NAS_2-DIFFERENTI-
> AL failed.
> (SESSION: 38192,
> PROCESS: 963)
>
> ********************************
> Joni Moyer
> Highmark
> Storage Systems
> Work:(717)302-6603
> Fax:(717)302-5974
> joni.moyer AT highmark DOT com
> ********************************
>
>
>
> "Richard Sims" <rbs AT bu DOT edu>
> 10/04/2005 07:56 AM
> To: "Joni Moyer" <joni.moyer AT highmark DOT com>
> cc:
> Subject: Re: HELP!!!!
>
> Joni - That's an error we haven't seen before.
>
> Your best course of action is to call TSM Support.
>
> Richard Sims
>
> On Oct 4, 2005, at 7:32 AM, Joni Moyer wrote:
>
> > Has anyone ever seen this message before? I have TSM 5.2.4 running on AIX
> > 5.2 and it seems like this error message occurred and then all processing
> > stopped and it almost looks like TSM stopped & restarted itself. Any
> > suggestions are appreciated!!!! I'm completely lost in this situation.
> > Thank you in advance!
> >
> > 10/03/05 23:46:25 ANR9999D ssremote.c(503): ThreadId<136> Unable to open
> >                   remote session of type 1. Callchain of previous message:
> >                   0x0000000100017d94 outDiagf <- 0x00000001004a4178
> >                   ssInitStoreRemote <- 0x000000010066ad10 AfInitStoreRemote
> >                   <- 0x0000000100667024 bfInitStoreRemote <-
> >                   0x00000001006a05c4 DoBackup <- 0x00000001006a3e5c
> >                   AdmBackupNode <- 0x0000000100163168 AdmCommandLocal <-
> >                   0x00000001001642ac admCommand <- 0x000000010015b180
> >                   RunScript <- 0x000000010015cd30 DoRunScript <-
> >                   0x0000000100163168 AdmCommandLocal <- 0x00000001001642ac
> >                   admCommand <- 0x000000010064ec58 SmExecScheduledCommand
> >                   <- 0x000000010064ee54 smScheduledConsoleSession <-
> >                   0x000000010064c860 CsRunCmdThread <- 0x0000000100008078
> >                   StartThread <- 0x09000000002f4460 _pthread_body <-
> >                   (SESSION: 38192, PROCESS: 963)
> > 10/03/05 23:46:25 ANR2032E BACKUP NODE: Command failed - internal server
> >                   error detected. (SESSION: 38192, PROCESS: 963)
> > 10/03/05 23:46:25 ANR2753I (NAS_2-DIFFERENTIAL):ANR2032E BACKUP NODE:
> >                   Command failed - (SESSION: 38192)
> > 10/03/05 23:46:25 ANR2753I (NAS_2-DIFFERENTIAL):internal server error
> >                   detected. (SESSION: 38192)
> > 10/03/05 23:46:25 ANR1463E RUN: Command script NAS_2-DIFFERENTIAL completed
> >                   in error. (SESSION: 38192, PROCESS: 963)
> >
> >
> >
> > ********************************
> > Joni Moyer
> > Highmark
> > Storage Systems
> > Work:(717)302-6603
> > Fax:(717)302-5974
> > joni.moyer AT highmark DOT com
> > ********************************
> >
>
>