ADSM-L

Re: Fw: HELP!!!!

2005-10-04 09:27:18
Subject: Re: Fw: HELP!!!!
From: Andrew Raibeck <storman AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 4 Oct 2005 07:27:40 -0600
Joni,

Not to be patronizing... but take a deep breath, then another, then a 
third, and relax. :-)

It is very difficult to diagnose any problems when all one has is a vague 
(at best) problem description and a handful of various error messages. In 
identifying the source of the trouble, context is nearly (if not 
completely) everything.

Without knowing anything else about this problem, I would recommend that 
you start by reviewing your activity log. That would be the ENTIRE log, 
not just certain messages. Start with the first of the ANR9999D error 
messages being issued, and work your way backward, trying to get a picture 
of what sessions, processes, and other events were running on the server 
at the time the problem started. Also try searching the IBM web site for 
instances of ANR9999D plus other keywords that appear in the message text 
(don't search on numbers that might be instance-specific, just search on 
non-numeric strings). If you search only on ANR9999D, you'll get way too 
many hits.
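
As a rough illustration of the keyword advice above, here is a minimal Python sketch that reduces a message to its non-numeric words. The function name search_terms is hypothetical, and the rule "drop any token containing a digit" is an assumption about what counts as instance-specific; adjust it to taste.

```python
import re

# Any token containing a digit (addresses, object IDs, thread IDs,
# source line numbers) is treated as instance-specific. This rule is
# an assumption for illustration, not anything TSM defines.
_HAS_DIGIT = re.compile(r"\d")


def search_terms(message, msg_id="ANR9999D"):
    """Return msg_id followed by the non-numeric words of the message text."""
    # Drop the callchain: it holds only addresses and internal function names.
    text = message.split("Callchain of previous message", 1)[0]
    words = []
    for token in text.split():
        token = token.strip(".,:;()<>")
        if not token or token == msg_id:
            continue
        if _HAS_DIGIT.search(token):
            continue
        words.append(token)
    return " ".join([msg_id] + words)


if __name__ == "__main__":
    sample = ("ANR9999D imgroup.c(1180): ThreadId<511> Error 8 retrieving "
              "Backup Objects row for object 0.295703482 Callchain of "
              "previous message: 0x0000000100017d94 outDiagf <-")
    print(search_terms(sample))
```

The resulting string can be pasted into the search box; trim it further by hand if it still returns too many or too few hits.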

If you can figure out what clients were running at the time this occurred, 
check their error and schedule logs to see what errors they received. What 
activities were they doing?

I see you have a script running called NAS_2-DIFFERENTIAL. That is another 
event you can examine. Do this for all running sessions and processes. You 
might have to go several hours back in the activity log from the first of 
the ANR9999Ds, but this is a start.
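
One way to sketch the "work backward from the first ANR9999D" step in Python, assuming the activity log has already been read into a time-sorted list of (timestamp, text) tuples. The function name, the three-hour default, and the tuple layout are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Activity log messages end with "(SESSION: n)" or "(SESSION: n, PROCESS: m)".
SESSION_RE = re.compile(r"SESSION: (\d+)")


def context_before_first_error(records, msg_id="ANR9999D", hours=3):
    """Return (window, sessions) for the period leading up to the first msg_id.

    records is a list of (datetime, message_text) tuples sorted by time.
    window holds every record from `hours` before the first msg_id up to
    and including that timestamp; sessions is the set of session numbers
    mentioned in those records.
    """
    first = next((ts for ts, text in records if text.startswith(msg_id)), None)
    if first is None:
        return [], set()
    start = first - timedelta(hours=hours)
    window = [(ts, text) for ts, text in records if start <= ts <= first]
    sessions = {m.group(1)
                for _, text in window
                for m in SESSION_RE.finditer(text)}
    return window, sessions
```

The session numbers that come back are the ones to look up in the client error and schedule logs. Widen `hours` if the window does not reach back to where those sessions started.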

I'm not sure where that list of messages below came from, but those are 
shown in columns that are far too narrow. Consider querying the activity 
log from an Admin CLI started with the -commadelimited option and 
redirecting the output to a file. You can then view the messages directly from the 
file or load them into a spreadsheet or database for easier reading.
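
For the "load them into a spreadsheet or database" route, a minimal Python sketch of reading such a file follows. It assumes each data row is a date/time field followed by the message text, and that the date/time looks like 10/03/05 23:46:25; both depend on the server's output and date format, so treat the layout, the format string, and the name load_actlog as assumptions.

```python
import csv
from datetime import datetime


def load_actlog(lines, time_format="%m/%d/%y %H:%M:%S"):
    """Parse comma-delimited activity log lines.

    Returns a list of (timestamp, msg_id, text) tuples. Rows whose first
    field is not a date/time in time_format (banners, status lines, blank
    lines) are skipped.
    """
    records = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # blank line or banner text
        try:
            stamp = datetime.strptime(row[0].strip(), time_format)
        except ValueError:
            continue  # not a data row
        # Re-join in case the message text was split on unquoted commas.
        text = ",".join(row[1:]).strip()
        msg_id = text.split(None, 1)[0] if text else ""
        records.append((stamp, msg_id, text))
    return records
```

With the redirected output saved in a file (the name actlog.csv here is only an example), `load_actlog(open("actlog.csv", newline=""))` returns rows that can be sorted, filtered by message number, or written out again for a spreadsheet.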

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.ibm DOT com

IBM Tivoli Storage Manager support web page: 
http://www-306.ibm.com/software/sysmgmt/products/support/IBMTivoliStorageManager.html

The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2005-10-04 
05:13:03:

> I found some other errors as well, do any of these look familiar?  I've
> placed a call with IBM, but I'm really at a loss here.  For our NDMP backups
> it stated an issue with malloc, but I don't know where this problem is
> beginning...
> 
> Could I have an issue with the size of the bufferpool on this server?  Here
> are the options for this server and it has 4 processors and 8GB of memory.
> 
> Server Option            Option Setting
> -----------------------  ------------------------------------
> CommTimeOut              7,200
> IdleTimeOut              360
> BufPoolSize              419432
> LogPoolSize              1024
> MessageFormat            1
> Language                 en_US
> Alias Halt               HALT
> MaxSessions              500
> ExpInterval              0
> ExpQuiet                 No
> EventServer              Yes
> MirrorRead DB            Normal
> MirrorRead LOG           Normal
> MirrorWrite DB           Sequential
> MirrorWrite LOG          Parallel
> VolumeHistory            /t01/volhist/hist01
> VolumeHistory            /usr/tivoli/tsm/server/bin/volhist
> Devconfig                /t01/devconfig/dev01
> Devconfig                /usr/tivoli/tsm/server/bin/devconfig
> TxnGroupMax              256
> MoveBatchSize            1000
> MoveSizeThresh           2048
> StatusMsgCnt             10
> RestoreInterval          1,440
> UseLargeBuffers          Yes
> DisableScheds            No
> NOBUFPREfetch            No
> AuditStorage             Yes
> REQSYSauthoutfile        Yes
> SELFTUNEBUFpoolsize      Yes
> SELFTUNETXNsize          Yes
> DBPAGEShadow             No
> DBPAGESHADOWFile         dbpgshdw.bdt
> MsgStackTrace            On
> QueryAuth                None
> LogWarnFullPercent       75
> ThroughPutDataThreshold  0
> ThroughPutTimeThreshold  0
> NOPREEMPT                ( No )
> Resource Timeout         60
> TEC UTF8 Events          No
> NORETRIEVEDATE           No
> DNSLOOKUP                Yes
> TCPPort                  1500
> TcpAdminport             1500
> HTTPPort                 1580
> TCPWindowsize            65536
> TCPBufsize               16384
> TCPNoDelay               Yes
> CommMethod               TCPIP
> CommMethod               ShMem
> CommMethod               HTTP
> MsgInterval              1
> ShmPort                  1510
> FileExit                 /t01/log/eventserver(APPEND)
> UserExit
> FileTextExit
> AssistVCRRecovery        Yes
> AcsAccessId              chrs144
> AcsTimeoutX              1
> AcsLockDrive             No
> AcsQuickInit             No
> SNMPSubagentPort         1521
> SNMPSubagentHost         127.0.0.1
> SNMPHeartBeatInt         5
> TECHost
> TECPort                  0
> UNIQUETECevents          No
> UNIQUETDPTECevents       No
> Async I/O                No
> Direct I/O               Yes
> SHAREDLIBIDLE            No
> 3494Shared               No
> 
> 
> 
> 
>          DATE_TIME           MSGNO     MESSAGE
> ------------------     -----------     ------------------
>         2005-10-03            9999     ANR9999D
>    22:02:43.000000                      imgroup.c(1180):
>                                         ThreadId<511>
>                                         Error 8
>                                         retrieving Backup
>                                         Objects row for
>                                         object
>                                         0.295703482
>                                         Callchain of
>                                         previous message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001003dea-
>                                         d4 imIsGroupLead-
>                                         er <- 0x00000001-
>                                         00385564
>                                         SmNodeSession <-
>                                         0x000000010043bb-
>                                         38 HandleNodeSes-
>                                         sion <-
>                                         0x00000001004419-
>                                         64 smExecuteSess-
>                                         ion <-
>                                         0x00000001004344-
>                                         78 SessionThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>                                         40004)
>         2005-10-03            9999     ANR9999D
>    22:02:43.000000                      smnode.c(7056):
>                                         ThreadId<511>
>                                         Session 40004:
>                                         Invalid Group Id
>                                         0,295703482 for
>                                         ADD function
>                                         Callchain of
>                                         previous message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001003855-
>                                         8c SmNodeSession
>                                         <- 0x00000001004-
>                                         3bb38 HandleNode-
>                                         Session <-
>                                         0x00000001004419-
>                                         64 smExecuteSess-
>                                         ion <-
>                                         0x00000001004344-
>                                         78 SessionThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>         2005-10-03            8311     ANR8311E An I/O
>    22:37:12.000000                      error occurred
>                                         while accessing
>                                         drive SL8500
>                                         (/dev/rmt7) for
>                                         LOCATE operation,
>                                         errno = 78.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            1165     ANR1165E Error
>    22:37:13.000000                      detected for file
>                                         in storage pool
>                                         TAPE_ORACLE: Node
>                                         FJSU102, Type
>                                         Backup, File
>                                         space /p01, fsId
>                                         18, File name
>                                         /app/cyb/esp/MED-
>                                         -ESPSystemAgent/-
>                                         spool/CM_DEMO/MA-
>                                         IN/MEDAXBP.1366/
>                                         VMSDBSAP.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            3523     ANR3523W GENERATE
>    22:37:13.000000                      BACKUPSET:
>                                         Retrieve failed
>                                         - error on input
>                                         storage device.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            3503     ANR3503E
>    22:37:13.000000                      Generation of
>                                         backup set for
>                                         FJSU102 as
>                                         FJSU102_BACKUPSE-
>                                         T.295467854
>                                         failed. (SESSION:
>                                         29020, PROCESS:
>                                         680)
>         2005-10-03            2032     ANR2032E GENERATE
>    22:37:14.000000                      BACKUPSET:
>                                         Command failed -
>                                         internal server
>                                         error detected.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            9999     ANR9999D
>    22:50:53.000000                      imgroup.c(1180):
>                                         ThreadId<437>
>                                         Error 8
>                                         retrieving Backup
>                                         Objects row for
>                                         object
>                                         0.295537065
>                                         Callchain of
>                                         previous message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001003dea-
>                                         d4 imIsGroupLead-
>                                         er <- 0x00000001-
>                                         00385564
>                                         SmNodeSession <-
>                                         0x000000010043bb-
>                                         38 HandleNodeSes-
>                                         sion <-
>                                         0x00000001004419-
>                                         64 smExecuteSess-
>                                         ion <-
>                                         0x00000001004344-
>                                         78 SessionThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>                                         39214)
>         2005-10-03            9999     ANR9999D
>    22:50:53.000000                      smnode.c(7056):
>                                         ThreadId<437>
>                                         Session 39214:
>                                         Invalid Group Id
>                                         0,295537065 for
>                                         ADD function
>                                         Callchain of
>                                         previous message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001003855-
>                                         8c SmNodeSession
>                                         <- 0x00000001004-
>                                         3bb38 HandleNode-
>                                         Session <-
>                                         0x00000001004419-
>                                         64 smExecuteSess-
>                                         ion <-
>                                         0x00000001004344-
>                                         78 SessionThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>                                         39214)
>         2005-10-03             423     ANR0423W Session
>    22:51:42.000000                      41306 for
>                                         administrator  (
>                                         ) refused -
>                                         administrator
>                                         name not
>                                         registered.
>                                         (SESSION: 41306)
>         2005-10-03            8311     ANR8311E An I/O
>    22:52:14.000000                      error occurred
>                                         while accessing
>                                         drive SL8500
>                                         (/dev/rmt7) for
>                                         OFFL operation,
>                                         errno = 78.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            8769     ANR8769E External
>    23:34:45.000000                      media management
>                                         function DISMOUNT
>                                         returned
>                                         result=LIBRARY_E-
>                                         RROR. (SESSION:
>                                         29020, PROCESS:
>                                         680)
>         2005-10-03            8469     ANR8469E Dismount
>    23:34:45.000000                      of LTO volume
>                                         T00897 from drive
>                                         SL8500
>                                         (/dev/rmt7) in
>                                         library SL8500
>                                         failed. (SESSION:
>                                         29020, PROCESS:
>                                         680)
>         2005-10-03            1410     ANR1410W Access
>    23:34:46.000000                      mode for volume
>                                         T00897 now set to
>                                         "unavailable".
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            9999     ANR9999D
>    23:46:25.000000                      ssremote.c(503):
>                                         ThreadId<136>
>                                         Unable to open
>                                         remote session of
>                                         type 1. Callchain
>                                         of previous
>                                         message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001004a41-
>                                         78 ssInitStoreRe-
>                                         mote <-
>                                         0x000000010066ad-
>                                         10 AfInitStoreRe-
>                                         mote <-
>                                         0x00000001006670-
>                                         24 bfInitStoreRe-
>                                         mote <-
>                                         0x00000001006a05-
>                                         c4 DoBackup <-
>                                         0x00000001006a3e-
>                                         5c AdmBackupNode
>                                         <- 0x00000001001-
>                                         63168 AdmCommand-
>                                         Local <-
>                                         0x00000001001642-
>                                         ac admCommand <-
>                                         0x000000010015b1-
>                                         80 RunScript <-
>                                         0x000000010015cd-
>                                         30 DoRunScript <-
>                                         0x00000001001631-
>                                         68 AdmCommandLoc-
>                                         al <- 0x00000001-
>                                         001642ac
>                                         admCommand <-
>                                         0x000000010064ec-
>                                         58 SmExecSchedul-
>                                         edCommand <-
>                                         0x000000010064ee-
>                                         54 smScheduledCo-
>                                         nsoleSession <-
>                                         0x000000010064c8-
>                                         60 CsRunCmdThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>                                         38192, PROCESS:
>                                         963)
>         2005-10-03            2032     ANR2032E BACKUP
>    23:46:25.000000                      NODE: Command
>                                         failed - internal
>                                         server error
>                                         detected.
>                                         (SESSION: 38192,
>                                         PROCESS: 963)
>         2005-10-03            1463     ANR1463E RUN:
>    23:46:25.000000                      Command script
>                                         NAS_2-DIFFERENTI-
>                                         AL completed in
>                                         error. (SESSION:
>                                         38192, PROCESS:
>                                         963)
>         2005-10-03            2752     ANR2752E Scheduled
>    23:46:25.000000                      command
>                                         NAS_2-DIFFERENTI-
>                                         AL failed.
>                                         (SESSION: 38192,
>                                         PROCESS: 963)
> 
> ********************************
> Joni Moyer
> Highmark
> Storage Systems
> Work:(717)302-6603
> Fax:(717)302-5974
> joni.moyer AT highmark DOT com
> ********************************
> 
> 
> 
> From:    "Richard Sims" <rbs AT bu DOT edu>
> Date:    10/04/2005 07:56 AM
> To:      "Joni Moyer" <joni.moyer AT highmark DOT com>
> Subject: Re: HELP!!!!
> 
> Joni - That's an error we haven't seen before.
> 
> Your best course of action is to call TSM Support.
> 
>    Richard Sims
> 
> On Oct 4, 2005, at 7:32 AM, Joni Moyer wrote:
> 
> > Has anyone ever seen this message before?  I have TSM 5.2.4 running on AIX
> > 5.2 and it seems like this error message occurred and then all processing
> > stopped and it almost looks like TSM stopped & restarted itself.  Any
> > suggestions are appreciated!!!!  I'm completely lost in this situation.
> > Thank you in advance!
> >
> > 10/03/05 23:46:25     ANR9999D ssremote.c(503): ThreadId<136> Unable to open
> >                       remote session of type 1. Callchain of previous
> >                       message: 0x0000000100017d94 outDiagf <-
> >                       0x00000001004a4178 ssInitStoreRemote <-
> >                       0x000000010066ad10 AfInitStoreRemote <-
> >                       0x0000000100667024 bfInitStoreRemote <-
> >                       0x00000001006a05c4 DoBackup <-
> >                       0x00000001006a3e5c AdmBackupNode <-
> >                       0x0000000100163168 AdmCommandLocal <-
> >                       0x00000001001642ac admCommand <-
> >                       0x000000010015b180 RunScript <-
> >                       0x000000010015cd30 DoRunScript <-
> >                       0x0000000100163168 AdmCommandLocal <-
> >                       0x00000001001642ac admCommand <-
> >                       0x000000010064ec58 SmExecScheduledCommand <-
> >                       0x000000010064ee54 smScheduledConsoleSession <-
> >                       0x000000010064c860 CsRunCmdThread <-
> >                       0x0000000100008078 StartThread <-
> >                       0x09000000002f4460 _pthread_body <-  (SESSION: 38192,
> >                       PROCESS: 963)
> > 10/03/05 23:46:25     ANR2032E BACKUP NODE: Command failed - internal server
> >                       error detected. (SESSION: 38192, PROCESS: 963)
> > 10/03/05 23:46:25     ANR2753I (NAS_2-DIFFERENTIAL):ANR2032E BACKUP NODE:
> >                       Command failed - (SESSION: 38192)
> > 10/03/05 23:46:25     ANR2753I (NAS_2-DIFFERENTIAL):internal server error
> >                       detected.  (SESSION: 38192)
> > 10/03/05 23:46:25     ANR1463E RUN: Command script NAS_2-DIFFERENTIAL
> >                       completed in error. (SESSION: 38192, PROCESS: 963)
> >
> >
> >
> > ********************************
> > Joni Moyer
> > Highmark
> > Storage Systems
> > Work:(717)302-6603
> > Fax:(717)302-5974
> > joni.moyer AT highmark DOT com
> > ********************************
> >
> 
> 
