ADSM-L

Re: Fw: HELP!!!!

2005-10-26 05:26:37
Subject: Re: Fw: HELP!!!!
From: Branko Stanic <Branko.Stanic AT REGISTRARBIH.GOV DOT BA>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 26 Oct 2005 11:11:12 +0200
Hi all,

This morning I noticed that our TSM server failed to back up one of our
nodes. A look at the activity log revealed the following errors:

ANR9999D imgroup.c(1180): ThreadId<52> Error 8 retrieving Backup Objects
row for object 0.11583818 (SESSION: 2736) Oct 25, 2005 11:05:09 PM 
ANR9999D ThreadId<52> issued message 9999 from: (SESSION: 2736) Oct 25,
2005 11:05:10 PM 
ANR9999D smnode.c(7343): ThreadId<52> Session 2736: Invalid Group Id
0,11583818 for ADD function (SESSION: 2736) Oct 25, 2005 11:05:10 PM 
ANR9999D ThreadId<52> issued message 9999 from: (SESSION: 2736) Oct 25,
2005 11:05:10 PM 
ANR0403I Session 2736 ended for node ***** (WinNT). (SESSION: 2736)

Also, the activity log shows that the node sent objects to the server,
but the server reports that 0 objects were backed up (probably as a
result of the above error).

We have never had a node backup fail with this error before. At the same
time, all other backups finished just fine.

While I was searching the adsm.org list for information on this error, I
stumbled upon this discussion.

I wonder if anyone has discovered what this error actually means?!?

We are using Tivoli Storage Manager 5.3.2 on Windows Server 2003.

Thank you very much...


______________________________________
Branko Stanic
System Administrator
Ured registrara za ratne zlocine i organizovani kriminal
Sarajevo, Bosna i Hercegovina
+387 33 707 111
 
 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Andrew Raibeck
Sent: 4 October 2005 15:28
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Fw: HELP!!!!

Joni,

Not to be patronizing... but take a deep breath, then another, then a
third, and relax. :-)

It is very difficult to diagnose any problems when all one has is a
vague (at best) problem description and a handful of various error
messages. In identifying the source of the trouble, context is nearly
(if not completely) everything.

Without knowing anything else about this problem, I would recommend
that you start by reviewing your activity log. That would be the ENTIRE
log, not just certain messages. Start with the first of the ANR9999D
error messages being issued, and work your way backward, trying to get a
picture of what sessions, processes, and other events were running on
the server at the time the problem started. Also try searching the IBM
web site for instances of ANR9999D plus other keywords that appear in
the message text (don't search on numbers that might be
instance-specific, just search on non-numeric strings). If you search
only on ANR9999D, you'll get way too many hits.
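
As a rough illustration of choosing search terms this way, here is a
minimal Python sketch. It is not part of any TSM tooling, and the
function name search_keywords is made up for this example. It keeps the
message ID and drops every token that contains digits, on the assumption
that such tokens (thread IDs, object IDs, session numbers) are
instance-specific:

```python
import re

# Message IDs such as ANR9999D or ANR2032E are worth keeping as search terms.
MSG_ID = re.compile(r"^ANR\d{4}[A-Z]$")


def search_keywords(message):
    """Return the message ID plus the non-numeric words of a server message."""
    # Drop "(SESSION: ...)" / "(SESSION: ..., PROCESS: ...)" tags entirely.
    text = re.sub(r"\((?:SESSION|PROCESS):[^)]*\)", " ", message)
    keywords = []
    for token in text.split():
        if MSG_ID.match(token):
            keywords.append(token)
            continue
        # Source tags such as "imgroup.c(1180):" keep only the file name.
        token = re.sub(r"\(\d+\):?$", "", token)
        token = token.strip(".,:;")
        # Anything still containing a digit is treated as instance-specific.
        if token and not any(ch.isdigit() for ch in token):
            keywords.append(token)
    return keywords
```

For a message like "ANR9999D imgroup.c(1180): ThreadId<52> Error 8
retrieving Backup Objects row for object 0.11583818 (SESSION: 2736)",
joining the result with spaces gives "ANR9999D imgroup.c Error
retrieving Backup Objects row for object", which is a more useful search
string than ANR9999D alone.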

If you can figure out what clients were running at the time this
occurred, check their error and schedule logs to see what errors they
received. What activities were they doing?

I see you have a script running called NAS_2-DIFFERENTIAL. That is
another event you can examine. Do this for all running sessions and
processes. You might have to go several hours back in the activity log
from the first of the ANR9999Ds, but this is a start.

I'm not sure where that list of messages below came from, but those are
shown in columns that are far too narrow. Consider querying the activity
log from an Admin CLI started with the -commadelimited option and
redirecting the output to a file. You can then view the messages directly
from the file or load them into a spreadsheet or database for easier
reading.
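
Once the output is in a file, a short script can do the "work your way
backward" step. The following Python sketch is only an illustration
under stated assumptions: it assumes each line of the comma-delimited
output has the date/time in the first field and the message text in the
remaining fields (check this against your own output), and the function
names load_actlog and context_before_first are invented for this
example.

```python
import csv
import io


def load_actlog(text):
    """Parse comma-delimited activity log text into (timestamp, message) pairs."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank lines or banner text without a message field
        # Re-join the tail in case an unquoted message itself contained commas.
        rows.append((row[0].strip(), ",".join(row[1:]).strip()))
    return rows


def context_before_first(rows, msg_id="ANR9999D", window=20):
    """Return up to `window` entries preceding the first msg_id, plus that entry."""
    for i, (_, message) in enumerate(rows):
        if message.startswith(msg_id):
            return rows[max(0, i - window): i + 1]
    return []  # msg_id never appeared in the log
```

To use it, read the redirected file into a string, pass it to
load_actlog, and print what context_before_first returns. Increase the
window if the sessions and processes leading up to the first ANR9999D
started several hours earlier.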

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.ibm DOT com

IBM Tivoli Storage Manager support web page:
http://www-306.ibm.com/software/sysmgmt/products/support/IBMTivoliStorageManager.html

The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2005-10-04
05:13:03:

> I found some other errors as well, do any of these look familiar?
> I've placed a call with IBM, but I'm really at a loss here.  For our NDMP
> backups it stated an issue with malloc, but I don't know where this
> problem is beginning...
>
> Could I have an issue with the size of the bufferpool on this server?
> Here are the options for this server and it has 4 processors and 8GB of
> memory.
> 
> Server Option            Option Setting
> -----------------------  ------------------------------------
> CommTimeOut              7,200
> IdleTimeOut              360
> BufPoolSize              419432
> LogPoolSize              1024
> MessageFormat            1
> Language                 en_US
> Alias Halt               HALT
> MaxSessions              500
> ExpInterval              0
> ExpQuiet                 No
> EventServer              Yes
> MirrorRead DB            Normal
> MirrorRead LOG           Normal
> MirrorWrite DB           Sequential
> MirrorWrite LOG          Parallel
> VolumeHistory            /t01/volhist/hist01
> VolumeHistory            /usr/tivoli/tsm/server/bin/volhist
> Devconfig                /t01/devconfig/dev01
> Devconfig                /usr/tivoli/tsm/server/bin/devconfig
> TxnGroupMax              256
> MoveBatchSize            1000
> MoveSizeThresh           2048
> StatusMsgCnt             10
> RestoreInterval          1,440
> UseLargeBuffers          Yes
> DisableScheds            No
> NOBUFPREfetch            No
> AuditStorage             Yes
> REQSYSauthoutfile        Yes
> SELFTUNEBUFpoolsize      Yes
> SELFTUNETXNsize          Yes
> DBPAGEShadow             No
> DBPAGESHADOWFile         dbpgshdw.bdt
> MsgStackTrace            On
> QueryAuth                None
> LogWarnFullPerCent       75
> ThroughPutDataThreshold  0
> ThroughPutTimeThreshold  0
> NOPREEMPT                ( No )
> Resource Timeout         60
> TEC UTF8 Events          No
> NORETRIEVEDATE           No
> DNSLOOKUP                Yes
> TCPPort                  1500
> TcpAdminport             1500
> HTTPPort                 1580
> TCPWindowsize            65536
> TCPBufsize               16384
> TCPNoDelay               Yes
> CommMethod               TCPIP
> CommMethod               ShMem
> CommMethod               HTTP
> MsgInterval              1
> ShmPort                  1510
> FileExit                 /t01/log/eventserver(APPEND)
> UserExit
> FileTextExit
> AssistVCRRecovery        Yes
> AcsAccessId              chrs144
> AcsTimeoutX              1
> AcsLockDrive             No
> AcsQuickInit             No
> SNMPSubagentPort         1521
> SNMPSubagentHost         127.0.0.1
> SNMPHeartBeatInt         5
> TECHost
> TECPort                  0
> UNIQUETECevents          No
> UNIQUETDPTECevents       No
> Async I/O                No
> Direct I/O               Yes
> SHAREDLIBIDLE            No
> 3494Shared               No
> 
> 
> 
> 
>          DATE_TIME           MSGNO     MESSAGE
> ------------------     -----------     ------------------
>         2005-10-03            9999     ANR9999D
>    22:02:43.000000                      imgroup.c(1180):
>                                         ThreadId<511>
>                                         Error 8
>                                         retrieving Backup
>                                         Objects row for
>                                         object
>                                         0.295703482
>                                         Callchain of
>                                         previous message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001003dea-
>                                         d4 imIsGroupLead-
>                                         er <- 0x00000001-
>                                         00385564
>                                         SmNodeSession <-
>                                         0x000000010043bb-
>                                         38 HandleNodeSes-
>                                         sion <-
>                                         0x00000001004419-
>                                         64 smExecuteSess-
>                                         ion <-
>                                         0x00000001004344-
>                                         78 SessionThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>                                         40004)
>         2005-10-03            9999     ANR9999D
>    22:02:43.000000                      smnode.c(7056):
>                                         ThreadId<511>
>                                         Session 40004:
>                                         Invalid Group Id
>                                         0,295703482 for
>                                         ADD function
>                                         Callchain of
>                                         previous message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001003855-
>                                         8c SmNodeSession
>                                         <- 0x00000001004-
>                                         3bb38 HandleNode-
>                                         Session <-
>                                         0x00000001004419-
>                                         64 smExecuteSess-
>                                         ion <-
>                                         0x00000001004344-
>                                         78 SessionThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>         2005-10-03            8311     ANR8311E An I/O
>    22:37:12.000000                      error occurred
>                                         while accessing
>                                         drive SL8500
>                                         (/dev/rmt7) for
>                                         LOCATE operation,
>                                         errno = 78.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            1165     ANR1165E Error
>    22:37:13.000000                      detected for file
>                                         in storage pool
>                                         TAPE_ORACLE: Node
>                                         FJSU102, Type
>                                         Backup, File
>                                         space /p01, fsId
>                                         18, File name
>                                         /app/cyb/esp/MED-
>                                         -ESPSystemAgent/-
>                                         spool/CM_DEMO/MA-
>                                         IN/MEDAXBP.1366/
>                                         VMSDBSAP.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            3523     ANR3523W GENERATE
>    22:37:13.000000                      BACKUPSET:
>                                         Retrieve failed
>                                         - error on input
>                                         storage device.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            3503     ANR3503E
>    22:37:13.000000                      Generation of
>                                         backup set for
>                                         FJSU102 as
>                                         FJSU102_BACKUPSE-
>                                         T.295467854
>                                         failed. (SESSION:
>                                         29020, PROCESS:
>                                         680)
>         2005-10-03            2032     ANR2032E GENERATE
>    22:37:14.000000                      BACKUPSET:
>                                         Command failed -
>                                         internal server
>                                         error detected.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            9999     ANR9999D
>    22:50:53.000000                      imgroup.c(1180):
>                                         ThreadId<437>
>                                         Error 8
>                                         retrieving Backup
>                                         Objects row for
>                                         object
>                                         0.295537065
>                                         Callchain of
>                                         previous message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001003dea-
>                                         d4 imIsGroupLead-
>                                         er <- 0x00000001-
>                                         00385564
>                                         SmNodeSession <-
>                                         0x000000010043bb-
>                                         38 HandleNodeSes-
>                                         sion <-
>                                         0x00000001004419-
>                                         64 smExecuteSess-
>                                         ion <-
>                                         0x00000001004344-
>                                         78 SessionThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>                                         39214)
>         2005-10-03            9999     ANR9999D
>    22:50:53.000000                      smnode.c(7056):
>                                         ThreadId<437>
>                                         Session 39214:
>                                         Invalid Group Id
>                                         0,295537065 for
>                                         ADD function
>                                         Callchain of
>                                         previous message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001003855-
>                                         8c SmNodeSession
>                                         <- 0x00000001004-
>                                         3bb38 HandleNode-
>                                         Session <-
>                                         0x00000001004419-
>                                         64 smExecuteSess-
>                                         ion <-
>                                         0x00000001004344-
>                                         78 SessionThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>                                         39214)
>         2005-10-03             423     ANR0423W Session
>    22:51:42.000000                      41306 for
>                                         administrator  (
>                                         ) refused -
>                                         administrator
>                                         name not
>                                         registered.
>                                         (SESSION: 41306)
>         2005-10-03            8311     ANR8311E An I/O
>    22:52:14.000000                      error occurred
>                                         while accessing
>                                         drive SL8500
>                                         (/dev/rmt7) for
>                                         OFFL operation,
>                                         errno = 78.
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            8769     ANR8769E External
>    23:34:45.000000                      media management
>                                         function DISMOUNT
>                                         returned
>                                         result=LIBRARY_E-
>                                         RROR. (SESSION:
>                                         29020, PROCESS:
>                                         680)
>         2005-10-03            8469     ANR8469E Dismount
>    23:34:45.000000                      of LTO volume
>                                         T00897 from drive
>                                         SL8500
>                                         (/dev/rmt7) in
>                                         library SL8500
>                                         failed. (SESSION:
>                                         29020, PROCESS:
>                                         680)
>         2005-10-03            1410     ANR1410W Access
>    23:34:46.000000                      mode for volume
>                                         T00897 now set to
>                                         "unavailable".
>                                         (SESSION: 29020,
>                                         PROCESS: 680)
>         2005-10-03            9999     ANR9999D
>    23:46:25.000000                      ssremote.c(503):
>                                         ThreadId<136>
>                                         Unable to open
>                                         remote session of
>                                         type 1. Callchain
>                                         of previous
>                                         message:
>                                         0x0000000100017d-
>                                         94 outDiagf <-
>                                         0x00000001004a41-
>                                         78 ssInitStoreRe-
>                                         mote <-
>                                         0x000000010066ad-
>                                         10 AfInitStoreRe-
>                                         mote <-
>                                         0x00000001006670-
>                                         24 bfInitStoreRe-
>                                         mote <-
>                                         0x00000001006a05-
>                                         c4 DoBackup <-
>                                         0x00000001006a3e-
>                                         5c AdmBackupNode
>                                         <- 0x00000001001-
>                                         63168 AdmCommand-
>                                         Local <-
>                                         0x00000001001642-
>                                         ac admCommand <-
>                                         0x000000010015b1-
>                                         80 RunScript <-
>                                         0x000000010015cd-
>                                         30 DoRunScript <-
>                                         0x00000001001631-
>                                         68 AdmCommandLoc-
>                                         al <- 0x00000001-
>                                         001642ac
>                                         admCommand <-
>                                         0x000000010064ec-
>                                         58 SmExecSchedul-
>                                         edCommand <-
>                                         0x000000010064ee-
>                                         54 smScheduledCo-
>                                         nsoleSession <-
>                                         0x000000010064c8-
>                                         60 CsRunCmdThread
>                                         <- 0x00000001000-
>                                         08078 StartThread
>                                         <- 0x09000000002-
>                                         f4460 _pthread_b-
>                                         ody <-  (SESSION:
>                                         38192, PROCESS:
>                                         963)
>         2005-10-03            2032     ANR2032E BACKUP
>    23:46:25.000000                      NODE: Command
>                                         failed - internal
>                                         server error
>                                         detected.
>                                         (SESSION: 38192,
>                                         PROCESS: 963)
>         2005-10-03            1463     ANR1463E RUN:
>    23:46:25.000000                      Command script
>                                         NAS_2-DIFFERENTI-
>                                         AL completed in
>                                         error. (SESSION:
>                                         38192, PROCESS:
>                                         963)
>         2005-10-03            2752     ANR2752E Scheduled
>    23:46:25.000000                      command
>                                         NAS_2-DIFFERENTI-
>                                         AL failed.
>                                         (SESSION: 38192,
>                                         PROCESS: 963)
> 
> ********************************
> Joni Moyer
> Highmark
> Storage Systems
> Work:(717)302-6603
> Fax:(717)302-5974
> joni.moyer AT highmark DOT com
> ********************************
> 
> 
> 
> From: "Richard Sims" <rbs AT bu DOT edu>
> Date: 10/04/2005 07:56 AM
> To: "Joni Moyer" <joni.moyer AT highmark DOT com>
> cc:
> Subject: Re: HELP!!!!
> 
> Joni - That's an error we haven't seen before.
> 
> Your best course of action is to call TSM Support.
> 
>    Richard Sims
> 
> On Oct 4, 2005, at 7:32 AM, Joni Moyer wrote:
> 
> > Has anyone ever seen this message before?  I have TSM 5.2.4 running on
> > AIX 5.2 and it seems like this error message occurred and then all
> > processing stopped and it almost looks like TSM stopped & restarted
> > itself.  Any suggestions are appreciated!!!!  I'm completely lost in
> > this situation.
> > Thank you in advance!
> >
> > 10/03/05 23:46:25     ANR9999D ssremote.c(503): ThreadId<136> Unable to open
> >                        remote session of type 1. Callchain of previous message:
> >                        0x0000000100017d94 outDiagf <-
> >                        0x00000001004a4178 ssInitStoreRemote <-
> >                        0x000000010066ad10 AfInitStoreRemote <-
> >                        0x0000000100667024 bfInitStoreRemote <-
> >                        0x00000001006a05c4 DoBackup <-
> >                        0x00000001006a3e5c AdmBackupNode <-
> >                        0x0000000100163168 AdmCommandLocal <-
> >                        0x00000001001642ac admCommand <-
> >                        0x000000010015b180 RunScript <-
> >                        0x000000010015cd30 DoRunScript <-
> >                        0x0000000100163168 AdmCommandLocal <-
> >                        0x00000001001642ac admCommand <-
> >                        0x000000010064ec58 SmExecScheduledCommand <-
> >                        0x000000010064ee54 smScheduledConsoleSession <-
> >                        0x000000010064c860 CsRunCmdThread <-
> >                        0x0000000100008078 StartThread <-
> >                        0x09000000002f4460 _pthread_body <-
> >                        (SESSION: 38192, PROCESS: 963)
> > 10/03/05 23:46:25     ANR2032E BACKUP NODE: Command failed - internal server
> >                        error detected. (SESSION: 38192, PROCESS: 963)
> > 10/03/05 23:46:25     ANR2753I (NAS_2-DIFFERENTIAL):ANR2032E BACKUP NODE:
> >                        Command failed - (SESSION: 38192)
> > 10/03/05 23:46:25     ANR2753I (NAS_2-DIFFERENTIAL):internal server error
> >                        detected.  (SESSION: 38192)
> > 10/03/05 23:46:25     ANR1463E RUN: Command script NAS_2-DIFFERENTIAL completed
> >                        in error. (SESSION: 38192, PROCESS: 963)
> >
> >
> >
> > ********************************
> > Joni Moyer
> > Highmark
> > Storage Systems
> > Work:(717)302-6603
> > Fax:(717)302-5974
> > joni.moyer AT highmark DOT com
> > ********************************
> >
> 
> 
