ADSM-L

Re: Solaris server stops working

1996-07-23 08:57:25
Subject: Re: Solaris server stops working
From: Bradley King <king AT MONTROUGE.GM.SLB DOT COM>
Date: Tue, 23 Jul 1996 14:57:25 +0200
This is a rather lengthy message, but I thought all the information possible
might help. I previously reported my deadlock problem, which I "solved"
by reinstalling the server via dumpdb, install, loaddb, auditdb. Now
the server no longer locks up, but there are errors I cannot resolve,
e.g. I am unable to delete a volume that was a disk file that no longer exists.
The worst problem is that I can no longer do backups: I get a server
error for several Unix clients, so I have had no backups for over a week.
The "query actlog" output from the current server configuration is at the very
end of this message. If there is any fix that could help, I would greatly
appreciate it.

The only other solution I can currently think of is to delete everything
having to do with ADSM from the system, reinstall, start the
configuration from scratch, and hope that whatever happened never happens
again.


>The problem you ran into is a deadlock that can be hit
>during certain states of the database and log, usually when lots of
>activity or recovery causes large numbers of log pages to be
>processed. This has been fixed, and the fix will be available in a few
>weeks when testing is complete.
>If this problem returns and persists, please contact me; I can make a
>fix-test version of the PTF available.
>
>Jim Riehl
>Tucson ADSM Sun Server Developer
>
>Bradley King writes:
> > If anyone wonders, I solved my own problem.  I'm not sure it was the
> > best way, but it worked. I did:
> >
> > dsmserv dumpdb
> > dsmserv install
> > dsmserv loaddb
> > dsmserv auditdb
> > dsmserv
> >
> > which now works again.  Everything went fine, but it sure would have been
> > nice if the server had given me some vague message somewhere about what was
> > wrong. I certainly hope this is not a magic 32 day ritual!
> >
> > >At 07:24 AM 7/17/96 -0500, you wrote:
> > >>I have a very confusing problem (at least for me). I have been running
> > >>a Solaris 1.2.0 ADSM server for about a month now. I wanted to add a device,
> > >>which I think I did successfully. At some point on the same day the server
> > >>stopped communicating with anyone. I killed and restarted it to no avail.
> > >>I rebooted the server to no avail. Since I have a respawn in the inittab
> > >>for the server and several schedule clients, I rebooted the Sun after changing
> > >>the inittab. I then launched dsmserv by hand, and it says:
> > >>
> > >>ADSTAR Distributed Storage Manager for Sun Solaris
> > >>[1] 403
> > >># Version 1, Release 2, Level 0.0/1.0
> > >>
> > >>Licensed Materials - Property of IBM
> > >>
> > >>5765-303 (C) Copyright IBM Corporation 1990, 1994. All rights reserved.
> > >>U.S. Government Users Restricted Rights - Use, duplication or disclosure
> > >>restricted by GSA ADP Schedule Contract with IBM Corporation.
> > >>
> > >>ANR7800I DSMSERV generated at 16:24:47 on Mar 14 1995.
> > >>ANR7801I Subsystem process ID is 403.
> > >>ANR0900I Processing options file dsmserv.opt.
> > >>ANR0990I ADSM server restart-recovery in progress.
> > >>ANR0200I Recovery log assigned capacity is 28 megabytes.
> > >>ANR0201I Database assigned capacity is 224 megabytes.
> > >>ANR0306I Recovery log volume mount in progress.
> > >>ANR0353I Recovery log analysis pass in progress.
> > >>ANR0354I Recovery log redo pass in progress.
> > >>ANR0355I Recovery log undo pass in progress.
> > >>ANR0362W Database usage exceeds 92 % of its assigned capacity.
> > >>
> > >>
> > >>Then it just sits there without using CPU or allowing any connections. There
> > >>are no errors in the log file and none on the screen. Is there anything that
> > >>can be done? Since the server gives no indication of errors and there
> > >>is no way to communicate with it, I don't know what to do next.
> > >>
> > >>The typical response to an attempt to connect is as follows:
> > >>>dsmdsmc
> > >>ADSTAR Distributed Storage Manager
> > >>Command Line Administrative Interface - Version 2, Release 1, Level 0.3
> > >>(C) Copyright IBM Corporation, 1990, 1996, All Rights Reserved.
> > >>
> > >>Enter your user id:  admin
> > >>ANS5658E TCP/IP failure.
> > >>ANS5519E Unable to establish session with server.
> > >>
> > >>ANS5103I Highest return code was -50.
> > >>>
> > >>
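
When the server hangs like this, one quick check is whether anything is
accepting TCP connections on the server port at all. The sketch below is
only an illustration: the function name is hypothetical, and port 1500 is
taken from the ANR8200I line in the activity log further down. It only
tests that a TCP connection can be opened; it does not speak the ADSM
protocol, so a successful connect does not prove the server is healthy.

```python
import socket

def port_accepts_connections(host, port=1500, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 1500 is an assumption taken from the ANR8200I message in the
    activity log; adjust it if the server listens elsewhere.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

# Example call (hypothetical host name, not from the original message):
#   port_accepts_connections("adsm-server.example.com")
```

A False result while dsmserv is running would be consistent with the
"TCP/IP failure" seen by the administrative client above.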

Here is the output of query actlog:

Date/Time              Message
--------------------   ----------------------------------------------------------
07/23/1996 13:28:01    ANR1305I Disk volume /home/gm-share/adsm/stgpool.0 varied
07/23/1996 13:28:01    ANR1305I Disk volume /home/gm-share/adsm/stgpool.0 varied
                        online.
07/23/1996 13:28:03    ANR2560I Schedule manager started.
07/23/1996 13:28:39    ANR2225W Discard Data process terminated for volume
                        /home/gm-share/adsm/stgpool.1 - volume still contains
                        data.
07/23/1996 13:28:59    ANR9999D blkdisk.c(903): Error opening disk
                        /home/gm-share/adsm/stgpool.1.
07/23/1996 13:28:59    ANR1311E Vary-on failed for disk volume
                        /home/gm-share/adsm/stgpool.1 - unable to access disk
                        device.
07/23/1996 13:29:16    ANR2225W Discard Data process terminated for volume
                        /home/gm-share/adsm/stgpool.1 - volume still contains
                        data.
07/23/1996 13:29:44    ANR0400I Session 1 started for node CLIENT_TOURNESO
                        (HPUX).
07/23/1996 13:29:45    ANR0403I Session 1 ended for node CLIENT_TOURNESO (HPUX).
07/23/1996 13:31:08    ANR0811I Inventory client file expiration started as
                        process 1.
07/23/1996 13:31:08    ANR2803I License manager started.
07/23/1996 13:31:08    ANR8200I TCP/IP driver ready for connection with clients
                        on port 1500.
07/23/1996 13:31:08    ANR8439I SCSI library ROBOT is ready for operations.
07/23/1996 13:31:08    ANR0993I ADSM server initialization complete.
07/23/1996 13:31:09    ANR2835I Server is licensed for 41 clients.
07/23/1996 13:31:09    ANR2843I Server is licensed to support UNIX clients.
07/23/1996 13:31:09    ANR2844I Server is licensed to support clients other than
                        UNIX.
07/23/1996 13:31:09    ANR2854I Server is licensed for device support module 1.
07/23/1996 13:31:10    ANR0874E Backup object 0.130548 not found during
                        inventory processing.
07/23/1996 13:31:10    ANR0865E Expiration processing failed - internal server
                        error.
07/23/1996 13:31:10    ANR0860E Expiration process 1 terminated due to internal
                        error: deleted 0 backup files and 0 archive files.
07/23/1996 13:31:10    ANR2560I Schedule manager started.
07/23/1996 13:32:36    ANR0400I Session 1 started for node CLIENT_TINTIN (HPUX).
07/23/1996 13:32:36    ANR0403I Session 1 ended for node CLIENT_TINTIN (HPUX).
07/23/1996 13:32:50    ANR0400I Session 2 started for node CLIENT_MILOU (HPUX).
07/23/1996 13:32:50    ANR0402I Session 3 started for administrator ADMIN
                        (SunOS).
07/23/1996 13:32:51    ANR0403I Session 2 ended for node CLIENT_MILOU (HPUX).
07/23/1996 13:32:59    ANR0400I Session 4 started for node CLIENT_ULTRA (SunOS).
07/23/1996 13:32:59    ANR0403I Session 4 ended for node CLIENT_ULTRA (SunOS).
07/23/1996 13:33:30    ANR2225W Discard Data process terminated for volume
                        /home/gm-share/adsm/stgpool.1 - volume still contains
                        data.
07/23/1996 13:34:17    ANR0400I Session 5 started for node RIVIERE (Mac).
07/23/1996 13:34:20    ANR0403I Session 5 ended for node RIVIERE (Mac).
07/23/1996 13:35:05    ANR0400I Session 6 started for node CLIENT_MILOU (HPUX).
07/23/1996 13:35:47    ANR0400I Session 7 started for node CLIENT_HADDOCK
                        (HPUX).
07/23/1996 13:35:48    ANR0403I Session 7 ended for node CLIENT_HADDOCK (HPUX).
07/23/1996 13:36:01    ANR8337I 4MM volume A0796 mounted in drive TAPE
                        (/dev/rmt/2mt).
07/23/1996 13:47:27    ANR0102E imbkins.c(1233): Error 1 inserting row in table
                        "Expiring.Objects".
07/23/1996 13:47:27    ANR0530W Transaction failed for session 6 - internal
                        server error detected.
07/23/1996 13:47:28    ANR0400I Session 8 started for node CLIENT_MILOU (HPUX).
07/23/1996 13:47:28    ANR0403I Session 6 ended for node CLIENT_MILOU (HPUX).
07/23/1996 13:51:51    ANR0482W Session 3 terminated - idle for more than 15
                        minute(s).
07/23/1996 13:54:08    ANR0400I Session 9 started for node KING (Mac).
07/23/1996 13:54:12    ANR0403I Session 9 ended for node KING (Mac).
07/23/1996 13:55:06    ANR0402I Session 10 started for administrator ADMIN
                        (SunOS).
07/23/1996 13:55:25    ANR0102E imbkins.c(1233): Error 1 inserting row in table
                        "Expiring.Objects".
07/23/1996 13:55:25    ANR0530W Transaction failed for session 8 - internal
                        server error detected.
07/23/1996 13:55:26    ANR0403I Session 8 ended for node CLIENT_MILOU (HPUX).
07/23/1996 13:55:26    ANR0400I Session 11 started for node CLIENT_MILOU (HPUX).
07/23/1996 13:59:12    ANR0102E imbkins.c(1419): Error 1 inserting row in table
                        "Expiring.Objects".
07/23/1996 13:59:12    ANR0530W Transaction failed for session 11 - internal
                        server error detected.
07/23/1996 13:59:13    ANR0403I Session 11 ended for node CLIENT_MILOU (HPUX).
07/23/1996 13:59:13    ANR0400I Session 12 started for node CLIENT_MILOU (HPUX).
07/23/1996 14:05:02    ANR0102E imbkins.c(1419): Error 1 inserting row in table
                        "Expiring.Objects".
07/23/1996 14:05:03    ANR0530W Transaction failed for session 12 - internal
                        server error detected.
07/23/1996 14:05:04    ANR0403I Session 12 ended for node CLIENT_MILOU (HPUX).
07/23/1996 14:05:04    ANR0400I Session 13 started for node CLIENT_MILOU (HPUX).
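
Since the listing above is long, here is a minimal sketch of how the
warnings and errors could be pulled out of it. It assumes the layout
shown here (a "MM/DD/YYYY HH:MM:SS" timestamp, then a message ID, with
indented continuation lines) and assumes that the trailing letter of the
message ID marks the severity (I, W, E and D all appear in this log).
The function names are hypothetical; nothing here is part of ADSM.

```python
import re
from collections import Counter

# A new entry starts with a timestamp followed by a message ID such as
# ANR0102E; the last letter of the ID is treated as the severity.
ENTRY = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+(ANR\d{4}([A-Z]))\s+(.*)$"
)

def parse_actlog(text):
    """Return a list of (timestamp, msg_id, severity, message) tuples."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append([m.group(1), m.group(2), m.group(3), m.group(4).strip()])
        elif entries and line.strip() and not line.startswith("-"):
            # Continuation line: append it to the previous entry's message.
            entries[-1][3] += " " + line.strip()
    return [tuple(e) for e in entries]

def summarize(entries):
    """Count entries per message ID, keeping only W, E and D severities."""
    return Counter(msg_id for _, msg_id, sev, _ in entries if sev in "WED")

# Three entries copied from the listing above.
sample = """\
07/23/1996 13:47:27    ANR0102E imbkins.c(1233): Error 1 inserting row in table
                        "Expiring.Objects".
07/23/1996 13:47:27    ANR0530W Transaction failed for session 6 - internal
                        server error detected.
07/23/1996 13:47:28    ANR0400I Session 8 started for node CLIENT_MILOU (HPUX).
"""
entries = parse_actlog(sample)
for msg_id, count in summarize(entries).most_common():
    print(msg_id, count)
```

Run over the full listing, this would show at a glance that the repeated
ANR0102E / ANR0530W pair is what keeps failing the CLIENT_MILOU sessions.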


   + --------------------------------------------- +
   |  Bradley King          Fax:(331) 47.46.72.12  |
   |  Schlumberger Industries - Gas Engineering    |
   |  50, ave Jean Jaures, 92542 Montrouge France  |
   + --------------------------------------------- +