ADSM-L

More fun -

1997-02-28 21:32:40
Subject: More fun -
From: Jerry Lawson <jlawson AT THEHARTFORD DOT COM>
Date: Fri, 28 Feb 1997 21:32:40 -0500
Date:   February 28, 1997                       Time:    9:19 PM
From:   Jerry Lawson
        The Hartford Insurance Group
        (860) 547-2960    jlawson AT itthartford DOT com
---------------------------------------------------------------------------
If you read my note from earlier today on Copy Pools, it mentioned a
damaged tape, and our fun with the Copy pool tape reclamation.
Unfortunately, sandwiched between those two events, we lost our MVS server.
We have been running an MVS server for 3 years and had a grand total of
one failure.  Now we have had 3 in 10 days.  They have all been similar.
Unfortunately, ADSM has been running so well, we didn't realize that we
didn't have a dump dd statement in the proc, and so we have no real
documentation from failures.  Here is what I saw this morning.....

We log on to the admin client (from either OS/2 or Win95, GUI or Command
Line) and we can do things - check sessions, processes, stgpool status,
etc.  When we check sessions, we find backups that should have been done
hours ago still active.  They show that they had done a good deal of
activity, but don't seem to be doing anything.  Other sessions are coming
and going (such as scheduled sessions checking in), but some sessions are
obviously hung.  For example, if an admin checks a volume (Q vol) no
response is returned, but the Q Session shows the status as "run".  If we
cancel the session, it doesn't go away.  Sometimes the Win95 machines get
hung, and we must reboot.

When we check on processes, I find that there are many running.  The
expiration that kicks off nightly at 9:00 is still active; it normally
ends about 1:00.  Also, the DASD pool appears to have filled up,
and a migration is underway.  (This is not our scheduled migration, which
occurs at 3:00 in the afternoon).  There is also a backup of the same
copypool running.

My suspicion is that the problem lies here.  The difficulty is that we
cannot find the root cause and cancel it - canceled processes do not end.
We see mounted tapes, but nothing waiting for a mount, and no deadly
embraces.  Therefore I can only infer that the failure is here, from the
sessions that die on the Q Vol commands and from the fact that we are
migrating and backing up the same pool at the same time.

Anyone have any ideas?  As I said earlier, this is at level 2.1.0.7. We
want to get to level 12, but that code is on a maintenance upgrade that's
being held up due to a microcode problem somewhere.  TCP/IP is at level
3.1.

***************************************************************************
Jerry Lawson
The Hartford Insurance Group
jlawson AT thehartford DOT com

Any idiot can face a crisis.  It's the day to day stuff that really wears
you down.

                                                Anton Chekhov