ADSM-L

MVS SERVER goes to sleep.....

1996-07-10 15:50:59
Subject: MVS SERVER goes to sleep.....
From: Jerry Lawson <jlawson AT ITTHARTFORD DOT COM>
Date: Wed, 10 Jul 1996 15:50:59 -0400
Date:     July 10, 1996            Time:    14:31
From:    Jerry Lawson
    ITT Hartford Insurance Group
    (203) 547-2960    jlawson AT itthartford DOT com
-----------------------------------------------------------------------------
We have recently run into a strange occurrence on our MVS server - was
wondering if anyone else had a similar problem, or any suggestions.

We have a DASD storage pool that migrates to a tape storage pool that actually
covers both automated and non-automated devices.  The theory is this - a new
scratch mount will always go to the Silo, where ADSM can access the volume
without manual intervention.  However, because we already have over a thousand
tapes in the pool, the pool is growing quickly, and over 70% of the volumes are
not referenced very often, we let the Silo routines eject tapes that have not
been accessed within a predetermined period, which is usually 30 days.  The
ejected tapes are kept on a rack next to a set of manual drives, and operators
are available to handle mounts if needed.  (We do not have an excess of slots in
the silos, and management has judged this to be a cost effective way of using
the silo.)  The MVS unit information for the devclass allows the tapes to be
mounted in any available drive, and so once ejected, a tape is mounted on a
manual drive.  ADSM normally has no problem with this, and the operators are
generally awake and can read the displays on the drives fast enough to give us
acceptable mount times.

As a side note, the devclass statement has a mountlimit of 4, and a mountwait
of 0.

Lately, we have seen situations where ADSM comes to a grinding halt for
periods of time (no activity, no sessions can be established, etc.), and then
all of a sudden, it will let go and fly.  Analysis of the log shows the
following scenario.

1.  ADSM starts a reclaim on a tape that is outside of the silo.  In the
latest example, 4 volumes were needed to do the reclaim - it appeared that
files spanned both the beginning and the end of the tape to be reclaimed, thus
needing 3 input tapes, and a 4th tape was needed for output.  The output tape
was an older (non-scratch) tape that had also been ejected for inactivity.
ADSM issued the standard reclaim messages, and then MVS mount messages were
issued for the tape holding the first part of the spanned file, and for the
output tape.  The operator promptly mounted the tapes, and processing proceeded.

It is important to note that only 2 manual drives were available at this point
in time.  The silo, of course, had many available drives.

2.  After about 3 minutes, ADSM finished with the first input tape, and
issued a "GET" message (actually MVS did) for the second input tape - the one
to be reclaimed.  At this point, the reclaim process began to wait.  (It was
9:00 at night).

3.  At 7:00 the next morning, the operator varied on two more tape drives.  By
this time, ADSM was no longer processing, but there had been no external
messages to indicate a problem.  The operator had issued an I S A command (we
are JES3) and found that ADSM was in the allocate queue with DYN as a type -
indicating of course dynamic allocation.  This is why he varied on the drives.


4.  As soon as the drives became available to MVS, the first message was a
"keep" from ADSM/MVS, followed by a Mount message on one of the new drives
that had been varied on.  The tapes were mounted and the reclaim completed
normally.  Because of the Enqueue held by Dynamic Allocation, all of ADSM's
processing had eventually become "stuck" behind it, and all processing had
effectively ceased up until this point.  It was now freed and processing
picked up, but several schedule windows had been missed during the 10 hour
outage.

Now it would be easy to pick on the operator here, but I don't think that is a
fair assessment of the problem.  In our shop, the single operator must watch a
combined console for 4 MVS/JES3 systems, and the one that ADSM runs on is a
"development" system that is only active during the day.  This also happened
during month-end processing, when resources such as tape drives are at a
premium, and were needed elsewhere.

My analysis of the scenario is that there is a bug in ADSM here.  To be
specific, the two tapes were mounted, and the input file was read to end of
reel.  At this point, the second volume was needed, but the first tape had
not been closed (no "keep" was issued).  My conclusion is that the open for the
second volume (the Get) was issued first; this is out of sequence.  Had the
first input file been closed, the drive would have been available for the
second mount; in this case it was not.
Normal MVS EOV processing does not require a multiple reel file to be mounted
across multiple drives; they are mounted across the same device.  The same
should happen here.  I suspect that part of the problem was with the
mountlimit being set at 4 when only 2 manual drives are available.  But
remember that we had drives available to us in the silo; from an MVS standpoint
they are all in the same unit type, although a mount for a tape that is
stored outside the silo will not be made on a drive in the silo.  This should
be transparent (my favorite word) to ADSM.
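
To make the sequencing argument concrete, here is a minimal Python sketch of
the scenario.  It is only a toy model, not ADSM or MVS code: the DrivePool
class, the reclaim function, the close_before_next flag, and the volume names
are all hypothetical, and the mountlimit setting is not modeled.  It assumes
the output tape stays mounted for the whole reclaim and compares asking for the
next input volume while still holding the previous one against issuing the
"keep" first.

```python
from dataclasses import dataclass, field


@dataclass
class DrivePool:
    """Hypothetical model of the manual drives currently online to MVS."""
    online: int
    in_use: list = field(default_factory=list)

    def mount(self, volume):
        """Return True if a drive was free, False if the request must wait."""
        if len(self.in_use) >= self.online:
            return False
        self.in_use.append(volume)
        return True

    def keep(self, volume):
        """Dismount the volume and free its drive."""
        self.in_use.remove(volume)


def reclaim(pool, inputs, output, close_before_next):
    """Process the input volumes in order.

    Returns the volume whose mount request stalls, or None if the
    reclaim runs to completion.
    """
    if not pool.mount(output):
        return output
    previous = None
    for volume in inputs:
        # Sequence argued for in the post: close the old volume first.
        if previous is not None and close_before_next:
            pool.keep(previous)
        if not pool.mount(volume):
            return volume
        # Sequence suspected in the post: the old volume is only released
        # after the next one has been mounted.
        if previous is not None and not close_before_next:
            pool.keep(previous)
        previous = volume
    return None


inputs = ["IN1", "IN2", "IN3"]  # made-up volume names
stalled = reclaim(DrivePool(online=2), inputs, "OUT1", close_before_next=False)
fixed = reclaim(DrivePool(online=2), inputs, "OUT1", close_before_next=True)
more_drives = reclaim(DrivePool(online=4), inputs, "OUT1", close_before_next=False)
print(stalled, fixed, more_drives)
```

Under these assumptions, with two drives online the hold-then-mount order
stalls on the second input volume, which matches the wait described in step 2.
The close-first order completes on the same two drives, and so does the
hold-then-mount order once two more drives are varied on, as in step 4.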

What do you all think?




*****************************************************************************
Jerry Lawson
ITT Hartford Insurance Group
jlawson AT itthartford DOT com

Any idiot can face a crisis.  It's the day to day stuff that really wears you
down.

                        Anton Chekov