MVS SERVER goes to sleep.....

Thanks for the in-depth response to my original question.  It verifies some of
our original assumptions, although it does not solve our problems.

As you explained - we were stuck on the SYSZTIOT enqueue.  The use of the term
"Grinding halt" was probably an overdramatization - it would seem that ADSM
takes several hours to come to the condition we saw.  I see several occurrences
in the log where reclaims start, run for a while, and then stop for several
hours, before taking off again.  I assume a similar problem each time, but the
effect to ADSM seems to be minimal.

Your first paragraph has caused some other issues to come to mind.  You said:
 >  Since your device class has a mountlimit
    of 4 ADSM is smart enough to not have to wait for the rewind/unload on
    one drive before attempting to mount a tape on another drive.  If you
    already had 4 drives in use then ADSM would have to wait for the dismount
    of the tape before attempting to mount the next tape.

We did not exactly see the effects of this in the log.  First, while the ADSM
messages indicated that we needed the three reclaim tapes, there were only MVS
mount messages for 2 tapes - could this be because we only had two drives
available?  Secondly, the third "GET" message was not issued until after the
first tape was read.  If the design goal had been to minimize lost time
between tapes, then why not have the tapes mounted at the beginning of the
request?

It would seem that the situation we encountered could also happen in the
following scenario - We have two processes running to handle the migration of
data from the storage pool to the tape pool.  A reclaim kicks off as we saw
here.  It gets two drives (mount limit is set at 4 and we are now using 4
drives).  Given the behavior described in the quote I cited above, wouldn't
the same situation occur?

We are torn between a couple of internal forces here that cause us great
concern.  We can't leave the ADSM tapes in the silo forever, because of the
low reference counts on many of the tapes.  As I stated in the first note - we
have 4 JES3 complexes, and share resources such as tape drives between them to
keep computing costs lower.  To fully satisfy ADSM in all conditions, my mount
limit of 4, would really require 8 drives be available at all times - 4 inside
the silo, and 4 outside, since we never can be sure of where mount requests
will need to be covered.  At the same time, I don't expect (nor want) to have
to tell ADSM that tapes are inside or outside of automation facilities.  I
also do not want to reduce the mount limit to 2, since as I stated earlier, we
run parallel migration tasks to move data - my pool is at 7.5GB and growing -
will probably soon need to add a third migration task to ensure timely
movement.  And I would like to see parallel reclaim tasks too - but that
apparently is not supported currently.
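
To put numbers on the drive requirement, here is a rough sketch in Python.
The function names and the two-location model (inside and outside the silo)
are my own illustration, not anything ADSM or MVS exposes.  It only encodes
the worst case where every concurrent mount happens to land on the same side.

```python
def worst_case_drives(mount_limit: int, locations: int = 2) -> int:
    """Drives needed if all concurrent mounts could land in any one location.

    Assumption (illustrative): tapes may be in any of `locations` places
    (for example inside or outside the silo), and we cannot predict which.
    """
    return mount_limit * locations


def can_always_satisfy(mount_limit: int, drives_inside: int, drives_outside: int) -> bool:
    """True only if each side alone could cover the full mount limit."""
    return min(drives_inside, drives_outside) >= mount_limit


if __name__ == "__main__":
    print(worst_case_drives(4))           # mount limit 4, two locations
    print(can_always_satisfy(4, 4, 2))    # only 2 drives free outside the silo
```

With a mount limit of 4 this gives the 8 drives mentioned above, and a
configuration with only 2 free drives outside the silo fails the check.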

My own opinion is that minimizing the use of tape resources here
is desirable - I am willing to accept the delay of a rewind/dismount for the
reduced hardware consumption it offers.  After all, this is a reclaim, not a
restore request from a customer.

Another thing that might help is some warning to the operator about what is
happening.  For example - if there is an enqueue outstanding for 5 minutes,
such as happened here, letting the operator know about it would be a help.  We
are adding automation to look for ADSM sitting in the allocate queue, but this
is not the best solution - if we look too often, we run the risk of being
"Chicken Little", but if we wait too long between searches, an outage still
may occur.
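
The polling trade-off can be sketched as follows.  This is a toy model in
Python, not our actual automation: the function name, the idea of polling at
fixed times starting from zero, and the 300-second threshold (the 5 minutes
mentioned above) are all assumptions made for illustration.

```python
from typing import Optional


def first_alert_time(wait_start: int, wait_end: int,
                     poll_interval: int, threshold: int = 300) -> Optional[int]:
    """Return the time (seconds) of the first poll that raises an alert.

    Polls happen at t = 0, poll_interval, 2 * poll_interval, ...
    A poll alerts if the wait is still outstanding and is at least
    `threshold` seconds old.  Returns None if no poll ever sees that.
    """
    t = 0
    while t < wait_end:                      # wait is over at wait_end
        if t >= wait_start + threshold:      # outstanding long enough to report
            return t
        t += poll_interval
    return None


if __name__ == "__main__":
    # A wait that starts at t=10 and lasts until t=500 (over 8 minutes).
    print(first_alert_time(10, 500, poll_interval=60))    # frequent polling
    print(first_alert_time(10, 500, poll_interval=600))   # infrequent polling
```

In this model a 60-second poll reports the wait, while a 600-second poll
never sees it at all, which is the gap that worries me.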

This is getting pretty long, so I'll stop here.  There are lots of twists and
turns to this problem - MVS, JES3, my operational issues, and then what
happens in a JES2 environment.  I appreciate your taking the time to help me
on this one.

Jerry Lawson
jlawson AT itthartford DOT com

________________________Forward Header________________________
Author: INTERNET.OWNERAD
Subject: MVS SERVER goes to sleep.....
07-11-96 01:27 AM

Hello Jerry:

You have done an excellent job of detective work.  I think I can shed a
little light on this situation.

First I would like to cover the issue about why 3 or 4 drives were used
when there should only be 1 input and 1 output device.  Maybe ADSM has
outsmarted itself on this one.  Since your device class has a mountlimit
of 4 ADSM is smart enough to not have to wait for the rewind/unload on
one drive before attempting to mount a tape on another drive.  If you
already had 4 drives in use then ADSM would have to wait for the dismount
of the tape before attempting to mount the next tape.

Sure you could make a case for only asking for two drives but if ADSM
only used 2 drives for that reclamation and some other server process
needed tapes that were also outside of the SILO you could see the
same problem.

Why did ADSM come to a grinding halt?  You probably already know the
problem was because of the SYSZTIOT enqueue that was outstanding for
the dynamic allocation of the third tape that was waiting for a drive
to become available.  This in itself does not cause the server to come
to a grinding halt.  Either locks held by the process caused other
server threads to wait behind it, or the main thread needed to update
a flat file (e.g. devconfig, volhistory, or files used with a device
class type of FILE, but not one of the VSAM files used by ADSM), in
which case the main thread would also be waiting for the SYSZTIOT
enqueue.  The reason this same problem exists in the case where ADSM
would only use 2 tape drives for the reclamation but need another tape
drive outside of the SILO for another process is because the allocation
will be holding the SYSZTIOT enqueue while waiting for a device and
I believe an unallocate also needs the SYSZTIOT enqueue.
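
As a loose analogy only, the blocking pattern looks like the Python sketch
below.  It uses ordinary threads and a lock to stand in for the SYSZTIOT
enqueue; it does not model real MVS ENQ/DEQ or ADSM internals, and every
name in it is made up for illustration.

```python
import threading


def demo(timeout: float = 0.2):
    """Show one thread holding a serializing lock while it waits for a drive."""
    tiot_lock = threading.Lock()       # stand-in for the SYSZTIOT enqueue
    drive_free = threading.Event()     # set when a tape drive becomes available
    holding = threading.Event()        # set once the allocator owns the lock

    def allocate_tape():
        with tiot_lock:                # lock is held for the whole allocation
            holding.set()
            drive_free.wait()          # blocked waiting for a drive

    def try_update_flat_file():
        # The "main thread" also needs the lock to touch a flat file.
        got = tiot_lock.acquire(timeout=timeout)
        if got:
            tiot_lock.release()
        return got

    allocator = threading.Thread(target=allocate_tape)
    allocator.start()
    holding.wait()                             # allocator now holds the lock

    blocked = not try_update_flat_file()       # fails while no drive is free
    drive_free.set()                           # a drive finally frees up
    allocator.join()
    recovered = try_update_flat_file()         # succeeds once the lock is released
    return blocked, recovered


if __name__ == "__main__":
    print(demo())
```

The second thread is stuck for as long as the allocation waits, even though
it never wanted a tape drive itself.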

Take a look at APAR PN84364 that I opened for another customer.  I think
this describes the cause of your problem.  The APAR is opened as a DOC
APAR but development is considering code changes.

Another concern was about ignoring the tape drives in the SILO.  I am not
an expert in this area but I assume you have allocation exits that cause
the drives in the SILO to be ignored if the tape does not exist in the
SILO inventory.  Maybe someone with more expertise in this area can
confirm this.  Just to assure you, this is all transparent to ADSM.  ADSM
just issues the dynamic allocation and MVS takes over from there to allocate
a drive and mount the tape.

I hate to say this but it is currently working (broken?) as designed.
You might want to track that APAR to see if a code change is eventually
made.  The only circumvention I can suggest is to make sure there are as
many tape drives outside the silo available as the mountlimit setting
in the device class.

Hope this helps - David Bohm, ADSM technical support