Subject: Survey results
From: Jerry Lawson <jlawson AT THEHARTFORD DOT COM>
Date: Fri, 11 Jul 1997 13:22:48 -0400
Date:     July 11, 1997            Time: 10:10 AM
From:     Jerry Lawson
          The Hartford Insurance Group
(860)  547-2960          jlawson AT thehartford DOT com
-----------------------------------------------------------------------------
As I promised on Monday, here are the results of the informal survey that I
ran.  I received 26 responses - all but one of them directly to me.  I'm sure
Martha and everyone else on the list thank you for that.

Before I get on to the results, many people asked what kind of problems we
were experiencing.  I have been noticing some strange things going on ever
since we went from server level 7 to server level 12, approximately 3 months
ago.  The major problems occur when we start our month end processing - since
I work for an insurance company, we have a lot of "month end" work, including
a great amount of tape activity.  During the first month end with level 12,
we experienced very bad performance from ADSM.  Level 10 of the ADSM code
contained a change in the way MVS tape allocation was handled - primarily to
resolve the Enqueue lockouts that were causing ADSM to go into a wait state.
We saw many processes running concurrently, sometimes as many as 10.  Some of
these of course needed more than one tape device.  The results were that many
were waiting on mounts, some were waiting for mount points, some even waiting
for multiple mount points.  (At the time, we were also using 3480 technology
drives.)  It appeared that some of the processes were never getting any
mounts, even though processes that had started later had active mounts.  It
was almost as if some of the processes had been lost in the shuffle.  What
didn't help was that for some processes, when a mount request is issued, the
Q Process command returns a number that appears to indicate the number of
seconds a process has been active, not how long the mount has been
outstanding.  This to me is a bug.  Unfortunately, at this point we had a
camera failure within a silo where ADSM had mounted volumes, and we really
seemed to lose things.  We had to halt ADSM, and it did not come down easily,
probably because of outstanding mounts.  This was after the silo had been
repaired and brought back on-line.

As you can see, outside of the mount request time, there are no "for certain"
bugs in the above description.  But there is a lot of uneasiness about the
new tape handling methods.  We decided to watch the problem through another
month end cycle.  Unfortunately, the May cycle (starting in June) was over a
weekend, which gives us better performance, and no problems were seen.  We
watched it for the June cycle (starting on July 1, a Tuesday), and saw the
same slow processing, the same erratic tape mounts for processes, but all
finished eventually.  There were no hardware problems, either.  Now to be
fair, I should also note that we are running on the edge of our resources at
this time.  First, CPU utilization is probably right at 100 %, and tape
drives are at a premium, with a queue of work eligible to run if drives were
available.  We run JES3, so these are jobs that are queued for execution, not
actually running.  It was this lack of available drives that caused me to ask
the support center a question, and to get back the "dedicate your drives"
response.

I do know what I am uncomfortable with, though - primarily 2 things.  The
first is the new process of allocating drives.  If you need a drive now, and
none are available, then ADSM does not hang the Enqueue out there anymore,
instead the request is withdrawn, and 5 seconds later it is tried again.
This happens 30 times (5 minutes), and then a WTOR is issued (ANR5373),
asking if the operator wants to continue, or cancel the mount.  Most of these
requests were getting ignored because we had not told them what to do.  We
are now adding the automatic reply of continue through our ACO routines.
What I think happens, is that when ADSM gets a "no drives available"
response, and withdraws the dynamic allocation request, he drops out of
sight.  If a drive then becomes available, the next job in the queue that
needs a drive will get it.  Prior to the level 10 change, ADSM would have
been number 1 on the queue (because he was dynamic), and so would have
sopped up the drive.  The change eliminated the outstanding Enqueue that
dragged ADSM down slowly when there would be no drive coming available, but I
am not sure that the new method is not flawed too.
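
To make the timing concrete, here is a rough sketch (Python, used purely as
pseudo-code) of that retry loop as I understand it.  The allocate_drive and
issue_wtor callables are placeholders I made up for illustration - they are
not actual ADSM internals.

import time

RETRY_INTERVAL_SECS = 5    # the withdrawn request is retried every 5 seconds
MAX_RETRIES = 30           # 30 attempts (about 5 minutes, as described above)

def request_drive(allocate_drive, issue_wtor):
    """Sketch of the post-level-10 allocation behavior described above.

    allocate_drive() stands in for the dynamic allocation request and is
    assumed to return a drive, or None when MVS has no drive available.
    issue_wtor() stands in for the ANR5373 operator prompt and is assumed
    to return "CONTINUE" or "CANCEL".
    """
    while True:
        for _ in range(MAX_RETRIES):
            drive = allocate_drive()
            if drive is not None:
                return drive
            # The request has been withdrawn, so ADSM holds no Enqueue and
            # loses its place - another job can grab the next free drive.
            time.sleep(RETRY_INTERVAL_SECS)
        # After the retries are exhausted, ask the operator what to do.
        if issue_wtor() == "CANCEL":
            return None
        # A "CONTINUE" reply starts another retry cycle.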

The second problem has to do with the way drives are released when we are
through with them.  (We do not set a retain value, on our drives - as soon as
we are through with a tape, we release the drive.)  What currently happens is
(as long as you have not reached the mount limit), when a tape reaches an end
of volume state, the next volume mount is requested.  The old tape is not
unloaded at this time.  When the mount is satisfied, then the old tape is
released.  This is done for throughput - no waiting for the unload/rewind to
take place - just throw up the new drive and go.  On a 3480, if you are at
end of reel, this can be a great savings.  But as my survey shows, most
people are using drives with serpentine writing techniques, which means that
at end of reel, the tape is already rewound - all you have to do is unload and
go.  If MVS has no drives available, then I wait, even though I am holding a
drive that is suitable to use for the next mount!  What ADSM needs to do at a
minimum is allow an installation to select the end of volume processing we
want - either use a new drive for the next mount, or use the existing one.
When you start to look at an MVS system with many ADSM tapes being mounted,
and no drives available, I think this makes a lot of sense.
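
For illustration, here is a small sketch (again Python pseudo-code) of the
two end of volume choices I would like to see selectable.  The callables are
made-up placeholders, not ADSM code.

def end_of_volume_current(old_drive, mount_new_volume, release):
    """What happens today (below the mount limit): request the next volume
    on another drive, and only release the old drive once that mount is
    satisfied.  If MVS has no free drive, the process waits even though
    old_drive is sitting there holding an already rewound tape."""
    new_drive = mount_new_volume()     # may queue behind other MVS work
    release(old_drive)
    return new_drive

def end_of_volume_reuse(old_drive, unload, mount_next_on):
    """The option argued for above: on a serpentine drive the tape is
    already at load point, so unload it and mount the next volume on the
    same drive instead of asking MVS for another one."""
    unload(old_drive)
    mount_next_on(old_drive)
    return old_drive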

OK - now for the survey results.  I will repeat the question, give my answer,
and then summarize the responses.

1.  Do you share tape drives with the rest of your MVS workload, or do you
dedicate tape drives to ADSM?

We share with all other MVS workloads.

Of 26 responses, only one customer dedicated drives, but he actually did both
shared and dedicated - he had a small 3494 library with 2 drives for some of
his ADSM tapes - he used it exclusively for ADSM, but also had ADSM using
pooled drives.

2.  If you dedicate, how many drives do you give ADSM?

See above.

3.  If you do not dedicate drives to ADSM, how high do you set the mount
limit in each DEVCLASS you have defined?

This question was poorly worded - what I meant to ask was what was the
combined mount limit for the MVS image where ADSM runs.  We have 3 device
classes, with mount limits of 4, 4, and 6, so we could use as many as 14
devices at a time.

The responses were everywhere here - one person set his as high as 40, the
smallest was 2, but I got a range of mixed responses - some answered as I
intended, while others answered what I asked.

4.  What type of drives do you use (3480, 3490, 3590, or something else?)

     We use 3490 devices now, primarily (still 3480s available for migration
fallout, etc.)

Most people used 3490s.  A few had 3590 devices, and a couple were using
3480.   There was a mixture of IBM and STK devices.

5.  Do you use a robot?  If so - which one?

     We use STK 4400, with the Powderhorn robots in the silo.  (Powderhorn
is the one with two arms.)

Only 5 people were not using robots of any kind.  15 used the STK silos
(regular, enhanced, or WolfCreek), 6 used the 3494 (one had both a 3494 and a
silo) and one person had a 3595.

6.  If you have a robot, how many drives does it have?

We have 44 drives in our silo.

The answers here varied some - the STK silo users tended to have more drives
in the silo (about 40 would be a good average), while the 3494 users had a
lower number - averaging about 5.

7.  How many tape drives do you have in total?

I wanted a total number of automated and manual drives, but from the answers
I got back, I think I should have worded this question more carefully.
The number of manual drives ranged from none to 40+.

8.  In round numbers, how much data do you back up each night?

I forgot to consider compression here, so these numbers may vary.  There was
also some question about how to measure this.  I didn't want people to go to
a lot of work here - just a well placed estimate.  For example, I have 2 DASD
pools which fill up each evening - the migration each night works out to
about 16.5GB, compressed.

If I assume that all the numbers that people gave me were compressed, then
the average backup was 24GB per night.  There were three people backing up
over 100GB a night, 4 between 50 and 100, 3 from 25 to 50, and 16 less than
25.

9.  Do you use the copypool features?

Yes

There were several people who replied that they used other methods of
providing for backup of the data, ranging from EXPORT to the RMM feature.
Only 6 did nothing for backup.  Some were selectively backing up, and at
least 2 were implementing it - they had started but had not finished the
creation of the initial copy yet.

In conclusion, I want to thank the 26 people who took the time to respond -
that's a lot more than I expected, and the results were very interesting to
say the least.
-----------------------------------------------------------------------------
                                                     Jerry

Insanity is doing the same thing over and over..and expecting the results to
be different - Anon.