ADSM-L

Re: 3494 Volume Stealing

2002-03-20 00:07:36
From: "Seay, Paul" <seay_pd AT NAPTHEON DOT COM>
Date: Wed, 20 Mar 2002 00:00:37 -0500
Allen,

I do not normally boast about my knowledge, but in this area of this product I
know it as well as the engineers do, and probably better when it comes to its
interaction with host systems.  I have actually run the traces, and I can tell
you that the buffers returned from the 3494 during an inventory function are
sometimes in error because of an internal bug in the Library Manager (LM).
This bug can cause software that does no error checking to get all balled up
and do something wrong.  It is kind of like adding 1 + 1 and expecting to
always get 2.

The LM inventory search facility (TSM uses it) passes back 100 tapes at a
time.  The first buffer has a header containing the total count of entries,
and the last buffer holds whatever is left.  Unfortunately, the buffer is not
cleared, so there is residual data in it from previous calls; the valid
entries in the last buffer are terminated by a null entry.  IBM never tested
that the count is always right and, in fact, determined they have no way to
guarantee it, because they just take the count from a DB2 table on the LM
instead of counting the entries they actually return.  So any application that
does not scan for the null entry can end up picking up bogus information.
Note that these buffers can contain all tapes for all hosts, not just the ones
in a specific category; the category is in the records returned.  Usually the
count is low and you are missing tapes, but I suppose it could be high and
cause the logic to process volumes that are not yours.

Just food for thought.  There is no user code in my environment.

In the case I described below, there was only one host attached to this
library, and it was non-mixed.  We just kept getting 2 or 4 tapes fewer than
were actually in the library.  If you read the 3494 programmer's guide, you
will find that there is a lot more to this than the superficial mtlib command
suggests.  There is a whole set of C routines that let you code to the lmcpd
interface.  In fact, I am in the process of designing an insert-and-categorize
function for TSM; then I will have some user code.  It will work like the
mainframe: automatically categorize the tapes and check them in.  The way I am
going to do this is to run a Perl script, which can be changed easily, that
issues the necessary mtlib and TSM commands.  This will significantly reduce
the unsolicited-message processing needed in the coded routine.

I know the site that is having this problem with the 4 TSM servers.  This is
a tightly controlled environment.  Another site, which has both MVS and TSM,
had an MVS tape eaten.  Both sites are working on reproducing this problem at
will.

This may still be a user problem, but my bet is on either the TSM code or the
Library Manager.

What everyone is asking for is security, by attaching host, over which tapes
(by volume range) that host can act upon.  This was discussed at length at
Share in relation to TSM, because no other libraries have the kind of smarts
the 3494 or ACSLS have to even do this in hardware.  The position of the
customers, including us, is that the 3494 is an enterprise-class,
high-integrity, high-dollar product.  Yes, the functionality is not there in
the LM, but the amount of code needed to check a table of valid ranges is
small.  Each host already has to be identified to the 3494 library, so why not
add its valid ranges and categories too?  The big selling point for IBM is
that if I could secure a library at this level, I could share it among many
environments, the way LUN masking on the Shark disk lets me.  In other words,
consolidate many libraries into one.  Yes, a host could still mount its own
tapes in the wrong drive (that is its problem, as you say) or set the hardware
scratch mount category on the wrong drive to its own.  These are difficult to
solve in the library design.  But allowing any system to check in a tape from
the FF00 (insert) category or change the category of a tape owned by another
host is a problem.  By default, IBM should make it work the way it does now:
all systems can do anything to "*".  Then you can lock it down if you want.
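The claim that a range check is a small amount of code can be illustrated with
a short Python sketch.  Nothing like this exists in the 3494 LM today; the
HostRule table, may_act function, host names, volser ranges, and category
numbers are all hypothetical.  A host listed as "*", or not listed at all, is
unrestricted, which matches the open default described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass
class HostRule:
    ranges: List[Tuple[str, str]]  # inclusive volser ranges the host owns
    categories: Set[int]           # categories the host may assign


# Hypothetical policy table keyed by host name. "*" means no restriction.
Policy = Dict[str, Union[str, HostRule]]


def may_act(policy: Policy, host: str, volser: str, target_category: int) -> bool:
    """Decide whether host may check in or recategorize volser.

    Assumes fixed-length 6-character volsers, so plain string comparison
    orders them correctly within a range.
    """
    rule = policy.get(host, "*")
    if not isinstance(rule, HostRule):
        return True  # "*" or unlisted host: today's open behavior
    in_range = any(lo <= volser <= hi for lo, hi in rule.ranges)
    return in_range and target_category in rule.categories


# Example table with made-up hosts, ranges, and categories.
POLICY: Policy = {
    "TSM1": HostRule([("A00000", "A99999")], {0x0101, 0x0102}),
    "MVS1": HostRule([("M00000", "M99999")], {0x0001}),
    "TEST": "*",
}

if __name__ == "__main__":
    print(may_act(POLICY, "TSM1", "A00042", 0x0101))  # TSM1's own tape
    print(may_act(POLICY, "TSM1", "M00001", 0x0101))  # an MVS tape
```

The per-request cost is one table lookup plus a range comparison; the harder
part would be where the LM keeps the table and how hosts are identified.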

I hope to discuss this issue with 3494 hardware engineering in a couple of
weeks.  It has always been an issue for us, but now that people are having
these kinds of problems, I have sturdier ground to stand on in getting IBM to
provide the functionality.  And why are they having these problems?  Because
everyone wants the reliability of the 3590 platform, already has this library
installed, and says why not.