ADSM-L

Re: 3494 Volume Stealing

2002-03-20 17:04:18
Subject: Re: 3494 Volume Stealing
From: Allen Barth <allen_barth AT SCUDDER DOT COM>
Date: Wed, 20 Mar 2002 15:59:57 -0600
Paul,

It's clear to me that we disagree.  I understand and accept the
inter-operating parameters and requirements of the 3494, and have
installed and/or written code residing on each attached host to only look
for tapes that it owns.  I don't do scans - period.  I think your
reference to the Shark disk server capabilities is an apples-to-oranges
comparison.  For one, Shark LUN inventories are a stable commodity (unless
more DASD is added and LSSs are created) in that they (LUNs) don't
check in/check out like tapes do in a tape library.  For another, YOU don't
get to pick what the LUNs are called, but YOU do get to pick what tape
volsers you'll use.

And what about VTS?  One of my 3494s has VTS enabled and the carts used by
the VTS function aren't owned or managed by ANY externally attached
system.  This has caused no impact to me.

Now I'm not saying that the 3494 is perfect.  There are definitely
functions that I'd like to see added.  For example, ever try to find out
the volser of a cart in cell xxx after the label fell off?  There's only one
painful way: pull up a chair and slowly wade through the inventory on
the LM console until you find it.  There is no perfect solution to
disparate systems claiming tapes, but the more responsibility you
off-load, the less flexibility you will get.

Regards,
Al Barth





"Seay, Paul" <seay_pd AT NAPTHEON DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
03/19/02 11:00 PM
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        Re: 3494 Volume Stealing


Allen,

I do not normally boast about my knowledge of something, but in this area
of this product I know it as well as the engineers do, and probably better
when considering its interaction with host systems.  I have actually run
the traces, and I can tell you that the buffers returned from the 3494
during an inventory function are in error in some cases because of an
internal bug in the LM.  This bug can cause software that does no error
checking to get all balled up and do something wrong.  It is kind of like
adding 1 + 1 and expecting to always get 2.  The way the LM inventory
search facility works (TSM uses this) is that it passes back 100 tapes at
a time.  The first buffer has a header on it that has the total count of
entries.  The last buffer holds what is left.  Unfortunately, the buffer
is not cleared, so there is residual data in the buffer from previous
calls.  The valid entries in the last buffer are terminated by a null
entry.  IBM never tested that the count is always right and, in fact,
determined that they have no way to guarantee the count is correct,
because they just get the count from a DB2 table on the LM; they do not
count the number of entries they are returning.  So, any application that
does not scan for the null entry can end up picking up bogus information.
Note that these buffers can contain all tapes for all hosts, not just the
ones for a specific category.  The category is in the records returned.
Usually, the counter is short and you are missing tapes, but I suppose it
could be high and cause the logic to process volumes that are not yours.
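
A minimal sketch of the defensive scan described above, in Python rather
than the C routines of the lmcpd interface.  Everything here is an
assumption made for illustration: the Entry record, the use of None to
stand for the null entry, and the collect_entries name are hypothetical and
do not reflect the real buffer layout.  The point is only that the reader
stops at the null entry, treats the header count as a cross-check instead
of a loop bound, and filters by category afterwards.

```python
from typing import Iterable, List, NamedTuple, Optional, Sequence, Tuple


class Entry(NamedTuple):
    volser: str    # volume serial, e.g. "A00001"
    category: str  # library category as a hex string, e.g. "012C"


# Hypothetical representation: one buffer is a sequence of decoded records,
# and None stands for the null entry that ends the valid data.
Buffer = Sequence[Optional[Entry]]


def collect_entries(
    buffers: Iterable[Buffer],
    header_count: int,
    my_category: Optional[str] = None,
) -> Tuple[List[Entry], bool]:
    """Return (entries, count_ok).

    Entries are read in order until the first null entry; anything after it
    is treated as residual data and ignored.  count_ok reports whether the
    header count agreed with the number of entries actually seen.
    """
    seen: List[Entry] = []
    for buf in buffers:
        hit_null = False
        for rec in buf:
            if rec is None:
                hit_null = True
                break
            seen.append(rec)
        if hit_null:
            break

    # Compare against the header before filtering: the header counts all
    # entries returned, not just the ones in one category.
    count_ok = len(seen) == header_count

    if my_category is not None:
        seen = [e for e in seen if e.category == my_category]
    return seen, count_ok
```

A caller that gets count_ok back as False would log the mismatch and still
work from the scanned entries, since the scan does not depend on the count.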

Just food for thought.  There is no user code in my environment.

In the case I described below, there was only one host attached to this
library and it was non-mixed.  We just kept getting 2 tapes or 4 tapes
short of what was actually in the library.  If you read the 3494
programmer's guide you will find that there is a lot more to this than the
superficial mtlib command.  There is a whole set of C routines to allow
you to code to the lmcpd interface.  In fact, I am in the process of
designing an insert and categorize function for TSM.  Then, I will have
some user code.  It will work like the mainframe: automatically categorize
the tapes and check them in.  The way I am going to do this is to have the
coded routine execute a Perl script, which can be recoded easily, that
will do the necessary mtlib and TSM commands.  This will significantly
limit the unsolicited message processing in the coded routine.

I know the site that is having this problem with the 4 TSM Servers.  This
is a tight environment.  Another site that has an MVS and TSM environment
had an MVS tape eaten.  Both are in the process of reproducing this
problem at will.

This may still be a user problem, but I am betting on either the TSM code
or the Library Manager.

What everyone is asking for is security by attaching node as to which
tapes can be acted upon, by volume range.  This was discussed at length at
Share in relation to TSM, because no other libraries have the kind of
smarts the 3494 or ACSLS have to even do this in hardware.  The position
of the customers, including us, is that the 3494 is an enterprise-class,
high-integrity, high-dollar product.  Yes, the functionality is not there
in the LM, but the amount of code to do a table check of valid ranges is
small.  Each host has to be identified to the 3494 library, so why not add
the valid ranges and categories too?  The big sell for IBM to do this is
that if I could secure a library at this level, then I could share it
between many environments, as the Shark disk LUN masking allows me to.  In
other words, consolidate many libraries into one.  Yes, a host could still
mount its own tapes in the wrong drive (that is its problem, as you say)
or set the hardware scratch mount category on the wrong drive to its own.
These are difficult to solve in the library design.  But to allow any
system to check in an FF00 tape or change the category of a tape owned by
another host is a problem.  By default, what IBM should do is make it work
the way it does now: all systems can do anything to "*".  Then, you can
lock it down if you want.
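
To give a feel for how small that table check could be, here is a rough
sketch in Python.  It is not how the Library Manager works today; the
HostPolicy table, the host names, and the is_allowed function are all
hypothetical.  It assumes volsers are fixed-width uppercase strings, so a
plain string comparison orders them within a range, and it treats a host
with no table entry as unrestricted, which matches the "*" default.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class HostPolicy:
    # Inclusive (low, high) volser pairs this host may act on.
    volser_ranges: Tuple[Tuple[str, str], ...]
    # Categories this host may act on, as hex strings such as "FF00".
    categories: FrozenSet[str]


def is_allowed(
    policies: Dict[str, HostPolicy],
    host: str,
    volser: str,
    category: str,
) -> bool:
    """Decide whether a host may act on a tape in its current category."""
    policy = policies.get(host)
    if policy is None:
        # No entry for this host: behave as the library does now ("*").
        return True
    # Assumes fixed-width volsers, so string order matches range order.
    in_range = any(low <= volser <= high for low, high in policy.volser_ranges)
    return in_range and category in policy.categories


# Example table: one locked-down TSM host; every other host is unrestricted.
policies = {
    "tsm1": HostPolicy(
        volser_ranges=(("A00000", "A00999"),),
        categories=frozenset({"FF00", "012C"}),
    ),
}
```

With this table, the host tsm1 could check in an FF00 tape only if its
volser falls in A00000 through A00999, and could not change the category
of a tape sitting in a category it does not own.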

I am going to be discussing this issue with 3494 hardware engineering in a
couple of weeks, hopefully.  It has always been an issue for us, but now
that people are having these kinds of problems, I have sturdier ground to
stand on to get IBM to provide the functionality.  And why are they having
these problems?  Because everyone wants the reliability of the 3590
platform, has this library already installed, and says why not.

