ADSM-L

3584 Tape Library issue (maybe)

2005-01-18 17:16:29
Subject: 3584 Tape Library issue (maybe)
From: Nathan Reiss <Reiss_Nathan AT CAT DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 18 Jan 2005 16:16:45 -0600
I've got a weird problem.  This is a shot in the dark, but I'm going to ask
all the kind folks on the ADSM mailing list a weird question.  Maybe
someone out there will have some ideas.

Please bear in mind that we do have a PMR open with TSM support, and
service calls open with the IBM hardware CEs and with Brocade.

It appears that we've tracked the problem to the 3584s, but we're not
positive about that yet.

About every 2-4 days we have to power cycle all the drives, or at least a
good subset of them, in our 3584 libraries.  Both of the 3584s are four
frames each, and each has 27 LTO2 drives.  The tape SAN consists of two
Brocade M14s.  Each drive is plugged directly into an M14 (not into
another edge switch).  We thought that there might be a bad connection
between the two M14s, and last week we disabled the system boards in both
that were giving us some issues, but today the problem happened again.

The TSM servers (library managers) that run the two libraries are both at
v5.2.4, on AIX 5.2 ML4.  There are about 15 TSM library clients that talk
to the respective library managers, as well as somewhere around 50 storage
agents.  All are at current TSM levels.  We are now at the latest firmware
on all the drives and the libraries as well.  We are at Atape 8.4.9.0.

Today, while the problem was happening, we also could not eject tapes
(using the 3584's web interface, so TSM was not involved) from the tape
drives in frame four of one of the libraries to empty slots.  It told me
that there weren't any empty slots to put the tape into.  But I could
move the tape from that drive to drive 1 in frame 1, and then it would
eject it to an empty slot like normal.  It did this with five tapes.

The two things we have done to temporarily fix the issue have been:

1. Restarting the TSM library managers.
2. Power cycling the drives.  Sometimes just the two drives that are the
   control paths into the library, and sometimes it appears to need every
   drive power cycled.

Since I was having trouble with ejecting tapes earlier and TSM was not
involved in that scenario, I am inclined to think that TSM really isn't
part of the root problem, but that it somehow gets confused and needs to
be restarted at times in order to, shall we say, clear its head.  Because
it seems to affect the library even when not talking to TSM or AIX, I don't
think upgrading the Atape driver to whatever 9.X.X.X version is out there
would fix the problem. But I'm open to arguments that say I'm full of it
there.

Does anybody out there have any ideas?

Thank you,

David N. Reiss
Unix/TSM System Engineer
Caterpillar, Inc.
(309)/494-3749
