ADSM-L

Re: TapeAlerts on AIX and 3584 library

2004-08-25 09:24:48
Subject: Re: TapeAlerts on AIX and 3584 library
From: David Longo <David.Longo AT HEALTH-FIRST DOT ORG>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 25 Aug 2004 09:24:47 -0400
I would suspect this is a firmware problem on the 3584.
With my 3584 and LTO1 Fibre drives, early on I had a problem
where errors were being logged in the AIX errpt and occasionally a tape
was left in the drive. I discovered it was only happening on the drive
that hosted the control path. IBM finally confirmed that this was indeed the problem.

About 9 months or more ago, new firmware fixed the problem for me.

(This tape stuck problem was different from the occasional physical
problem related to the case of the LTO tapes).


David B. Longo
System Administrator
Health First, Inc.
3300 Fiske Blvd.
Rockledge, FL 32955-4305
PH      321.434.5536
Pager  321.634.8230
Fax:    321.434.5509
david.longo AT health-first DOT org


>>> jurjen-tsm AT STUPENDOUS DOT ORG 08/25/04 03:11AM >>>
Hi there,

Some time ago, someone mentioned that many TapeAlerts were appearing in
the log. Some/many/all of those alerts were "weird", in the sense that
they reported, for example, a failed firmware update while no firmware
update was in progress.

For some time now (a few months), we've been experiencing similar
symptoms. We get a daily dose of TapeAlerts of varying importance. The
most serious ones say things like "The tape <XXX> is not data-grade",
"Your data is at risk", "The tape has snapped", "There is a problem with
the library mechanism", but we also get less serious ones that say
things like "The firmware update failed", "The drive <xxx> is outside
allowed temperature range", etc. I've read the documentation about
TapeAlerts, and as far as I can tell we've had them all.

This happens on a 3584 library with firmware version 4090 and four LTO2
drives with firmware level 4770. The problem also occurred on every
earlier level we've had (the library has been in our possession since
the beginning of the year). The drives are direct-attached to four 6778
Fibre Channel HBAs in a 7026-M80 running AIX 5.2 ML2 with the latest
Atape (but here too the problem occurred with earlier levels).

The strange thing is that this problem only occurs on the drive whose
control port is in use. Normally we use one control port: /dev/smc0,
attached to /dev/rmt0, which is the standard (and required) 3584
control port. During troubleshooting, another control port was created
on another drive, which appeared as /dev/smc1 in AIX. TSM was
configured to use /dev/smc1 instead of /dev/smc0, and from that moment
the TapeAlerts were logged against /dev/rmt1 and /dev/smc1. So only the
drive whose control port is in use complains about faulty tapes and/or
operating conditions. The same tape in any other drive doesn't cause
any TapeAlerts.
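The control-path correlation described above can be checked mechanically once the alerts have been pulled out of the log. This is a minimal Python sketch, assuming the TapeAlerts have already been parsed into hypothetical (device, alert text) pairs; the record format and the function names are illustrative only and are not part of TSM, Atape, or AIX. The device names come from this post; the alert texts are placeholders.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record format: (device special file, alert text), extracted
# from whatever log the TapeAlerts end up in. Adapt the parsing to your log.
Alert = Tuple[str, str]


def alerts_by_device(alerts: Iterable[Alert]) -> Counter:
    """Count TapeAlerts per device special file."""
    return Counter(device for device, _ in alerts)


def only_on_control_path(alerts: Iterable[Alert],
                         control_path: Tuple[str, str]) -> bool:
    """Return True if there is at least one alert and every alert names
    either the active changer (smc) device or the drive (rmt) hosting it."""
    counts = alerts_by_device(alerts)
    return bool(counts) and set(counts) <= set(control_path)


# Device names as in the post; alert texts are placeholders.
before = [("/dev/rmt0", "The tape is not data-grade"),
          ("/dev/smc0", "Problem with the library mechanism")]
after = [("/dev/rmt1", "The firmware update failed"),
         ("/dev/smc1", "Drive outside allowed temperature range")]

print(only_on_control_path(before, ("/dev/smc0", "/dev/rmt0")))  # True
print(only_on_control_path(after, ("/dev/smc0", "/dev/rmt0")))   # False
print(only_on_control_path(after, ("/dev/smc1", "/dev/rmt1")))   # True
```

If the check holds for the smc/rmt pair currently in use and fails for the other pair, the alerts are following the control path rather than any particular tape or drive.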

These TapeAlerts *seem* to be only cosmetic, because no other errors
are logged at all: no I/O errors or anything in TSM, and nothing
whatsoever in the AIX error report. Even the drive log on the library
doesn't show anything unusual, as far as I can tell. (Note that I say
*seem*, because in the past tapes *were* corrupted, probably by this
issue. More precisely: the tape's memory had to be recreated by reading
the entire tape using tapeutil. This corruption seems to have gone
away, either because of a drive replacement or a higher firmware
level.)

I already have a call open with IBM. That call is progressing at a
glacial pace, however, and they make me jump through several hoops. So
I'm interested in other people's experiences.

Thanks for reading this far. :-)
--
Jurjen Oskam
"I often reflect that if "privileges" had been called "responsibilities"
or "duties", I would have saved thousands of hours explaining to people
why they were only gonna get them over my dead body." - Lee K. Gleason,
VMS sysadmin

