ADSM-L

TapeAlerts on AIX and 3584 library

2004-08-25 03:10:49
From: Jurjen Oskam <jurjen-tsm AT STUPENDOUS DOT ORG>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 25 Aug 2004 09:11:32 +0200

Hi there,

Some time ago, someone mentioned that many TapeAlerts were appearing in
the log. Some/many/all of those alerts were "weird", in the sense that the
error was e.g. that a firmware update failed, while no firmware update was
in progress.

For some time now (a few months), we've been experiencing similar symptoms. We
get a daily dose of TapeAlerts of varying importance. The most serious
ones say things like "The tape <XXX> is not data-grade", "Your data is at
risk", "The tape has snapped", "There is a problem with the library
mechanism", but we also get less serious ones that say things like: "The
firmware update failed", "The drive <xxx> is outside allowed temperature
range", etc. I've read the documentation about TapeAlerts, and as far as I
can tell we've had them all.

This happens on a 3584 library with firmware version 4090 and four LTO2
drives with firmware level 4770. The problem also occurred on every earlier
level we've had (the library has been in our possession since the beginning
of the year). The drives are direct-attached to four 6778 Fibre Channel HBAs
in a 7026-M80 running AIX 5.2 ml2 with the latest Atape (but here too the
problem occurred with earlier levels).

The strange thing is that this problem only occurs on the drive whose
control port is in use. Normally, we use one control port, and that is
/dev/smc0 attached to /dev/rmt0: the standard (and required) 3584
control port. During troubleshooting, another control port was created on
another drive, which appeared as /dev/smc1 in AIX. TSM was configured to
use /dev/smc1 instead of /dev/smc0, and from that moment the TapeAlerts
were logged against /dev/rmt1 and /dev/smc1. So only the drive whose
control port is in use complains about faulty tapes, operating
conditions, etc. The same tape in any other drive doesn't cause any
TapeAlerts.
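
For anyone who wants to check for the same pattern on their own system,
the correlation above can be tallied with a few lines of Python. This is
only a minimal sketch: it assumes the alerts have already been collected
by hand as (device, message) pairs, and the function names and the sample
data are hypothetical, not taken from TSM or Atape output.

```python
from collections import Counter


def alerts_per_drive(events):
    """Count TapeAlert events per drive device name."""
    return Counter(device for device, _message in events)


def only_control_path_drive(events, control_drive):
    """Return True if every alert was logged against control_drive.

    An empty event list returns False, since it shows no correlation.
    """
    counts = alerts_per_drive(events)
    return bool(counts) and set(counts) == {control_drive}


# Hypothetical, hand-entered sample (not real log output): alerts seen
# while TSM was using /dev/smc1, whose control port lives on /dev/rmt1.
sample = [
    ("/dev/rmt1", "The firmware update failed"),
    ("/dev/rmt1", "The tape has snapped"),
    ("/dev/rmt1", "Your data is at risk"),
]

if __name__ == "__main__":
    print(alerts_per_drive(sample))
    print(only_control_path_drive(sample, "/dev/rmt1"))
```

If the second value stays true after moving the control port to another
drive (and passing that drive's name instead), the alerts follow the
control port rather than a particular drive or tape.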

These TapeAlerts *seem* to be only cosmetic errors, because no other
errors are logged at all. No I/O errors or anything in TSM, and nothing
whatsoever in the AIX error report. Even the drivelog on the library
doesn't say anything weird, as far as I can tell. (Note that I say *seem*,
because in the past tapes *were* corrupted by (probably) this issue. More
correctly: the tape's memory had to be recreated by reading it entirely
using tapeutil. This corruption seems to have gone away, either because of
a drive replacement or a higher firmware level.)

I already have a call open with IBM. That call is progressing at a glacial
pace, however, and they are making me jump through several hoops. So I'm
interested in any experiences other people have had.

Thanks for reading this far. :-)
--
Jurjen Oskam
"I often reflect that if "privileges" had been called "responsibilities" or
"duties", I would have saved thousands of hours explaining to people why
they were only gonna get them over my dead body." - Lee K. Gleason, VMS sysadmin
