ADSM-L

Re: ANR8311E I/O error

2000-02-22 16:49:42
Subject: Re: ANR8311E I/O error
From: Richard Sims <rbs AT BU DOT EDU>
Date: Tue, 22 Feb 2000 16:49:42 -0500
>I and my IBM CE have been pulling our hair out on this one for almost a
>week...

The initial question might be why he has any hair left to start with, but
we'll address that later.

>Whenever my 3494lib w/ 2x 3590 B1A drives mounts a tape (to do a
>migration, for example), it shortly produces a WRITE error (errorno=78),
>and marks the tape 'read-only'.  I'm running ADSM 3.1.2.40 on an H70
>running AIX 4.3.2.  The atape, atldd, and 3590 microcode are at current
>levels.

When reporting such problems, please supply the original error messages so
that we can see exactly what was reported.  Errno 78 indicates a datacomm
timeout, but is it on the path to the library or to the drive?  We need the
error message.
And what, if anything, is in the AIX error log about this?
Are the 3590s logging errors to the 3494 Library Manager to help the CE
identify the problem?
Think about what may have changed in the configuration recently (AIX patches,
drive microcode, etc.) that could be contributing to the problem.
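
As a rough sketch of gathering that error-log evidence: the script below
prints (or, with DRY_RUN=0, runs) the 'errpt' commands I would start with.
The drive names rmt0 and rmt1 are assumptions; substitute whatever your
system calls the 3590s.

```shell
#!/bin/sh
# Sketch: pull tape-related entries from the AIX error log.
# DRIVES is an assumption; substitute the names your system reports.
DRIVES="${DRIVES:-rmt0 rmt1}"

# With DRY_RUN=1 (the default here) commands are only printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

collect_errlog() {
    # One-line summary of all hardware-class entries.
    run errpt -d H
    for d in $DRIVES; do
        # Full detail (including sense data) for each drive.
        run errpt -a -N "$d"
    done
}

collect_errlog
```

Run it once as-is to see what it would do, then with DRY_RUN=0 on the AIX
host and send the output along with the ANR8311E message text.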

>This problem started as intermittent on one drive, and now has progressed
>to the point where I can't write anything to tape anymore on either drive.

This sounds like it's not "a" drive problem, then: it's something in common
among the drives.  Has the CE tried switching to the other SCSI port on the
back of each drive to see if it makes any difference?  All the drives (still)
have unique SCSI IDs, yes?  (Verify at the drive panel.)
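
The drive panel is the authority on SCSI IDs, but a quick cross-check from the
AIX side is possible.  This is only a sketch: it assumes the third column of
'lsdev -Cc tape' output is a location code of the form 10-68-00-3,0, where the
last field is SCSI ID and LUN.  The sample location code is hypothetical.

```shell
#!/bin/sh
# Sketch: flag tape devices that share a bus and SCSI ID.
# Reads 'lsdev -Cc tape' style lines on stdin:
#   name  state  location-code  description...
find_duplicate_ids() {
    awk '{
        loc = $3                         # assumed form: 10-68-00-3,0
        n = split(loc, part, "-")
        bus = part[1] "-" part[2] "-" part[3]
        split(part[n], idlun, ",")       # idlun[1] is the SCSI ID
        key = bus " id " idlun[1]
        if (seen[key]++) print "duplicate: " key " (" $1 ")"
    }'
}

# Usage on the AIX host:
#   lsdev -Cc tape | find_duplicate_ids
```

No output means no two tape devices claim the same ID on the same bus, at
least as far as the AIX configuration database is concerned.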

>We have tried replacing the 18m SCSI cables, re-routing the SCSI cables,
>replacing both SCSI adapters, both terminators, and both 3590 card packs,
>all with no change in results.

It sounds like you're having troubles in the SCSI subsystem, but...
Are you able to have the robot perform tape mounts, either through ADSM or the
mtlib command, then have the drive eject the tape and the robot put it
away?  If mounts/dismounts are not working, I'd suspect the ARTIC card in the
industrial PC.  Otherwise, are there any problems at all with getting proper
responses to various queries with the 'mtlib' command, which goes through the
lmcpd?  (The lmcpd code *might* have been corrupted.)
Have you used 'tapeutil' to exercise one of the problem drives to see what may
incite the error?  Basically, you want to prod the experimental subject in
various ways to see what variable evokes symptoms.
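
One way to prod it step by step, as a sketch only: the function below walks
through query, mount, drive exercise, write, unload, and demount, and returns
a different code for each step so you can see which one evokes the error.
The names /dev/lmcp0, /dev/rmt0, and SCR001 are assumptions, and the mtlib and
tapeutil options are as I recall them from the atldd and Atape documentation,
so check them against your installed levels.  The write step uses plain 'dd'
and will overwrite the tape: use a scratch volume.

```shell
#!/bin/sh
# Sketch: exercise library and drive outside of ADSM, one step at a time.
# All three defaults below are hypothetical; set them for your site.
LMCP="${LMCP:-/dev/lmcp0}"
DRIVE="${DRIVE:-/dev/rmt0}"
VOLSER="${VOLSER:-SCR001}"     # must be a scratch tape; it gets overwritten

# With DRY_RUN=1 (the default here) commands are only printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

exercise_drive() {
    run mtlib -l "$LMCP" -qL                          || return 1  # library answers via lmcpd?
    run mtlib -l "$LMCP" -m -f "$DRIVE" -V "$VOLSER"  || return 2  # robot mounts the tape
    run tapeutil -f "$DRIVE" inquiry                  || return 3  # drive answers on SCSI
    run tapeutil -f "$DRIVE" rewind                   || return 4
    run dd if=/dev/zero of="$DRIVE" bs=262144 count=100 || return 5  # the failing operation
    run tapeutil -f "$DRIVE" unload                   || return 6  # drive ejects the tape
    run mtlib -l "$LMCP" -d -f "$DRIVE"               || return 7  # robot puts it away
}

exercise_drive
```

With DRY_RUN=0, a return code of 1, 2, or 7 points toward the library path
(lmcpd, Library Manager), while 3 through 6 point toward the drive and its
SCSI path.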

>IBM hardware support says it's a software problem, ADSM support says it's
>a hardware problem... Anyone have any ideas?  I'm sweating big time
>because I'll run out of disk space upon which to do my backups soon
>if I can't move the data off to tape! :-(

Has the system been rebooted (to reconfigure the drivers and reload the
lmcpd)?  Does a 'cfgmgr' command report any errors (indicating a driver
problem)?  Does 'lscfg -v' show everything as it should be, and does SMIT show
the drives properly ADSM-configured?  Does 'lsattr -EHl <Dev_Name>' show
reasonable attributes for the drives?
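
Those configuration checks can be collected in one pass.  A sketch, again
assuming the drives are rmt0 and rmt1 (hypothetical names); note that a real
run of 'cfgmgr' needs root and will configure any newly found devices.

```shell
#!/bin/sh
# Sketch: run the AIX configuration checks for each tape drive.
# DRIVES is an assumption; substitute your own device names.
DRIVES="${DRIVES:-rmt0 rmt1}"

# With DRY_RUN=1 (the default here) commands are only printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_config() {
    run cfgmgr                 # any errors here suggest a driver problem
    run lsdev -Cc tape         # drives should be listed as Available
    for d in $DRIVES; do
        run lscfg -vl "$d"     # vital product data, including microcode level
        run lsattr -EHl "$d"   # effective attributes, with column headers
    done
}

check_config
```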

   Richard Sims, BU