
[Veritas-bu] HP LTO3 FC drives "locking up" [C1]

2007-03-06 16:09:58
From: misha.pavlov at sgcib.com (misha.pavlov AT sgcib DOT com)
Date: Tue, 6 Mar 2007 16:09:58 -0500
Folks,

Has anyone noticed a problem with HP LTO3 drives in an SSO (Shared Storage Option)
configuration, running NBU 5.1 on Solaris 8?

Once or twice a week, the bptm debug log reports the following in the middle of a write:

22:03:41.746 [19156] <2> send_brm_msg: MEDIA NOT READY
22:03:41.746 [19156] <2> write_data: attempting write error recovery, err = 5
22:03:41.746 [19156] <2> tape_error_rec: error recovery to block 1485323 requested
22:03:41.746 [19156] <2> tape_error_rec: attempting error recovery, delay 3 minutes before next attempt, tries left = 5
22:06:41.739 [19156] <2> io_ioctl: command (0)MTWEOF 0 from (overwrite.c.488) on drive index 43
22:06:41.739 [19156] <2> io_ioctl: MTWEOF failed during error recovery, I/O error
22:08:40.745 [19156] <2> tape_error_rec: cannot read position for error recovery, scsi_determine_bt ret -1 CDB 0x12 SK 0x0 ASC 0x0 ASCQ 0x0
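
Because the lockup shows up only once or twice a week, it may help to pull just the failure sequence out of each day's bptm debug log and compare its timestamps with /var/adm/messages. Below is a minimal Python sketch of that, assuming each debug line has the "HH:MM:SS.mmm [pid] <level> function: message" layout seen above, with one entry per line. recovery_events and the TRIGGERS keyword list are hypothetical helpers chosen from this excerpt, not part of NetBackup.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed layout of a bptm debug line, taken from the excerpt above:
#   HH:MM:SS.mmm [pid] <level> function: message
BPTM_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<level>\d+)> "
    r"(?P<func>\w+): (?P<msg>.*)$"
)

# Hypothetical keywords marking the failure/recovery sequence (assumption,
# picked from the messages quoted in this post).
TRIGGERS = ("MEDIA NOT READY", "error recovery", "I/O error", "cannot read position")


class BptmEvent(NamedTuple):
    time: str
    pid: int
    func: str
    msg: str


def recovery_events(lines: Iterable[str]) -> List[BptmEvent]:
    """Return parsed bptm lines whose message contains one of TRIGGERS."""
    events: List[BptmEvent] = []
    for line in lines:
        m = BPTM_LINE.match(line.strip())
        if not m:
            continue  # skip lines that don't fit the assumed layout
        msg = m.group("msg")
        if any(t in msg for t in TRIGGERS):
            events.append(
                BptmEvent(m.group("time"), int(m.group("pid")), m.group("func"), msg)
            )
    return events
```

Run against the excerpt above, it would keep six of the seven lines (the MTWEOF command line carries none of the keywords) and silently drop anything that does not match the assumed layout.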

and immediately afterwards /var/adm/messages shows:

Mar  5 22:03:41 vepanyup03 scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/SUNW,emlxs@2,1/fp@0,0/st@w500104f0005ddda2,0 (st297):
Mar  5 22:03:41 vepanyup03  SCSI transport failed: reason 'timeout': giving up
Mar  5 22:07:40 vepanyup03 bptm[19156]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 6
Mar  5 22:07:47 vepanyup03 fctl: [ID 517869 kern.warning] WARNING: 3654=>fp(1)::GPN_ID for D_ID=150500 failed
Mar  5 22:07:47 vepanyup03 fctl: [ID 517869 kern.warning] WARNING: 3655=>fp(1)::N_x Port with D_ID=150500, PWWN=500104f0005ddda2 disappeared from fabric
Mar  5 22:08:40 vepanyup03 bptm[19156]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 6
Mar  5 22:09:01 vepanyup03 scsi: [ID 243001 kern.info] /pci@1d,700000/SUNW,emlxs@2,1/fp@0,0 (fcp1):
Mar  5 22:09:01 vepanyup03  offlining lun=0 (trace=0), target=150500 (trace=2800004)
Mar  5 22:11:40 vepanyup03 bptm[19156]: [ID 498531 daemon.error] user scsi ioctl() failed, may be timeout, errno = 5, I/O error

The drive "locks up" and becomes unresponsive.
The front-panel lights show no sign of a problem; the green light stays solid.
Pressing the eject button does nothing.
The Brocade 4100 switch port no longer "sees" the drive and shows
"In_Sync" instead of "Online".

The only way to bring the drive back is to power-cycle it.
A minute or two after the power cycle I can eject the tape, and
/var/adm/messages shows:
fctl: [ID 517869 kern.warning] WARNING: 3799=>fp(1)::N_x Port with D_ID=150500, PWWN=500104f0005ddda2 reappeared in fabric
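
Since the drive's port drops off the fabric and only comes back after the power cycle, the gap between the fctl "disappeared from fabric" and "reappeared in fabric" warnings gives a rough measure of each outage. Below is a minimal Python sketch of that, assuming one syslog entry per line and the message wording quoted above. Syslog timestamps carry no year, so the caller passes one in. fabric_outages is a hypothetical helper, not a Solaris or NetBackup tool.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumed shape of the fctl fabric-membership warnings quoted above,
# with each syslog message on a single line.
FABRIC_EVENT = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*"
    r"PWWN=(?P<pwwn>[0-9a-fA-F]{16}) (?P<kind>disappeared from|reappeared in) fabric"
)


def fabric_outages(
    lines: Iterable[str], pwwn: str, year: int
) -> List[Tuple[datetime, datetime, timedelta]]:
    """Pair each 'disappeared from fabric' with the next 'reappeared in fabric'
    for one port WWN and return (start, end, duration) tuples.

    The year is supplied by the caller because syslog omits it (assumption).
    """
    outages: List[Tuple[datetime, datetime, timedelta]] = []
    gone_since: Optional[datetime] = None
    for line in lines:
        m = FABRIC_EVENT.match(line)
        if not m or m.group("pwwn").lower() != pwwn.lower():
            continue  # not a fabric event for this drive
        # The space before %d in the format also matches syslog's padded "Mar  5".
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        if m.group("kind").startswith("disappeared"):
            gone_since = ts
        elif gone_since is not None:
            outages.append((gone_since, ts, ts - gone_since))
            gone_since = None
    return outages
```

Fed a few weeks of /var/adm/messages and the drive's PWWN (500104f0005ddda2 here), it would list when each dropout started and how long it lasted, which could then be lined up against the bptm failure times. Lines that do not carry the drive's PWWN, such as the GPN_ID warning, are skipped.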

The drives and library are at the latest firmware revision.
Sun and STK are clueless, but have been looking into it for the past week.

--
Misha Pavlov
Société Générale
desk: (212) 278-6096
cell: (646) 346-9341

This message uses only 100% recycled electrons.
