Veritas-bu

[Veritas-bu] DLT drives going down

2006-01-04 10:17:08
Subject: [Veritas-bu] DLT drives going down
From: shekhar.dhotre AT lendlease DOT com (Dhotre, Shekhar)
Date: Wed, 4 Jan 2006 10:17:08 -0500

The SCSI bridge is the culprit here. I have seen this many times before.
We replaced the SCSI bridge with pure fiber LTO drives, and everything
has been working fine since then.

________________________________

From: veritas-bu-admin@mailman.eng.auburn.edu
[mailto:veritas-bu-admin@mailman.eng.auburn.edu] On Behalf Of Barber,
Layne (Contractor)
Sent: Wednesday, January 04, 2006 9:10 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] DLT drives going down


We have an issue of drives randomly going down every night.
Environment: NBU 5.0 MP5, HP-UX 11.11, STK L180 w/ STK 3400 SCSI bridge.

For some reason, one or more drives go down at random every night when
backups run. Different tapes and different drives. Backups will be
running fine and then drives begin to go down. These are SDLT320 drives.
Once they go down, you can't use robtest to move the tapes (medium not
present error) or use the robtest unload command (device not present).

If we power cycle the SCSI bridge, we can talk to the drives and do
whatever we want. STK is claiming that there is something coming from
the host that is "polling" the library from the physical layer (assume
HBA). We have had the SA for the master/media server disable any polling
and load the latest patches from HP, to no avail. We have changed from
auto index to a manual map index as well.

This was working from the end of June up until the second week in
October.

Thoughts/suggestions?

Log snippets from last night:


syslog entries
Jan  4 05:37:42 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
Jan  4 05:50:10 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
Jan  4 11:27:52 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0800c0 I/O error during close
Jan  4 11:34:36 ujachr01 tldcd[18968]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Jan  4 11:34:36 ujachr01 tldcd[18968]: TLD(1) Move_medium error
Jan  4 11:34:36 ujachr01 tldd[4233]: TLD(1) drive 5 (device 4) is being DOWNED, status: Robotic dismount failure
Jan  4 11:34:36 ujachr01 tldd[4233]: Check integrity of the drive, drive path, and media

drive 5 (addr 504) access = 0 Contains Cartridge = yes
Source address = 1119 (slot 120)
Barcode = JA1156


Jan  4 11:55:12 ujachr01 tldcd[19684]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Jan  4 11:55:12 ujachr01 tldcd[19684]: TLD(1) Move_medium error
Jan  4 11:55:12 ujachr01 tldd[4233]: TLD(1) drive 1 (device 0) is being DOWNED, status: Robotic dismount failure
Jan  4 11:55:12 ujachr01 tldd[4233]: Check integrity of the drive, drive path, and media

drive 1 (addr 500) access = 0 Contains Cartridge = yes
Source address = 1106 (slot 107)
Barcode = JA1064
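
When drives go down at random like this, it can help to tally the syslog entries per drive and per device to see whether the failures cluster on one path or spread across everything behind the bridge. The following is only a rough sketch and not part of the original posts: the regular expressions are modeled on the log lines quoted above, the function name `summarize` and the `SAMPLE` data are made up for illustration, and it assumes each syslog entry sits on a single unwrapped line.

```python
import re
from collections import Counter

# Patterns modeled on the syslog lines quoted in this thread (an assumption:
# other NetBackup or HP-UX versions may word these messages differently).
DOWNED_RE = re.compile(r"TLD\((\d+)\) drive (\d+) \(device (\d+)\) is being DOWNED")
IO_ERR_RE = re.compile(r"SCSI TAPE: dev = (0x[0-9a-fA-F]+) I/O error")


def summarize(lines):
    """Count DOWNED events per (robot, drive) and I/O errors per dev number."""
    downed = Counter()
    io_errors = Counter()
    for line in lines:
        match = DOWNED_RE.search(line)
        if match:
            robot, drive = int(match.group(1)), int(match.group(2))
            downed[(robot, drive)] += 1
            continue
        match = IO_ERR_RE.search(line)
        if match:
            io_errors[match.group(1)] += 1
    return downed, io_errors


# Sample input copied from the log snippets in the original message,
# with wrapped lines rejoined.
SAMPLE = [
    "Jan  4 05:37:42 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close",
    "Jan  4 05:50:10 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close",
    "Jan  4 11:27:52 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0800c0 I/O error during close",
    "Jan  4 11:34:36 ujachr01 tldd[4233]: TLD(1) drive 5 (device 4) is being DOWNED, status: Robotic dismount failure",
    "Jan  4 11:55:12 ujachr01 tldd[4233]: TLD(1) drive 1 (device 0) is being DOWNED, status: Robotic dismount failure",
]

if __name__ == "__main__":
    downed, io_errors = summarize(SAMPLE)
    for (robot, drive), count in sorted(downed.items()):
        print(f"TLD({robot}) drive {drive}: downed {count} time(s)")
    for dev, count in sorted(io_errors.items()):
        print(f"dev {dev}: {count} I/O error(s)")
```

Run against several nights of logs, a spread across many drives and device numbers would be consistent with a shared component such as the bridge, while a cluster on one drive would point more toward that drive or its media.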

