Have the cables connecting the devices been replaced?
I had a similar problem recently with LTO drives and tried many things.
My environment was Solaris, and there were some patches to apply
(although that doesn't help you, sorry), but the cables were mentioned.
I also found that LTO media has a chip inside it which can be dislodged.
If you shake the tapes and they rattle loudly, then most likely they are
damaged. This could affect more than one tape, perhaps, if they have
come from the same batch.
Just some ideas
Dave
----- Original Message -----
From: Sokolowski Ric-ERS004
To: 'veritas-bu AT mailman.eng.auburn DOT edu'
Sent: Wednesday, February 11, 2004 4:28 PM
Subject: [Veritas-bu] HELP - media and I/O errors
Our system:
NB 4.5 MP5
master - HP-UX 11.00
media - 4 HP-UX 11.00, 1 HP-UX 11.11
STK L700 (HP20/700) w/10 HP LTO 1 drives w/SSO
5 HP 2/1 FC/SCSI bridges
1 Brocade 2800
We're seeing tons of media-related errors (70% status 86 - media
position, 30% status 84 - media write) spread across all drives. Some
nights we see no errors; other nights we'll see 50-100 media-related
failures. We see the failures both when reusing tapes and with brand new
tapes. All drives have been cleaned recently. We have had cases open
w/Veritas and HP for just over 4 weeks now. Veritas has examined over a
month's worth of log files and has determined that the problem is
hardware related. HP replaced 3 drives, and we saw media failures on
these 3 new drives the same day they were replaced. HP also replaced the
robot controller, the camera, and one of the Fibre bridges. We're not
seeing any communication errors on the FC switch. Everything has the
latest available firmware. Whenever we get the status 84/86, we see a
lot of things like "cannot read from media socket 10", "ioctl (MTREW)
failed on media id 402280, drive index 4, I/O error (bptm.c.7197)" and
"write error on media id 402280, drive index 4, writing header block,
I/O error". Normally, between 2 and 5 drives are downed every night -
always with a tape stuck in the drive. Occasionally the system will
freeze dozens of tapes because they're seen as "unmountable", which
leads to a boatload of status 96 (no media) failures. Our backup success
rate has dropped from over 98% to below 80% - management is freaking
out. We're grasping at straws here folks, any help would be GREATLY
appreciated!
--
Regards,
Ric Sokolowski (Ric.Sokolowski AT motorola DOT com)
Staff Systems Engineer
Phone: (954) 723-6332
Pager: 9545530742 AT messaging.nextel DOT com
Motorola, Inc. / CGISS / Enterprise Computing
8000 West Sunrise Blvd, MS 22-2F, Plantation, FL 33322