Veritas-bu

[Veritas-bu] HELP - media and I/O errors

2004-02-11 12:23:15
Subject: [Veritas-bu] HELP - media and I/O errors
From: "Dave Markham" <dave.markham AT fjserv DOT net>
Date: Wed, 11 Feb 2004 17:23:15 -0000

Have the cables connecting the devices been replaced?

I had a similar problem recently with LTO drives and tried many things.
My environment was Solaris, and there were some patches to apply
(although that doesn't help you, sorry). The cables were mentioned as a
possible cause, and I also found that LTO media has a chip inside it
which can become dislodged. If you shake the tapes and they rattle
loudly, they are most likely damaged. This could affect more than one
tape if they came from the same batch.

Just some ideas
Dave
  ----- Original Message -----
  From: Sokolowski Ric-ERS004
  To: 'veritas-bu AT mailman.eng.auburn DOT edu'
  Sent: Wednesday, February 11, 2004 4:28 PM
  Subject: [Veritas-bu] HELP - media and I/O errors


  Our system:

  NB 4.5 MP5
  master - HP-UX 11.00
  media - 4 HP-UX 11.00, 1 HP-UX 11.11
  STK L700 (HP20/700) w/10 HP LTO 1 drives w/SSO
  5 HP 2/1 FC/SCSI bridges
  1 Brocade 2800

  We're seeing tons of media-related errors (70% status 86 - media
  position, 30% status 84 - media write) spread across all drives. Some
  nights we see no errors; other nights we'll see 50-100 media-related
  failures. We see the failures both when reusing tapes and with brand
  new tapes. All drives have been cleaned recently. We have had cases
  open w/Veritas and HP for just over 4 weeks now. Veritas has examined
  over a month's worth of log files and has determined that the problem
  is hardware related. HP replaced 3 drives; we saw media failures on
  these 3 new drives the same day they were replaced. HP also replaced
  the robot controller, the camera, and one of the Fibre bridges. We're
  not seeing any communication errors on the FC switch. Everything has
  the latest available firmware. Whenever we get the status 84/86, we
  see a lot of things like "cannot read from media socket 10", "ioctl
  (MTREW) failed on media id 402280, drive index 4, I/O error
  (bptm.c.7197)" and "write error on media id 402280, drive index 4,
  writing header block, I/O error". Normally, between 2 and 5 drives are
  downed every night - always with a tape stuck in the drive.
  Occasionally the system will freeze dozens of tapes because they're
  seen as "unmountable", which leads to a boatload of status 96 (no
  media) failures. Our backup success rate has dropped from over 98% to
  below 80% - management is freaking out. We're grasping at straws here
  folks, any help would be GREATLY appreciated!

  --
  Regards,
  Ric Sokolowski (Ric.Sokolowski AT motorola DOT com)
  Staff Systems Engineer
  Phone: (954) 723-6332
  Pager: 9545530742 AT messaging.nextel DOT com
  Motorola, Inc.  / CGISS / Enterprise Computing
  8000 West Sunrise Blvd, MS 22-2F, Plantation, FL 33322
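
One way to check whether the failures cluster on particular tapes (for
example a bad batch) or on particular drives is to tally the bptm
messages quoted above by media id and drive index. The following is only
a rough sketch in Python. It assumes the log lines contain the
"media id N, drive index M" text exactly as quoted in the original
message; the function name tally_errors and the sample list are
hypothetical and not part of NetBackup.

```python
import re
from collections import Counter

# Matches the "media id <id>, drive index <n>" fragment that appears in
# the bptm messages quoted in the original post (assumed log format).
PATTERN = re.compile(r"media id (\w+), drive index (\d+)")

def tally_errors(lines):
    """Count matching error lines per media id and per drive index."""
    by_media = Counter()
    by_drive = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            by_media[match.group(1)] += 1
            by_drive[int(match.group(2))] += 1
    return by_media, by_drive

# Sample lines taken from the messages quoted in the original post.
sample = [
    "ioctl (MTREW) failed on media id 402280, drive index 4, I/O error (bptm.c.7197)",
    "write error on media id 402280, drive index 4, writing header block, I/O error",
    "cannot read from media socket 10",
]

by_media, by_drive = tally_errors(sample)
print(by_media.most_common())
print(by_drive.most_common())
```

If a handful of media ids dominate the counts, the tapes themselves are
suspect; if the counts follow particular drive indexes, the drives or the
cabling and bridges in front of them are the more likely cause.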





