Veritas-bu

[Veritas-bu] DLT drives going down

2006-01-04 09:55:33
Subject: [Veritas-bu] DLT drives going down
From: layne.barber.ctr AT csd.disa DOT mil (Barber, Layne (Contractor))
Date: Wed, 4 Jan 2006 08:55:33 -0600

Everything has been power cycled several times. The drives appear to go
down at the end of a job.

________________________________

From: WEAVER, Simon [mailto:simon.weaver AT astrium.eads DOT net]
Sent: Wednesday, January 04, 2006 08:47
To: Barber, Layne (Contractor); veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] DLT drives going down


I assume the drives and robot have been physically powered down and
restarted?
Also, do the drives go into a DOWN state in the middle of a backup, at
the beginning, or would you say at random?

Simon Weaver
Technical Support
Windows Domain Administrator

EADS Astrium
Tel: 02392-708598

Email: Simon.Weaver AT Astrium.eads DOT net

        -----Original Message-----
        From: Barber, Layne (Contractor) [mailto:layne.barber.ctr AT csd.disa DOT mil]
        Sent: 04 January 2006 14:34
        To: WEAVER, Simon; veritas-bu AT mailman.eng.auburn DOT edu
        Subject: RE: [Veritas-bu] DLT drives going down

        They have upgraded the firmware to the latest version and checked
        the cables. I agree on the polling.

________________________________

        From: WEAVER, Simon [mailto:simon.weaver AT astrium.eads DOT net]
        Sent: Wednesday, January 04, 2006 08:21
        To: Barber, Layne (Contractor); veritas-bu AT mailman.eng.auburn DOT edu
        Subject: RE: [Veritas-bu] DLT drives going down

        Hmm, what about the firmware for the SDLT tape drives?
        Any loose cables or connections, or is it possible to change the
        SCSI connectors?

        I'm not sure why they feel that the software polling a drive could
        cause a problem. If anything, I would say polling a device is
        probably good, as NetBackup confirms it can see it!

        The thing that makes me wonder whether it's a cable or firmware
        issue is the "MEDIUM NOT PRESENT" message.
        Thanks

        Simon Weaver
        Technical Support
        Windows Domain Administrator

        EADS Astrium
        Tel: 02392-708598

        Email: Simon.Weaver AT Astrium.eads DOT net

                -----Original Message-----
                From: Barber, Layne (Contractor) [mailto:layne.barber.ctr AT csd.disa DOT mil]
                Sent: 04 January 2006 14:10
                To: veritas-bu AT mailman.eng.auburn DOT edu
                Subject: [Veritas-bu] DLT drives going down

                We have an issue with drives randomly going down every
                night. Environment: NBU 5.0 MP5, HP-UX 11.11, STK L180 with
                an STK 3400 SCSI bridge.

                For some reason, one or more drives go down at random every
                night when backups run. Different tapes and different drives
                each time. Backups will be running fine and then drives begin
                to go down. These are SDLT320 drives. Once they go down, you
                can't use robtest to move the tapes ("medium not present"
                error) or use the robtest unload command ("device not
                present").

                If we power cycle the SCSI bridge, we can talk to the drives
                and do whatever we want. STK is claiming that something
                coming from the host is "polling" the library at the physical
                layer (I assume the HBA). We have had the SA for the
                master/media server disable any polling and load the latest
                patches from HP, to no avail. We have also changed from auto
                index to a manual map index.

                This was working from the end of June up until the second
                week in October.

                Thoughts/suggestions?

                Log snippets from last night:

                syslog entries
                Jan  4 05:37:42 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
                Jan  4 05:50:10 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
                Jan  4 11:27:52 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0800c0 I/O error during close
                Jan  4 11:34:36 ujachr01 tldcd[18968]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
                Jan  4 11:34:36 ujachr01 tldcd[18968]: TLD(1) Move_medium error
                Jan  4 11:34:36 ujachr01 tldd[4233]: TLD(1) drive 5 (device 4) is being DOWNED, status: Robotic dismount failure
                Jan  4 11:34:36 ujachr01 tldd[4233]: Check integrity of the drive, drive path, and media

                drive 5 (addr 504) access = 0 Contains Cartridge = yes
                Source address = 1119 (slot 120)
                Barcode = JA1156

                Jan  4 11:55:12 ujachr01 tldcd[19684]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
                Jan  4 11:55:12 ujachr01 tldcd[19684]: TLD(1) Move_medium error
                Jan  4 11:55:12 ujachr01 tldd[4233]: TLD(1) drive 1 (device 0) is being DOWNED, status: Robotic dismount failure
                Jan  4 11:55:12 ujachr01 tldd[4233]: Check integrity of the drive, drive path, and media

                drive 1 (addr 500) access = 0 Contains Cartridge = yes
                Source address = 1106 (slot 107)
                Barcode = JA1064
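
For anyone wanting to line up the DOWNED events against the earlier I/O errors across several nights, here is a rough Python sketch (3.8 or later) that pulls the relevant fields out of syslog lines shaped like the ones above. The regular expressions are guesses based only on this excerpt, and the function name `summarize` and the two lookup tables are made up for illustration; they are not part of NetBackup. The only decoding it does is the standard SCSI one: sense key 0x5 is ILLEGAL REQUEST, and ASC/ASCQ 0x3a/0x00 is MEDIUM NOT PRESENT.

```python
import re
from collections import defaultdict

# Syslog lines copied from the excerpt above, with wrapped lines rejoined.
SAMPLE = """\
Jan  4 05:37:42 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
Jan  4 05:50:10 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
Jan  4 11:27:52 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0800c0 I/O error during close
Jan  4 11:34:36 ujachr01 tldcd[18968]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Jan  4 11:34:36 ujachr01 tldcd[18968]: TLD(1) Move_medium error
Jan  4 11:34:36 ujachr01 tldd[4233]: TLD(1) drive 5 (device 4) is being DOWNED, status: Robotic dismount failure
Jan  4 11:55:12 ujachr01 tldcd[19684]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Jan  4 11:55:12 ujachr01 tldcd[19684]: TLD(1) Move_medium error
Jan  4 11:55:12 ujachr01 tldd[4233]: TLD(1) drive 1 (device 0) is being DOWNED, status: Robotic dismount failure
"""

# Illustrative lookup tables: only the values seen in this thread are listed.
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x3A, 0x00): "MEDIUM NOT PRESENT"}

# Patterns assumed from the excerpt; other NetBackup versions may log differently.
TS = r"(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)"
DOWNED = re.compile(
    TS + r" \S+ tldd\[\d+\]: TLD\(\d+\) drive (?P<drive>\d+) "
    r"\(device \d+\) is being DOWNED, status: (?P<status>.+)"
)
SENSE = re.compile(
    TS + r" \S+ tldcd\[\d+\]: TLD\(\d+\) key = (?P<key>0x[0-9a-f]+), "
    r"asc = (?P<asc>0x[0-9a-f]+), ascq = (?P<ascq>0x[0-9a-f]+)"
)
IOERR = re.compile(
    TS + r" \S+ vmunix: SCSI TAPE: dev = (?P<dev>0x[0-9a-f]+) I/O error during close"
)


def summarize(text):
    """Return (downed drives, decoded sense data, I/O errors grouped by device)."""
    downed, sense, io_errors = [], [], defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if m := DOWNED.match(line):
            downed.append((m["ts"], int(m["drive"]), m["status"]))
        elif m := SENSE.match(line):
            key, asc, ascq = (int(m[k], 16) for k in ("key", "asc", "ascq"))
            sense.append((
                m["ts"],
                SENSE_KEYS.get(key, hex(key)),
                ASC_ASCQ.get((asc, ascq), f"asc={asc:#x} ascq={ascq:#x}"),
            ))
        elif m := IOERR.match(line):
            io_errors[m["dev"]].append(m["ts"])
    return downed, sense, dict(io_errors)


if __name__ == "__main__":
    downed, sense, io_errors = summarize(SAMPLE)
    for ts, drive, status in downed:
        print(f"{ts}: drive {drive} downed ({status})")
    for ts, key, meaning in sense:
        print(f"{ts}: sense key {key}, {meaning}")
    for dev, stamps in io_errors.items():
        print(f"{dev}: {len(stamps)} close error(s)")
```

The excerpt does not say which drive each `dev` number belongs to, so the sketch only groups the close errors by device number and leaves that mapping to whoever knows the device files on the host.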

This email is for the intended addressee only.
If you have received it in error then you must not use, retain,
disseminate or otherwise deal with it.
Please notify the sender by return email.
The views of the author may not necessarily constitute the views of EADS
Astrium Limited.
Nothing in this email shall bind EADS Astrium Limited in any contract or
obligation.

EADS Astrium Limited, Registered in England and Wales No. 2449259
Registered Office: Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2AS,
England
