[Veritas-bu] SCSI Errors

We've been wrestling with these errors all over our environment for the
last 6 months. I'm running NetBackup 3.2 on Sun E450s with Solaris 2.6. When
I see these errors it's almost always (99.9% of the time) a hardware issue:
usually the SCSI host adapter, the cable, or something related to the drive.
It's relatively easy to isolate the problem if you keep a spare SCSI host
adapter installed in your server (if you have the slot available) and a
known-good SCSI cable around, so you can swap each one in and rule out the
host side. Once you've ruled out the host side, you can usually have the
storage library vendor replace their hardware.

One thing to note: ruling out the SCSI host adapter and SCSI cable doesn't
mean that the library vendor will see errors on their equipment when they
test their end (tape drive or robotic control). Several times in the past
I've tested and cleared the host adapter and cables, and then the vendor came
out, tested their equipment, and found no problems. I just have them replace
the tape drive anyway. This happened recently, and StorageTek didn't detect
errors on the drive until they got it into their workshop for more intensive
testing.

The problem with this type of error is that if you call Sun, they tend to
blame the tape drive or storage library, and the storage library vendor
tends to blame Sun. By tracing the hardware problem yourself, you will save
yourself considerable time and pain. This approach works for my team. Hope
this helps you out.

Thanks

Craig Everett
Intuit INC
San Diego Datacenter

-----Original Message-----
From: Steve White [mailto:stevew AT colltech DOT com]
Sent: Monday, January 22, 2001 12:17 PM
To: veritas-bu AT Eng.Auburn DOT edu
Subject: RE: [Veritas-bu] SCSI Errors


When I've seen this in the past, it was usually the drive (specifically the
power supply).  We did have one batch of bad tapes and they all caused this
type of problem, but that would show up as lots of failures in different
drives.
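
One way to check whether failures cluster on a single drive or spread across
several is to tally the transport-failure warnings per device instance. Below
is a rough Python sketch of that idea, not a tested tool. It assumes the
unwrapped /var/adm/messages layout shown in the original post, where a
"WARNING: <device path> (<instance>):" line is followed by a continuation
line carrying the reason. The function name and sample data are illustrative
only.

```python
import re
from collections import Counter

# Header line naming the device, e.g.
# "... unix: WARNING: /pci@1f,4000/scsi@4,1/st@3,0 (st24):"
DEVICE_RE = re.compile(r"WARNING: (/\S+) \((\w+)\):")
# Continuation line carrying the failure reason.
FAILURE_RE = re.compile(r"SCSI transport failed: reason '([^']+)'")


def tally_transport_failures(lines):
    """Count 'SCSI transport failed' messages per (instance, reason).

    Assumption: each failure line refers to the most recent WARNING
    header that named a device path.
    """
    counts = Counter()
    current = None
    for line in lines:
        header = DEVICE_RE.search(line)
        if header:
            current = header.group(2)  # instance name such as "st24"
            continue
        failure = FAILURE_RE.search(line)
        if failure and current is not None:
            counts[(current, failure.group(1))] += 1
    return counts


# Sample lines taken from the log in the original post (unwrapped).
SAMPLE_LINES = [
    "Jan 21 23:12:06 cheetah unix: WARNING: /pci@1f,4000/scsi@4,1 (glm3):",
    "Jan 21 23:12:06 cheetah         Disconnected command timeout for Target 3.0",
    "Jan 21 23:12:06 cheetah unix: WARNING: ID[SUNWpd.glm.cmd_timeout.6016]",
    "Jan 21 23:12:06 cheetah unix: WARNING: /pci@1f,4000/scsi@4,1/st@3,0 (st24):",
    "Jan 21 23:12:06 cheetah         SCSI transport failed: reason 'timeout': giving up",
]

if __name__ == "__main__":
    for (instance, reason), n in tally_transport_failures(SAMPLE_LINES).items():
        print(instance, reason, n)
```

If most of the counts land on one st instance, that points toward that drive
or its path; counts spread over many instances would fit the bad-media case
better.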

Steve

-----Original Message-----
From: veritas-bu-admin AT Eng.Auburn DOT edu
[mailto:veritas-bu-admin AT Eng.Auburn DOT edu]On Behalf Of Chandra Kalle
Sent: Monday, January 22, 2001 12:16 PM
To: veritas-bu AT Eng.Auburn DOT edu
Cc: Chandra Kalle
Subject: [Veritas-bu] SCSI Errors



Server- E250/Solaris 7.

Tape Library- ATL P1000 with 4 DLT7000 drives/30 slots.

Backup Management- Veritas NetBackup 3.2

All hardware is 3 months old.

Problem- Getting SCSI transport messages on
TLD. Occurrence- first time. There was one
tape being written in drive 1 that was
left in the drive after the TLD went down
and came back up. I had to eject it manually
and take it out.

Any suggestions? Is it the cable, the drive itself,
or the TLD?

/var/adm/messages-

Jan 21 23:12:06 cheetah unix: /pci@1f,4000/scsi@4,1 (glm3):
Jan 21 23:12:06 cheetah         Cmd (0x2abb08) dump for Target 3 Lun 0:
Jan 21 23:12:06 cheetah unix: /pci@1f,4000/scsi@4,1 (glm3):
Jan 21 23:12:06 cheetah                 cdb=[ 0x11 0x1 0xff 0xff 0xff 0x0 ]
Jan 21 23:12:06 cheetah unix: /pci@1f,4000/scsi@4,1 (glm3):
Jan 21 23:12:06 cheetah         pkt_flags=0x0 pkt_statistics=0x61 pkt_state=0x7
Jan 21 23:12:06 cheetah unix: /pci@1f,4000/scsi@4,1 (glm3):
Jan 21 23:12:06 cheetah         pkt_scbp=0x0 cmd_flags=0xe1
Jan 21 23:12:06 cheetah unix: WARNING: /pci@1f,4000/scsi@4,1 (glm3):
Jan 21 23:12:06 cheetah         Disconnected command timeout for Target 3.0
Jan 21 23:12:06 cheetah unix: WARNING: ID[SUNWpd.glm.cmd_timeout.6016]
Jan 21 23:12:06 cheetah unix: WARNING: /pci@1f,4000/scsi@4,1/st@3,0 (st24):
Jan 21 23:12:06 cheetah         SCSI transport failed: reason 'timeout': giving up
Jan 21 23:12:06 cheetah
Jan 21 23:43:03 cheetah tldd[23560]: TLD(1) going to DOWN state, status: Timeout waiting for robotic command
Jan 21 23:45:08 cheetah tldd[23560]: TLD(1) going to UP state
Jan 21 23:48:45 cheetah tldcd[22471]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Jan 21 23:48:45 cheetah tldcd[22471]: TLD(1) Move_medium error: CHECK CONDITION
Jan 21 23:48:46 cheetah tldd[23560]: Adding media ID DBD084 to unmountable media list
Jan 21 23:48:46 cheetah tldd[23560]: TLD(1) drive 2 (device 1) is being DOWNED, status: Robotic mount failure
Jan 21 23:48:46 cheetah tldd[23560]: Check integrity of the drive, drive path, and media
Jan 21 23:48:48 cheetah tldd[23560]: Removing media ID DBD084 from unmountable media list
Jan 21 23:48:48 cheetah ltid[23556]: Request for EVSN DBD084 is being rejected because it is in a DOWN drive
Jan 22 09:24:52 cheetah vmd[17631]: terminating - another daemon already exists (89)
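
For anyone reading the dump above, here is a small Python sketch that decodes
the two raw SCSI values in it: the cdb bytes from the glm driver and the
key/asc/ascq triple from tldcd. The function names and lookup tables are
illustrative (not from any NetBackup or Solaris tool) and cover only the
values seen in this log. It relies on the standard SCSI definitions: opcode
0x11 is SPACE, bytes 2 to 4 are a signed 24-bit count, sense key 0x5 is
ILLEGAL REQUEST, and asc/ascq 0x3a/0x00 is MEDIUM NOT PRESENT.

```python
# Sense keys from the SCSI standard (subset).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# Only the additional sense code that appears in this log.
ASC_ASCQ = {(0x3A, 0x00): "MEDIUM NOT PRESENT"}

# SPACE command "code" field values for sequential-access devices.
SPACE_CODES = {0: "blocks", 1: "filemarks", 3: "end-of-data"}


def decode_space_cdb(cdb):
    """Decode a 6-byte SPACE CDB into (unit, signed count)."""
    if len(cdb) != 6 or cdb[0] != 0x11:
        raise ValueError("not a SPACE(6) CDB")
    code = cdb[1] & 0x07  # low three bits select what to space over
    # Bytes 2..4 hold a 24-bit two's-complement count; negative = backward.
    count = int.from_bytes(bytes(cdb[2:5]), "big", signed=True)
    return SPACE_CODES.get(code, f"code {code}"), count


def describe_sense(key, asc, ascq):
    """Render a sense key and asc/ascq pair as readable text."""
    key_name = SENSE_KEYS.get(key, f"key 0x{key:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"asc 0x{asc:02x} ascq 0x{ascq:02x}")
    return f"{key_name} / {detail}"


if __name__ == "__main__":
    # Values copied from the log above.
    print(decode_space_cdb([0x11, 0x1, 0xFF, 0xFF, 0xFF, 0x0]))
    print(describe_sense(0x5, 0x3A, 0x0))
```

Read this way, the command that timed out on target 3 was a request to space
backward over one filemark, and the later Move_medium failure reports that
the robot found no cartridge at the source of the move. That narrows down
which command hung but does not by itself say whether the cable, adapter, or
drive is at fault.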

Thanks,
Chandra


_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
