Re: [Networker] read open error with Mammoth2 drives

2003-10-09 08:26:28
Subject: Re: [Networker] read open error with Mammoth2 drives
From: Ingo Roschmann <ingo AT VISIONET DOT DE>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 9 Oct 2003 08:25:05 -0400
Jack,

we did not change anything; in fact, the problems appeared overnight
and persisted even after we replaced the whole storage node, including
the cabling, with a different machine. In addition:
- st.conf is configured for the M2 drives as recommended by Exabyte;
- Exabyte MammothTool does not find any problems;
- tar, dd, and ufsdump work fine;
- we applied the latest Sun patch for the st driver (108725-14).

The last errors occurred when trying to load and mount a tape:

---snip---
holloway{root}[/]: nsrjb -s daintree -l -vvvv -S 12 -f /dev/rmt/1cbn
setting verbosity level to `4'
any_shared_jbs name: EXB220-EG
any_shared_jbs name: rd=holloway:EXB220-OG4
any_shared_jbs returning: 0
using 'rd=holloway:/dev/rmt/1cbn' as device name
enabling `volume_tags' feature.
enabling `elements_status' feature.
enabling `barcode' feature.
enabling `elements_status' feature.
setting option `eject sleep' to `5'.
setting option `unload sleep' to `5'.
setting option `load sleep' to `5'.
setting option `cleaning delay' to `60'.
setting option `deposit timeout' to `600'.
any_shared_jbs name: EXB220-EG
any_shared_jbs name: rd=holloway:EXB220-OG4
any_shared_jbs returning: 0
enabling `volume_tags' feature.
enabling `elements_status' feature.
enabling `barcode' feature.
enabling `elements_status' feature.
setting option `eject sleep' to `5'.
setting option `unload sleep' to `5'.
setting option `load sleep' to `5'.
setting option `cleaning delay' to `60'.
setting option `deposit timeout' to `600'.
box_open: port [email protected]
box_inventory:
box_close: port [email protected]
box_inventory_free:
box_open: port [email protected]
box_load: slot `12' into drive `rd=holloway:/dev/rmt/1cbn'
box_display: `CL000028' for drive `rd=holloway:/dev/rmt/1cbn'
box_close: port [email protected]
load sleep for 5 seconds
nsrjb: About to read label on rd=holloway:/dev/rmt/1cbn
nsrjb: About to eject volume from rd=holloway:/dev/rmt/1cbn
execute_table: slot number 12, rewind I/O error
box_label_complete():
eject sleep for 5 seconds
box_open: port [email protected]
box_unload: drive `rd=holloway:/dev/rmt/1cbn' into slot `12'
box_close: port [email protected]
unload sleep for 5 seconds
Error: while operating on slot `12': rewind I/O error
execute_table:  Thu 13:45:02 rewind I/O error

nsrjb: Jukebox error,  Thu 13:45:02 rewind I/O error
---snip---

/var/adm/messages shows the following:

---snip---
Oct  9 11:06:16 holloway scsi: [ID 107833 kern.warning] WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000 (esp0):
Oct  9 11:06:16 holloway        Disconnected command timeout for Target 5.0
Oct  9 11:06:16 holloway scsi: [ID 107833 kern.warning] WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/st@5,0 (st5):
Oct  9 11:06:16 holloway        SCSI transport failed: reason 'timeout': giving up
Oct  9 11:06:16 holloway
Oct  9 11:08:46 holloway scsi: [ID 107833 kern.warning] WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000 (esp0):
Oct  9 11:08:46 holloway        Disconnected command timeout for Target 5.0
Oct  9 11:08:46 holloway scsi: [ID 107833 kern.warning] WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/st@5,0 (st5):
Oct  9 11:08:46 holloway        SCSI transport failed: reason 'timeout': giving up
Oct  9 11:08:46 holloway
Oct  9 11:08:49 holloway scsi: [ID 107833 kern.warning] WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/st@5,0 (st5):
Oct  9 11:08:49 holloway        Error for Command: rezero/rewind    Error Level: Fatal
Oct  9 11:08:49 holloway scsi: [ID 107833 kern.notice]  Requested Block: 0                         Error Block: 0
Oct  9 11:08:49 holloway scsi: [ID 107833 kern.notice]  Vendor: EXABYTE                            Serial Number: 8E002106
Oct  9 11:08:49 holloway scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
Oct  9 11:08:49 holloway scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
---snip---

After that, I relabeled the tape without problems, and at the moment I can
mount it. But I am sure it will fail again.
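
For anyone who wants to see how often the bus timeouts occur and which target they hit, the following is a minimal Python sketch that tallies the message types quoted above. The function name `summarize_scsi_events` and the regular expressions are illustrative assumptions based only on the log text in this message; they are not part of Solaris or Networker.

```python
import re
from collections import Counter

# Patterns derived from the /var/adm/messages excerpt above (assumed format).
TIMEOUT_RE = re.compile(r"Disconnected command timeout for Target (\d+)\.(\d+)")
TRANSPORT_RE = re.compile(r"SCSI transport failed: reason '([^']+)'")
ASC_RE = re.compile(r"ASC: (0x[0-9a-fA-F]+)")  # does not match "ASCQ:"


def summarize_scsi_events(lines):
    """Count bus timeouts, transport failures, and ASC codes in syslog lines."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[f"timeout target {m.group(1)}.{m.group(2)}"] += 1
        m = TRANSPORT_RE.search(line)
        if m:
            counts[f"transport failed ({m.group(1)})"] += 1
        m = ASC_RE.search(line)
        if m:
            counts[f"ASC {m.group(1).lower()}"] += 1
    return counts


# A few lines copied from the excerpt above.
sample = [
    "Oct  9 11:06:16 holloway        Disconnected command timeout for Target 5.0",
    "Oct  9 11:06:16 holloway        SCSI transport failed: reason 'timeout':",
    "Oct  9 11:08:46 holloway        Disconnected command timeout for Target 5.0",
    "Oct  9 11:08:46 holloway        SCSI transport failed: reason 'timeout':",
    "Oct  9 11:08:49 holloway scsi: [ID 107833 kern.notice]  ASC: 0x29 (power",
]

summary = summarize_scsi_events(sample)
for event, n in summary.most_common():
    print(f"{n:3d}  {event}")
```

To run it against a real log, pass an open file object (for example `open("/var/adm/messages")`) instead of `sample`. The patterns match within a single line, so they work whether or not the long syslog lines are wrapped.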

And yes, we have also had problems with our Mammoth2 drives and have had to
replace them in the past, not only in this library but at other sites as
well.
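
The `cdb=[ 0x8 0x2 0x0 0x2 0x0 0x0 ]` line in the original message quoted below can be read byte by byte. The sketch below decodes it under the assumption that it is a 6-byte READ command for a sequential-access (tape) device, with the field layout as I understand it from the SCSI stream-command definition: opcode in byte 0, the FIXED and SILI bits in byte 1, a 24-bit transfer length in bytes 2 to 4, and the control byte last. The helper name `decode_read6_tape` is hypothetical.

```python
def decode_read6_tape(cdb):
    """Decode a 6-byte READ command for a sequential-access (tape) device."""
    if len(cdb) != 6:
        raise ValueError("READ(6) CDB must be exactly 6 bytes")
    if cdb[0] != 0x08:
        raise ValueError(f"opcode 0x{cdb[0]:02x} is not READ(6)")
    return {
        # FIXED=1: transfer length counts blocks; FIXED=0: it counts bytes.
        "fixed": bool(cdb[1] & 0x01),
        # SILI=1: suppress reporting of an incorrect-length condition.
        "sili": bool(cdb[1] & 0x02),
        # Bytes 2..4 form a big-endian 24-bit transfer length.
        "transfer_length": (cdb[2] << 16) | (cdb[3] << 8) | cdb[4],
        "control": cdb[5],
    }


# Bytes copied from the kern.info line in the quoted message.
cdb = [0x08, 0x02, 0x00, 0x02, 0x00, 0x00]
fields = decode_read6_tape(cdb)
print(fields)
```

For the bytes shown, this gives a variable-length read of 512 bytes with the SILI bit set. That only describes the command that timed out on target 3.0; it does not by itself say why the target stopped responding.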


On Fri, 3 Oct 2003 08:26:37 -0400, Jack Lyons <jack.lyons AT MARTINAGENCY DOT COM> wrote:

>I am running Networker 6.1.3 on an E240 (Solaris 7) with an Exabyte X200 with 6
>Mammoth2 drives, so I think we have a similar environment.
>
>Have you made any changes to the configuration of the jukebox? I believe
>that with Networker versions < 6.x.x the drives were defined as 8mm 20GB.
>
>Also, have you recently patched the server with Sun patches? If so, check to
>make sure the st.conf file has the necessary line for the Mammoth2 drives.
>
>Just my $.02.
>
>P.S. We have had huge problems with the Mammoth 2 Drives.  Each of our 6
>drives has been replaced at least twice.  What has your experience been?
>
>Jack
>
>> -----Original Message-----
>> From: Ingo Roschmann [mailto:ingo AT VISIONET DOT DE]
>> Sent: Thursday, October 02, 2003 5:10 AM
>> To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
>> Subject: [Networker] read open error with Mammoth2 drives
>>
>> Hi All,
>>
>> we really need some good ideas.
>> We have Networker Server 5.3.3 Build 267 on Solaris 8; the tape library is an
>> Exabyte 220 with 2 Mammoth2 drives on a storage node.
>>
>> We are getting errors when trying to mount a tape, which most of the time
>> results in Networker hanging in the mounting state. Exabyte MammothTool
>> writes and reads the tapes without errors on both drives.
>> The errors seem to be related to the SCSI bus, as I can see from the
>> messages file:
>>
>> scsi: [ID 107833 kern.warning] WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/st@5,0 (st5):
>>        Error for Command: read                    Error Level: Fatal
>> scsi: [ID 107833 kern.notice]  Requested Block: 0                         Error Block: 0
>> scsi: [ID 107833 kern.notice]  Vendor: EXABYTE                            Serial Number: 8E002106
>> scsi: [ID 107833 kern.notice]  Sense Key: No Additional Sense
>> scsi: [ID 107833 kern.notice]  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
>> scsi: [ID 107833 kern.notice]  Incorrect Length Indicator Set
>>
>> We also have:
>>
>> scsi: [ID 365881 kern.info] fas:       3.0: cdb=[ 0x8 0x2 0x0 0x2 0x0 0x0 ]
>> scsi: [ID 107833 kern.warning] WARNING: /sbus@1f,0/SUNW,fas@e,8800000 (fas0):
>>        Disconnected command timeout for Target 3.0
>>
>>
>> We suspected a problem with the tape library, but Exabyte support
>> tested the library and drives and says they are OK. Then we suspected
>> the storage node, but the problems persisted after we replaced it.
>> Finally, we suspected damaged tapes and replaced them all, with no
>> improvement.
>>
>> Does anyone have an idea whether Networker itself could be responsible
>> for this kind of problem, and how to test that?
>> Any ideas are much appreciated!
>>
>> Thanks,
>> Ingo
>>
>> --
>> Note: To sign off this list, send a "signoff networker" command via email
>> to listserv AT listmail.temple DOT edu or visit the list's Web site at
>> http://listmail.temple.edu/archives/networker.html where you can
>> also view and post messages to the list.
>> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
>
>
