[Veritas-bu] Serious problem - DLT7000 tape drives gone "not functional"

From: mike.m.jackson at ca.mci.com (Mike Jackson)
Date: Mon, 06 Nov 2006 14:37:58 -0500
Hey all,

It seems like I have control over moving media around in the silo now. 
However, when I want to unload the media via robtest, I have to do it 
twice.  Is this normal?

m s1 d3
Initiating MOVE_MEDIUM from address 1000 to 502
MOVE_MEDIUM complete

unload d3
Opening /dev/rmt/2cbn, please wait...
Error - cannot open /dev/rmt/2cbn (I/O error)

unload d3
Opening /dev/rmt/2cbn, please wait...
Tape successfully SCSI unloaded, ready for SCSI2 unload

m d3 d4
Initiating MOVE_MEDIUM from address 502 to 503
MOVE_MEDIUM complete


Thanks,

   - Mike
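
For reference, the element addresses robtest prints in this thread follow a simple pattern: drive 1 is address 500 and slot 1 is address 1000 (slot 511 shows up as 1510 in the original post). Below is a minimal Python sketch of that arithmetic. It assumes those two base addresses, which are only what this particular L700 reports and may differ on other libraries. The function name element_address is made up for illustration.

```python
# Base element addresses as observed in this thread's robtest output.
# Assumption: these values are specific to this library's configuration.
DRIVE_BASE = 500   # "drive 1 (addr 500)"
SLOT_BASE = 1000   # "m s1 d3" reports "from address 1000"


def element_address(name: str) -> int:
    """Translate a robtest-style element name such as 's1' or 'd3'."""
    kind, number = name[0], int(name[1:])
    if number < 1:
        raise ValueError(f"element numbers start at 1: {name!r}")
    if kind == "d":
        return DRIVE_BASE + number - 1
    if kind == "s":
        return SLOT_BASE + number - 1
    raise ValueError(f"unsupported element type: {name!r}")


# "m s1 d3" -> MOVE_MEDIUM from address 1000 to 502
print(element_address("s1"), element_address("d3"))
```

This matches the "Initiating MOVE_MEDIUM from address 1000 to 502" line above, and element_address("s511") gives the 1510 source address seen for drive 7 in the original post.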


Mike Jackson wrote:
> Hey Mike,
> 
> Thanks for the heads up on this.  I actually set the "On Bus" option to 
> ON (default is OFF) when I manually reset the SCSI IDs for the drives 
> even though my robot is on its own SCSI card:
> 
> /dev/sg/c10t5l0: Changer: "STK     L700"
> /dev/sg/c2t0l0: Tape (/dev/rmt/0): "QUANTUM DLT7000"
> /dev/sg/c2t1l0: Tape (/dev/rmt/1): "QUANTUM DLT7000"
> /dev/sg/c4t2l0: Tape (/dev/rmt/2): "QUANTUM DLT7000"
> /dev/sg/c4t3l0: Tape (/dev/rmt/3): "QUANTUM DLT7000"
> /dev/sg/c6t4l0: Tape (/dev/rmt/4): "QUANTUM DLT7000"
> /dev/sg/c6t5l0: Tape (/dev/rmt/5): "QUANTUM DLT7000"
> /dev/sg/c8t6l0: Tape (/dev/rmt/6): "QUANTUM DLT7000"
> 
> As a test, I just turned off "On Bus" for the drives and did a reset. 
> Now only 2 of the 8 drives are "NOT FUNCTIONAL" (drives 1 and 2) --->
> 
> Opening /dev/sg/c10t5l0
> MODE_SENSE complete
> Enter tld commands (? returns help information)
> s d
> drive 1 (addr 500) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 1 is 0
> drive 2 (addr 501) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 2 is 1
> drive 3 (addr 502) access = 1 Contains Cartridge = no
> SCSI ID from drive 3 is 2
> drive 4 (addr 503) access = 1 Contains Cartridge = no
> SCSI ID from drive 4 is 3
> drive 5 (addr 504) access = 1 Contains Cartridge = no
> SCSI ID from drive 5 is 4
> drive 6 (addr 505) access = 1 Contains Cartridge = no
> SCSI ID from drive 6 is 5
> drive 7 (addr 506) access = 1 Contains Cartridge = no
> SCSI ID from drive 7 is 6
> drive 8 (addr 507) access = 1 Contains Cartridge = no
> SCSI ID from drive 8 is 8
> READ_ELEMENT_STATUS complete
> 
> 
> I'm going to look up the Sense Code 0x40 / Code qualifier 0x2 and see 
> what it's all about, maybe I have some bad drives.
> 
> Thanks again,
> 
>    - Mike
> 
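
With eight drives the `s d` listing gets long, so here is a rough Python sketch that pulls out only the drives the robot reports as inaccessible (access = 0), together with their sense code and qualifier. It assumes the exact line layout shown in the robtest output in this thread. The function name, dictionary keys, and the shortened SAMPLE text are illustrative, not part of NetBackup.

```python
import re

# Patterns match the "s d" lines exactly as they appear in this thread.
DRIVE_RE = re.compile(
    r"drive (\d+) \(addr (\d+)\) access = (\d) Contains Cartridge = (yes|no)"
)
SENSE_RE = re.compile(
    r"Sense code = (0x[0-9a-fA-F]+), Code qualifier = (0x[0-9a-fA-F]+)"
)


def parse_drive_status(text):
    """Return one dict per drive found in robtest 's d' output."""
    drives = []
    for line in text.splitlines():
        m = DRIVE_RE.search(line)
        if m:
            drives.append({
                "drive": int(m.group(1)),
                "addr": int(m.group(2)),
                "accessible": m.group(3) == "1",
                "loaded": m.group(4) == "yes",
                "sense": None,
            })
            continue
        m = SENSE_RE.search(line)
        if m and drives:
            # A sense line belongs to the drive line printed just before it.
            drives[-1]["sense"] = (int(m.group(1), 16), int(m.group(2), 16))
    return drives


# Shortened excerpt of the output quoted in this thread.
SAMPLE = """\
drive 1 (addr 500) access = 0 Contains Cartridge = no
Sense code = 0x40, Code qualifier = 0x2
SCSI ID from drive 1 is 0
drive 3 (addr 502) access = 1 Contains Cartridge = no
SCSI ID from drive 3 is 2
"""

for d in parse_drive_status(SAMPLE):
    if not d["accessible"]:
        sense = d["sense"]
        detail = f"{sense[0]:#x}/{sense[1]:#x}" if sense else "no sense data"
        print(f"drive {d['drive']} (addr {d['addr']}): sense {detail}")
```

Because the patterns use search rather than match, the same function also works on text that still carries the mail quoting prefixes.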
> Mike Dunn (veritas-bu) wrote:
>> Mike,
>>
>> The fact that your L700 configuration was cleared leads me to wonder about
>> one very critical setting in the L700 (or STK TLD libraries in general). 
>> If you are using SCSI to control your robot, AND your robot is physically
>> on the same SCSI bus as another tape drive, make certain that the "On Bus"
>> setting is enabled for your library.  For some reason, having this setting
>> disabled when the robot and drives share a bus causes very erratic
>> behaviour.  By default, I believe, it is disabled.
>>
>>   Cheers
>>   Mike
>>
>>
>> Message: 1
>> Date: Mon, 06 Nov 2006 11:45:20 -0500
>> From: Mike Jackson <mike.m.jackson at ca.mci.com>
>> Subject: [Veritas-bu] Serious problem - DLT7000 tape drives gone "not
>>      functional"
>> To: veritas-bu at mailman.eng.auburn.edu
>> Message-ID: <454F66A0.3040200 at ca.mci.com>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Hello all,
>>
>> We're running a Solaris w/ NetBackup 5.0 master server environment with 
>> a SCSI attached StorageTek L700 library with eight DLT7000 drives.  We 
>> ran into an "event" the other night which cleared the L700 configuration 
>> and reset all of the drive SCSI IDs to Invalid.  I manually 
>> reconfigured the SCSI IDs 00 through 08 (skipping 07 which we cannot 
>> use).  SGSCAN sees the drives but when I try robtest or run manual 
>> backups the environment goes crazy and DOWNs all the drives.  I've got 
>> a support ticket opened with StorageTek but at this point they're not 
>> sure what the problem could be.  The LCD display on the L700 library 
>> says "NOT FUNCTIONAL" for all the drives even after a reboot.
>>
>> Here's some information from sgscan, tpconfig, and robtest:
>>
>> [nb-master-01:ROOT](~): sgscan
>> .
>> /dev/sg/c10t5l0: Changer: "STK     L700"
>> .
>> /dev/sg/c2t0l0: Tape (/dev/rmt/0): "QUANTUM DLT7000"
>> /dev/sg/c2t1l0: Tape (/dev/rmt/1): "QUANTUM DLT7000"
>> /dev/sg/c4t2l0: Tape (/dev/rmt/2): "QUANTUM DLT7000"
>> /dev/sg/c4t3l0: Tape (/dev/rmt/3): "QUANTUM DLT7000"
>> /dev/sg/c6t4l0: Tape (/dev/rmt/4): "QUANTUM DLT7000"
>> /dev/sg/c6t5l0: Tape (/dev/rmt/5): "QUANTUM DLT7000"
>> /dev/sg/c8t6l0: Tape (/dev/rmt/6): "QUANTUM DLT7000"
>> [nb-master-01:ROOT](~):
>>
>> [nb-master-01:ROOT](~): tpconfig -l
>> Device Robot Drive       Robot                    Drive            Device         Second
>> Type     Num Index  Type DrNum Status  Comment    Name             Path           Device Path
>> robot      0    -    TLD    -       -  -          -                /dev/sg/c10t5l0
>>    drive    -    0    dlt    3      UP  -          QUANTUMDLT70003 /dev/rmt/2cbn
>>    drive    -    1    dlt    4      UP  -          QUANTUMDLT70004 /dev/rmt/3cbn
>>    drive    -    2    dlt    5      UP  -          QUANTUMDLT70005 /dev/rmt/4cbn
>>    drive    -    3    dlt    6      UP  -          QUANTUMDLT70006 /dev/rmt/5cbn
>>    drive    -    4    dlt    7      UP  -          QUANTUMDLT70007 /dev/rmt/6cbn
>>    drive    -    5    dlt    1      UP  -          QUANTUMDLT70001 /dev/rmt/0cbn
>>    drive    -    7    dlt    2      UP  -          QUANTUMDLT70002 /dev/rmt/1cbn
>>
>> Robot selected: TLD(0)   robotic path = /dev/sg/c10t5l0
>>
>> Invoking robotic test utility:
>> /usr/openv/volmgr/bin/tldtest -r /dev/sg/c10t5l0 -d1 /dev/rmt/0cbn -d2 
>> /dev/rmt/1cbn -d3 /dev/rmt/2cbn -d4 /dev/rmt/3cbn -d5 /dev/rmt/4cbn -d6 
>> /dev/rmt/5cbn -d7 /dev/rmt/6cbn
>>
>> Opening /dev/sg/c10t5l0
>> MODE_SENSE complete
>> Enter tld commands (? returns help information)
>> s d
>> drive 1 (addr 500) access = 0 Contains Cartridge = no
>> Sense code = 0x40, Code qualifier = 0x2
>> SCSI ID from drive 1 is 0
>> drive 2 (addr 501) access = 0 Contains Cartridge = no
>> Sense code = 0x40, Code qualifier = 0x2
>> SCSI ID from drive 2 is 1
>> drive 3 (addr 502) access = 0 Contains Cartridge = no
>> Sense code = 0x40, Code qualifier = 0x2
>> SCSI ID from drive 3 is 2
>> drive 4 (addr 503) access = 0 Contains Cartridge = no
>> Sense code = 0x40, Code qualifier = 0x2
>> SCSI ID from drive 4 is 3
>> drive 5 (addr 504) access = 0 Contains Cartridge = no
>> Sense code = 0x40, Code qualifier = 0x2
>> SCSI ID from drive 5 is 4
>> drive 6 (addr 505) access = 0 Contains Cartridge = no
>> Sense code = 0x40, Code qualifier = 0x2
>> SCSI ID from drive 6 is 5
>> drive 7 (addr 506) access = 1 Contains Cartridge = yes
>> Source address = 1510 (slot 511)
>> Barcode = 000610
>> SCSI ID from drive 7 is 6
>> << Press return to continue, or q and return to stop >>
>>
>> drive 8 (addr 507) access = 1 Contains Cartridge = no
>> SCSI ID from drive 8 is 8
>> READ_ELEMENT_STATUS complete
>>
>>
>> Here's the logs when a backup is attempted:
>>
>> Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) key = 0x4, asc = 0x40, 
>> ascq = 0x2, UNKNOWN ERROR, KEY: 0x04, ASC: 0x40, ASCQ: 0x02
>> Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) Move_medium error
>> Nov  6 11:16:51 nb-master-01 tldcd[7439]: TLD(0) cannot clear drive 4 
>> error, drive asc=0x40, ascq=0x2
>> Nov  6 11:16:51 nb-master-01 tldcd[7441]: TLD(0) cannot clear drive 3 
>> error, drive asc=0x40, ascq=0x2
>> Nov  6 11:16:51 nb-master-01 tldcd[7445]: TLD(0) cannot clear drive 5 
>> error, drive asc=0x40, ascq=0x2
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is 
>> being DOWNED, status: Robotic mount failure
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>> drive path, and media
>> Nov  6 11:16:51 nb-master-01 tldcd[7447]: TLD(0) cannot clear drive 6 
>> error, drive asc=0x40, ascq=0x2
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 4 (device 1) is 
>> being DOWNED, status: Robotic mount failure
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>> drive path, and media
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 3 (device 0) is 
>> being DOWNED, status: Robotic mount failure
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>> drive path, and media
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 5 (device 2) is 
>> being DOWNED, status: Robotic mount failure
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>> drive path, and media
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 6 (device 3) is 
>> being DOWNED, status: Robotic mount failure
>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>> drive path, and media
>> Nov  6 11:16:52 nb-master-01 tldcd[7457]: TLD(0) cannot clear drive 1 
>> error, drive asc=0x40, ascq=0x2
>> Nov  6 11:16:52 nb-master-01 tldd[7015]: TLD(0) drive 1 (device 5) is 
>> being DOWNED, status: Robotic mount failure
>> Nov  6 11:16:52 nb-master-01 tldd[7015]: Check integrity of the drive, 
>> drive path, and media
>> Nov  6 11:16:52 nb-master-01 tldcd[7466]: TLD(0) cannot clear drive 2 
>> error, drive asc=0x40, ascq=0x2
>> Nov  6 11:16:52 nb-master-01 ltid[6960]: Request for media ID 000610 is 
>> being rejected because the media appears to be unmountable
>> Nov  6 11:16:52 nb-master-01 tldd[7015]: TLD(0) bad media suspected; 
>> configuring device 4 back UP
>> Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) key = 0x5, asc = 0x3a, 
>> ascq = 0x0, MEDIUM NOT PRESENT
>> Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) Move_medium error
>> Nov  6 11:16:54 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is 
>> being DOWNED, status: Unable to SCSI unload drive
>> Nov  6 11:16:54 nb-master-01 tldd[7015]: Check integrity of the drive, 
>> drive path, and media
>> Nov  6 11:16:55 nb-master-01 tldcd[7484]: TLD(0) cannot clear drive 2 
>> error, drive asc=0x40, ascq=0x2
>> Nov  6 11:16:55 nb-master-01 tldd[7015]: TLD(0) drive 2 (device 7) is 
>> being DOWNED, status: Robotic mount failure
>> Nov  6 11:16:55 nb-master-01 tldd[7015]: Check integrity of the drive, 
>> drive path, and media
>>
>>
>> Any help would be GREATLY appreciated!
>>
>> Thanks!
>>
>>    - Mike
>>
> 
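
As a starting point for looking up the sense data in the tldcd log lines quoted above, here is a small Python sketch that extracts the key/asc/ascq triple and names the sense key. The sense key names are the standard SCSI ones. The ASC/ASCQ pair 0x40/0x02 is deliberately left undecoded: NetBackup itself logs it as "UNKNOWN ERROR", and its meaning would have to come from the library vendor's SCSI reference. The function name and the regular expression are illustrative and assume the log layout shown in this thread.

```python
import re

# Standard SCSI sense key names (the 4-bit "key" value in the log lines).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}

# Matches "key = 0x4, asc = 0x40, ascq = 0x2"; \s* tolerates mail line wraps.
LOG_RE = re.compile(
    r"key = (0x[0-9a-f]+),\s*asc = (0x[0-9a-f]+),\s*ascq = (0x[0-9a-f]+)",
    re.IGNORECASE,
)


def decode_sense(line):
    """Return (key, asc, ascq, key_name), or None if the line has no sense data."""
    m = LOG_RE.search(line)
    if not m:
        return None
    key, asc, ascq = (int(g, 16) for g in m.groups())
    return key, asc, ascq, SENSE_KEYS.get(key, "RESERVED")


# Two log entries from this thread, unwrapped onto single lines.
LINES = [
    "tldcd[7433]: TLD(0) key = 0x4, asc = 0x40, ascq = 0x2, UNKNOWN ERROR",
    "tldcd[7476]: TLD(0) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT",
]

for line in LINES:
    key, asc, ascq, name = decode_sense(line)
    print(f"key {key:#x} ({name}), asc {asc:#04x}, ascq {ascq:#04x}")
```

For the two entries above this reports sense key 0x4 as HARDWARE ERROR and 0x5 as ILLEGAL REQUEST, which at least separates the drive errors from the later failed move.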

-- 
Mike Jackson          <mike.m.jackson at ca.mci.com>
UNIX Administrator, MCI Canada Hosting Operations
Juniper Networks Certified              JNCIA #85