[Veritas-bu] Serious problem - DLT7000 tape drives gone "not functional"

2006-11-06 14:48:25
Subject: [Veritas-bu] Serious problem - DLT7000 tape drives gone "not functional"
From: mike.m.jackson at ca.mci.com (Mike Jackson)
Date: Mon, 06 Nov 2006 14:48:25 -0500
Sorry, disregard this email; I was a little trigger-happy.  I don't think 
I waited long enough for the drive to finish mounting the tape (90+ 
seconds) before trying to unload it :)
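
For anyone scripting this, here is a minimal sketch of polling the drive 
until it can be opened rather than issuing the unload straight away.  The 
helper name, the 120-second timeout and the 5-second interval are my own 
assumptions, not anything NetBackup provides, and I have only reasoned 
about it against the I/O error shown in the robtest session quoted below.

```python
import os
import time


def wait_until_openable(path, timeout=120.0, interval=5.0,
                        opener=os.open, sleep=time.sleep,
                        clock=time.monotonic):
    """Poll until `path` opens read-only; return False once `timeout` passes.

    Hypothetical helper: `opener`, `sleep` and `clock` are injectable only
    so the loop can be exercised without a real tape drive.
    """
    deadline = clock() + timeout
    while True:
        try:
            fd = opener(path, os.O_RDONLY)
        except OSError:
            # The drive is probably still loading the tape; retry until
            # the deadline has passed.
            if clock() >= deadline:
                return False
            sleep(interval)
        else:
            os.close(fd)
            return True
```

Usage would be something like wait_until_openable("/dev/rmt/2cbn") before 
sending "unload d3"; a False return means the drive never became ready 
within the timeout.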

Cheers,

   - Mike

Mike Jackson wrote:
> Hey all,
> 
> It seems like I have control over moving media around in the silo now. 
> However, when I want to unload the media via robtest, I have to do it 
> twice.  Is this normal?
> 
> m s1 d3
> Initiating MOVE_MEDIUM from address 1000 to 502
> MOVE_MEDIUM complete
> 
> unload d3
> Opening /dev/rmt/2cbn, please wait...
> Error - cannot open /dev/rmt/2cbn (I/O error)
> 
> unload d3
> Opening /dev/rmt/2cbn, please wait...
> Tape successfully SCSI unloaded, ready for SCSI2 unload
> 
> m d3 d4
> Initiating MOVE_MEDIUM from address 502 to 503
> MOVE_MEDIUM complete
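
The element addresses robtest prints seem to follow a simple offset on 
this L700: drive 1 is address 500 and slot 1 is address 1000.  A small 
sketch of that mapping, inferred only from the output in this thread; the 
base values may well differ on other library configurations, and the 
function name is mine.

```python
DRIVE_BASE = 500   # drive 1 -> addr 500, per the "s d" output in this thread
SLOT_BASE = 1000   # slot 1 -> addr 1000, per "m s1 d3"


def element_address(name):
    """Translate a robtest element name such as 's1' or 'd3' to its address."""
    kind, number = name[0], int(name[1:])
    if number < 1:
        raise ValueError("element numbers start at 1: %r" % name)
    if kind == "d":
        return DRIVE_BASE + number - 1
    if kind == "s":
        return SLOT_BASE + number - 1
    raise ValueError("unknown element type: %r" % name)


print(element_address("s1"), element_address("d3"))  # prints: 1000 502
```

This also agrees with the "Source address = 1510 (slot 511)" line in the 
original report.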
> 
> 
> Thanks,
> 
>    - Mike
> 
> 
> Mike Jackson wrote:
>> Hey Mike,
>>
>> Thanks for the heads up on this.  I actually set the "On Bus" option to 
>> ON (default is OFF) when I manually reset the SCSI IDs for the drives, 
>> even though my robot is on its own SCSI card:
>>
>> /dev/sg/c10t5l0: Changer: "STK     L700"
>> /dev/sg/c2t0l0: Tape (/dev/rmt/0): "QUANTUM DLT7000"
>> /dev/sg/c2t1l0: Tape (/dev/rmt/1): "QUANTUM DLT7000"
>> /dev/sg/c4t2l0: Tape (/dev/rmt/2): "QUANTUM DLT7000"
>> /dev/sg/c4t3l0: Tape (/dev/rmt/3): "QUANTUM DLT7000"
>> /dev/sg/c6t4l0: Tape (/dev/rmt/4): "QUANTUM DLT7000"
>> /dev/sg/c6t5l0: Tape (/dev/rmt/5): "QUANTUM DLT7000"
>> /dev/sg/c8t6l0: Tape (/dev/rmt/6): "QUANTUM DLT7000"
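
To compare what the host sees with the IDs set on the library, the sgscan 
lines can be split into controller, target and LUN.  A rough sketch, 
assuming the /dev/sg/cXtYlZ naming shown above; the regular expression and 
the function name are mine, not part of NetBackup.

```python
import re

SG_LINE = re.compile(
    r'/dev/sg/c(?P<ctrl>\d+)t(?P<target>\d+)l(?P<lun>\d+):\s+'
    r'(?P<kind>Changer|Tape)(?:\s+\((?P<path>[^)]+)\))?:\s+"(?P<inquiry>[^"]*)"'
)


def parse_sgscan(text):
    """Return one dict per changer or tape line found in sgscan output."""
    devices = []
    for line in text.splitlines():
        m = SG_LINE.search(line)
        if not m:
            continue  # skip prompts, "." lines and anything unrecognised
        devices.append({
            "controller": int(m.group("ctrl")),
            "target": int(m.group("target")),
            "lun": int(m.group("lun")),
            "kind": m.group("kind"),
            "path": m.group("path"),        # None for the changer
            "inquiry": m.group("inquiry"),
        })
    return devices


def tape_targets(devices):
    """Sorted SCSI target IDs of the tape drives the host can see."""
    return sorted(d["target"] for d in devices if d["kind"] == "Tape")
```

Run against the listing above, tape_targets() gives targets 0 through 6, 
which can then be checked against the "SCSI ID from drive N" lines that 
robtest reports.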
>>
>> As a test, I just turned off "On Bus" for the drives and did a reset. 
>> Now only 2 of the 8 drives are "NOT FUNCTIONAL" (drives 1 and 2) --->
>>
>> Opening /dev/sg/c10t5l0
>> MODE_SENSE complete
>> Enter tld commands (? returns help information)
>> s d
>> drive 1 (addr 500) access = 0 Contains Cartridge = no
>> Sense code = 0x40, Code qualifier = 0x2
>> SCSI ID from drive 1 is 0
>> drive 2 (addr 501) access = 0 Contains Cartridge = no
>> Sense code = 0x40, Code qualifier = 0x2
>> SCSI ID from drive 2 is 1
>> drive 3 (addr 502) access = 1 Contains Cartridge = no
>> SCSI ID from drive 3 is 2
>> drive 4 (addr 503) access = 1 Contains Cartridge = no
>> SCSI ID from drive 4 is 3
>> drive 5 (addr 504) access = 1 Contains Cartridge = no
>> SCSI ID from drive 5 is 4
>> drive 6 (addr 505) access = 1 Contains Cartridge = no
>> SCSI ID from drive 6 is 5
>> drive 7 (addr 506) access = 1 Contains Cartridge = no
>> SCSI ID from drive 7 is 6
>> drive 8 (addr 507) access = 1 Contains Cartridge = no
>> SCSI ID from drive 8 is 8
>> READ_ELEMENT_STATUS complete
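
Picking the bad drives out of "s d" by eye gets tedious, so here is a 
hedged sketch of a parser for the output format shown above.  It assumes 
the three line shapes seen in this thread (drive, sense code, SCSI ID) and 
nothing else; the names are mine.

```python
import re

DRIVE_RE = re.compile(
    r"drive (\d+) \(addr (\d+)\) access = (\d) Contains Cartridge = (yes|no)")
SENSE_RE = re.compile(
    r"Sense code = (0x[0-9a-fA-F]+), Code qualifier = (0x[0-9a-fA-F]+)")
ID_RE = re.compile(r"SCSI ID from drive (\d+) is (\d+)")


def parse_drive_status(text):
    """Map drive number -> status dict from tldtest "s d" output."""
    drives = {}
    current = None
    for line in text.splitlines():
        m = DRIVE_RE.search(line)
        if m:
            current = int(m.group(1))
            drives[current] = {
                "addr": int(m.group(2)),
                "access": m.group(3) == "1",
                "loaded": m.group(4) == "yes",
                "sense": None,
                "scsi_id": None,
            }
            continue
        m = SENSE_RE.search(line)
        if m and current is not None:
            # A sense line belongs to the most recent drive line.
            drives[current]["sense"] = (int(m.group(1), 16), int(m.group(2), 16))
            continue
        m = ID_RE.search(line)
        if m and int(m.group(1)) in drives:
            drives[int(m.group(1))]["scsi_id"] = int(m.group(2))
    return drives


def inaccessible(drives):
    """Drive numbers the library reports with access = 0."""
    return sorted(n for n, d in drives.items() if not d["access"])
```

On the output above, inaccessible() returns drives 1 and 2, both carrying 
sense code 0x40 with qualifier 0x2.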
>>
>>
>> I'm going to look up the Sense Code 0x40 / Code qualifier 0x2 and see 
>> what it's all about; maybe I have some bad drives.
>>
>> Thanks again,
>>
>>    - Mike
>>
>> Mike Dunn (veritas-bu) wrote:
>>> Mike,
>>>
>>> The fact that your L700 configuration was cleared leads me to wonder about
>>> one very critical setting in the L700 (or STK TLD libraries in general). 
>>> If you are using SCSI to control your robot, AND your robot is physically
>>> on the same SCSI bus as another tape drive, make certain that the "On Bus"
>>> setting is enabled for your library.  For some reason, having this setting
>>> disabled when the robot and drives share a bus causes very random
>>> behaviour.  By default, I believe, it is disabled.
>>>
>>>   Cheers
>>>   Mike
>>>
>>>
>>> Message: 1
>>> Date: Mon, 06 Nov 2006 11:45:20 -0500
>>> From: Mike Jackson <mike.m.jackson at ca.mci.com>
>>> Subject: [Veritas-bu] Serious problem - DLT7000 tape drives gone "not
>>>     functional"
>>> To: veritas-bu at mailman.eng.auburn.edu
>>> Message-ID: <454F66A0.3040200 at ca.mci.com>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> Hello all,
>>>
>>> We're running a Solaris master server with NetBackup 5.0 and a 
>>> SCSI-attached StorageTek L700 library with eight DLT7000 drives.  We 
>>> ran into an "event" the other night that cleared the L700 configuration 
>>> and reset all of the drive SCSI IDs to Invalid.  I manually 
>>> reconfigured the SCSI IDs 00 through 08 (skipping 07, which we cannot 
>>> use).  sgscan sees the drives, but when I try robtest or run manual 
>>> backups the environment goes crazy and DOWNs all the drives.  I've got 
>>> a support ticket open with StorageTek, but at this point they're not 
>>> sure what the problem could be.  The LCD display on the L700 library 
>>> says "NOT FUNCTIONAL" for all the drives even after a reboot.
>>>
>>> Here's some information from sgscan / tpconfig && robtest:
>>>
>>> [nb-master-01:ROOT](~): sgscan
>>> .
>>> /dev/sg/c10t5l0: Changer: "STK     L700"
>>> .
>>> /dev/sg/c2t0l0: Tape (/dev/rmt/0): "QUANTUM DLT7000"
>>> /dev/sg/c2t1l0: Tape (/dev/rmt/1): "QUANTUM DLT7000"
>>> /dev/sg/c4t2l0: Tape (/dev/rmt/2): "QUANTUM DLT7000"
>>> /dev/sg/c4t3l0: Tape (/dev/rmt/3): "QUANTUM DLT7000"
>>> /dev/sg/c6t4l0: Tape (/dev/rmt/4): "QUANTUM DLT7000"
>>> /dev/sg/c6t5l0: Tape (/dev/rmt/5): "QUANTUM DLT7000"
>>> /dev/sg/c8t6l0: Tape (/dev/rmt/6): "QUANTUM DLT7000"
>>> [nb-master-01:ROOT](~):
>>>
>>> [nb-master-01:ROOT](~): tpconfig -l
>>> Device Robot Drive       Robot                    Drive            Device           Second
>>> Type     Num Index  Type DrNum Status  Comment    Name             Path             Device Path
>>> robot      0    -    TLD    -       -  -          -                /dev/sg/c10t5l0
>>>   drive    -    0    dlt    3      UP  -          QUANTUMDLT70003  /dev/rmt/2cbn
>>>   drive    -    1    dlt    4      UP  -          QUANTUMDLT70004  /dev/rmt/3cbn
>>>   drive    -    2    dlt    5      UP  -          QUANTUMDLT70005  /dev/rmt/4cbn
>>>   drive    -    3    dlt    6      UP  -          QUANTUMDLT70006  /dev/rmt/5cbn
>>>   drive    -    4    dlt    7      UP  -          QUANTUMDLT70007  /dev/rmt/6cbn
>>>   drive    -    5    dlt    1      UP  -          QUANTUMDLT70001  /dev/rmt/0cbn
>>>   drive    -    7    dlt    2      UP  -          QUANTUMDLT70002  /dev/rmt/1cbn
>>>
>>> Robot selected: TLD(0)   robotic path = /dev/sg/c10t5l0
>>>
>>> Invoking robotic test utility:
>>> /usr/openv/volmgr/bin/tldtest -r /dev/sg/c10t5l0 -d1 /dev/rmt/0cbn -d2 
>>> /dev/rmt/1cbn -d3 /dev/rmt/2cbn -d4 /dev/rmt/3cbn -d5 /dev/rmt/4cbn -d6 
>>> /dev/rmt/5cbn -d7 /dev/rmt/6cbn
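
The -dN arguments above pair each robot drive number (the DrNum column of 
tpconfig -l) with its device path.  A small sketch of assembling that 
argument list; the function name is mine, and the only flags used are the 
ones visible in the command above, so treat anything beyond that as 
unverified.

```python
TLDTEST = "/usr/openv/volmgr/bin/tldtest"  # path as shown in the command above


def tldtest_args(robot_path, drive_paths, binary=TLDTEST):
    """Build the tldtest argument list.

    `drive_paths` maps robot drive number (DrNum) -> tape device path.
    """
    args = [binary, "-r", robot_path]
    for drnum in sorted(drive_paths):
        args.append("-d%d" % drnum)
        args.append(drive_paths[drnum])
    return args


print(" ".join(tldtest_args("/dev/sg/c10t5l0",
                            {1: "/dev/rmt/0cbn", 2: "/dev/rmt/1cbn"})))
```

Keeping the mapping keyed by DrNum rather than by Drive Index matters 
here, since the two columns are numbered differently in the tpconfig 
listing.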
>>>
>>> Opening /dev/sg/c10t5l0
>>> MODE_SENSE complete
>>> Enter tld commands (? returns help information)
>>> s d
>>> drive 1 (addr 500) access = 0 Contains Cartridge = no
>>> Sense code = 0x40, Code qualifier = 0x2
>>> SCSI ID from drive 1 is 0
>>> drive 2 (addr 501) access = 0 Contains Cartridge = no
>>> Sense code = 0x40, Code qualifier = 0x2
>>> SCSI ID from drive 2 is 1
>>> drive 3 (addr 502) access = 0 Contains Cartridge = no
>>> Sense code = 0x40, Code qualifier = 0x2
>>> SCSI ID from drive 3 is 2
>>> drive 4 (addr 503) access = 0 Contains Cartridge = no
>>> Sense code = 0x40, Code qualifier = 0x2
>>> SCSI ID from drive 4 is 3
>>> drive 5 (addr 504) access = 0 Contains Cartridge = no
>>> Sense code = 0x40, Code qualifier = 0x2
>>> SCSI ID from drive 5 is 4
>>> drive 6 (addr 505) access = 0 Contains Cartridge = no
>>> Sense code = 0x40, Code qualifier = 0x2
>>> SCSI ID from drive 6 is 5
>>> drive 7 (addr 506) access = 1 Contains Cartridge = yes
>>> Source address = 1510 (slot 511)
>>> Barcode = 000610
>>> SCSI ID from drive 7 is 6
>>> << Press return to continue, or q and return to stop >>
>>>
>>> drive 8 (addr 507) access = 1 Contains Cartridge = no
>>> SCSI ID from drive 8 is 8
>>> READ_ELEMENT_STATUS complete
>>>
>>>
>>> Here's the logs when a backup is attempted:
>>>
>>> Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) key = 0x4, asc = 0x40, 
>>> ascq = 0x2, UNKNOWN ERROR, KEY: 0x04, ASC: 0x40, ASCQ: 0x02
>>> Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) Move_medium error
>>> Nov  6 11:16:51 nb-master-01 tldcd[7439]: TLD(0) cannot clear drive 4 
>>> error, drive asc=0x40, ascq=0x2
>>> Nov  6 11:16:51 nb-master-01 tldcd[7441]: TLD(0) cannot clear drive 3 
>>> error, drive asc=0x40, ascq=0x2
>>> Nov  6 11:16:51 nb-master-01 tldcd[7445]: TLD(0) cannot clear 
>>> drive 5 error, drive asc=0x40, ascq=0x2
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is 
>>> being DOWNED, status: Robotic mount failure
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>>> drive path, and media
>>> Nov  6 11:16:51 nb-master-01 tldcd[7447]: TLD(0) cannot clear drive 6 
>>> error, drive asc=0x40, ascq=0x2
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 4 (device 1) is 
>>> being DOWNED, status: Robotic mount failure
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>>> drive path, and media
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 3 (device 0) is 
>>> being DOWNED, status: Robotic mount failure
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>>> drive path, and media
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 5 (device 2) is 
>>> being DOWNED, status: Robotic mount failure
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>>> drive path, and media
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 6 (device 3) is 
>>> being DOWNED, status: Robotic mount failure
>>> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
>>> drive path, and media
>>> Nov  6 11:16:52 nb-master-01 tldcd[7457]: TLD(0) cannot clear drive 1 
>>> error, drive asc=0x40, ascq=0x2
>>> Nov  6 11:16:52 nb-master-01 tldd[7015]: TLD(0) drive 1 (device 5) is 
>>> being DOWNED, status: Robotic mount failure
>>> Nov  6 11:16:52 nb-master-01 tldd[7015]: Check integrity of the drive, 
>>> drive path, and media
>>> Nov  6 11:16:52 nb-master-01 tldcd[7466]: TLD(0) cannot clear drive 2 
>>> error, drive asc=0x40, ascq=0x2
>>> Nov  6 11:16:52 nb-master-01 ltid[6960]: Request for media ID 000610 is 
>>> being rejected because the media appears to be unmountable
>>> Nov  6 11:16:52 nb-master-01 tldd[7015]: TLD(0) bad media suspected; 
>>> configuring device 4 back UP
>>> Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) key = 0x5, asc = 0x3a, 
>>> ascq = 0x0, MEDIUM NOT PRESENT
>>> Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) Move_medium error
>>> Nov  6 11:16:54 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is 
>>> being DOWNED, status: Unable to SCSI unload drive
>>> Nov  6 11:16:54 nb-master-01 tldd[7015]: Check integrity of the drive, 
>>> drive path, and media
>>> Nov  6 11:16:55 nb-master-01 tldcd[7484]: TLD(0) cannot clear drive 2 
>>> error, drive asc=0x40, ascq=0x2
>>> Nov  6 11:16:55 nb-master-01 tldd[7015]: TLD(0) drive 2 (device 7) is 
>>> being DOWNED, status: Robotic mount failure
>>> Nov  6 11:16:55 nb-master-01 tldd[7015]: Check integrity of the drive, 
>>> drive path, and media
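
To summarise a log like the one above, the sense data and the DOWNED 
events can be pulled out with two regular expressions.  This is only a 
sketch: it assumes unwrapped syslog lines (the mail client wrapped the 
ones quoted here), and the names are mine.  The sense key names come from 
the standard SCSI sense key table; the ASC/ASCQ pair 0x40/0x2 is 
deliberately not decoded, since NetBackup itself logs it as UNKNOWN ERROR 
and it may be vendor specific.

```python
import re

# Standard SCSI sense key names (subset).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}

SENSE_RE = re.compile(
    r"key = (0x[0-9a-f]+),\s*asc = (0x[0-9a-f]+),\s*ascq = (0x[0-9a-f]+)",
    re.I)
DOWNED_RE = re.compile(
    r"drive (\d+) \(device (\d+)\) is\s+being DOWNED, status: ([^\n]+)")


def summarize(log_text):
    """Return (sense triples, downed drives) found in tldcd/tldd log text."""
    senses = [tuple(int(v, 16) for v in m.groups())
              for m in SENSE_RE.finditer(log_text)]
    downed = [(int(m.group(1)), m.group(3).strip())
              for m in DOWNED_RE.finditer(log_text)]
    return senses, downed
```

For the two Move_medium errors above this yields keys 0x4 (HARDWARE ERROR) 
and 0x5 (ILLEGAL REQUEST), the latter matching the MEDIUM NOT PRESENT 
message logged when drive 7 could not be unloaded.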
>>>
>>>
>>> Any help would be GREATLY appreciated!
>>>
>>> Thanks!
>>>
>>>    - Mike
>>>
> 

-- 
Mike Jackson          <mike.m.jackson at ca.mci.com>
UNIX Administrator, MCI Canada Hosting Operations
Juniper Networks Certified              JNCIA #85