Veritas-bu

[Veritas-bu] Serious problem - DLT7000 tape drives gone "not functional"

From: mike.m.jackson at ca.mci.com (Mike Jackson)
Date: Mon, 06 Nov 2006 14:07:50 -0500
Hey Mike,

Thanks for the heads up on this.  I actually set the "On Bus" option to 
ON (default is OFF) when I manually reset the SCSI IDs for the drives, 
even though my robot is on its own SCSI card:

/dev/sg/c10t5l0: Changer: "STK     L700"
/dev/sg/c2t0l0: Tape (/dev/rmt/0): "QUANTUM DLT7000"
/dev/sg/c2t1l0: Tape (/dev/rmt/1): "QUANTUM DLT7000"
/dev/sg/c4t2l0: Tape (/dev/rmt/2): "QUANTUM DLT7000"
/dev/sg/c4t3l0: Tape (/dev/rmt/3): "QUANTUM DLT7000"
/dev/sg/c6t4l0: Tape (/dev/rmt/4): "QUANTUM DLT7000"
/dev/sg/c6t5l0: Tape (/dev/rmt/5): "QUANTUM DLT7000"
/dev/sg/c8t6l0: Tape (/dev/rmt/6): "QUANTUM DLT7000"
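
For anyone who wants to check a listing like this by script rather than by eye, here is a rough Python sketch that splits each sgscan line into controller, target, and LUN. It assumes the cNtNlN part of the /dev/sg path encodes controller, target (SCSI ID), and LUN, and that every line follows the layout shown above; the names SGSCAN_OUTPUT and parse_sgscan are made up for illustration and are not part of NetBackup.

```python
import re

# Listing copied from the sgscan output above.
SGSCAN_OUTPUT = '''\
/dev/sg/c10t5l0: Changer: "STK     L700"
/dev/sg/c2t0l0: Tape (/dev/rmt/0): "QUANTUM DLT7000"
/dev/sg/c2t1l0: Tape (/dev/rmt/1): "QUANTUM DLT7000"
/dev/sg/c4t2l0: Tape (/dev/rmt/2): "QUANTUM DLT7000"
/dev/sg/c4t3l0: Tape (/dev/rmt/3): "QUANTUM DLT7000"
/dev/sg/c6t4l0: Tape (/dev/rmt/4): "QUANTUM DLT7000"
/dev/sg/c6t5l0: Tape (/dev/rmt/5): "QUANTUM DLT7000"
/dev/sg/c8t6l0: Tape (/dev/rmt/6): "QUANTUM DLT7000"
'''

# Assumption: cNtNlN in the path means controller, target (SCSI ID), LUN.
LINE_RE = re.compile(
    r'^/dev/sg/c(?P<ctrl>\d+)t(?P<target>\d+)l(?P<lun>\d+):\s+'
    r'(?P<kind>\w+)(?:\s+\((?P<rmt>[^)]+)\))?:\s+"(?P<inquiry>[^"]*)"'
)


def parse_sgscan(text):
    """Return one dict per device line; lines that do not match are skipped."""
    devices = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        dev = m.groupdict()
        for key in ("ctrl", "target", "lun"):
            dev[key] = int(dev[key])
        devices.append(dev)
    return devices


devices = parse_sgscan(SGSCAN_OUTPUT)
tapes = [d for d in devices if d["kind"] == "Tape"]
tape_targets = sorted(d["target"] for d in tapes)
print(len(tapes), "tape devices, targets:", tape_targets)
```

On the listing above this finds seven tape devices on targets 0 through 6, which may be worth comparing against the number of drives the library itself reports.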

As a test, I just turned off "On Bus" for the drives and did a reset. 
Now only 2 of the 8 drives are "NOT FUNCTIONAL" (drives 1 and 2) --->

Opening /dev/sg/c10t5l0
MODE_SENSE complete
Enter tld commands (? returns help information)
s d
drive 1 (addr 500) access = 0 Contains Cartridge = no
Sense code = 0x40, Code qualifier = 0x2
SCSI ID from drive 1 is 0
drive 2 (addr 501) access = 0 Contains Cartridge = no
Sense code = 0x40, Code qualifier = 0x2
SCSI ID from drive 2 is 1
drive 3 (addr 502) access = 1 Contains Cartridge = no
SCSI ID from drive 3 is 2
drive 4 (addr 503) access = 1 Contains Cartridge = no
SCSI ID from drive 4 is 3
drive 5 (addr 504) access = 1 Contains Cartridge = no
SCSI ID from drive 5 is 4
drive 6 (addr 505) access = 1 Contains Cartridge = no
SCSI ID from drive 6 is 5
drive 7 (addr 506) access = 1 Contains Cartridge = no
SCSI ID from drive 7 is 6
drive 8 (addr 507) access = 1 Contains Cartridge = no
SCSI ID from drive 8 is 8
READ_ELEMENT_STATUS complete


I'm going to look up Sense code 0x40 / Code qualifier 0x2 and see 
what it's all about; maybe I have some bad drives.
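
In case it helps with comparing runs before and after a setting change, here is a small Python sketch that parses the "s d" output above and lists the drives reported with access = 0 along with their sense data. It assumes the three-line layout shown above (drive line, optional sense line, SCSI ID line) and that access = 0 marks a drive the robot cannot reach; parse_drive_status and TLDTEST_OUTPUT are illustrative names only.

```python
import re

# Output copied from the "s d" listing above.
TLDTEST_OUTPUT = '''\
drive 1 (addr 500) access = 0 Contains Cartridge = no
Sense code = 0x40, Code qualifier = 0x2
SCSI ID from drive 1 is 0
drive 2 (addr 501) access = 0 Contains Cartridge = no
Sense code = 0x40, Code qualifier = 0x2
SCSI ID from drive 2 is 1
drive 3 (addr 502) access = 1 Contains Cartridge = no
SCSI ID from drive 3 is 2
drive 4 (addr 503) access = 1 Contains Cartridge = no
SCSI ID from drive 4 is 3
drive 5 (addr 504) access = 1 Contains Cartridge = no
SCSI ID from drive 5 is 4
drive 6 (addr 505) access = 1 Contains Cartridge = no
SCSI ID from drive 6 is 5
drive 7 (addr 506) access = 1 Contains Cartridge = no
SCSI ID from drive 7 is 6
drive 8 (addr 507) access = 1 Contains Cartridge = no
SCSI ID from drive 8 is 8
'''

DRIVE_RE = re.compile(
    r'^drive (\d+) \(addr (\d+)\) access = (\d+) Contains Cartridge = (yes|no)'
)
SENSE_RE = re.compile(
    r'^Sense code = (0x[0-9a-fA-F]+), Code qualifier = (0x[0-9a-fA-F]+)'
)
SCSI_ID_RE = re.compile(r'^SCSI ID from drive (\d+) is (\d+)')


def parse_drive_status(text):
    """Map drive number -> dict with addr, access, loaded, sense, scsi_id."""
    drives = {}
    current = None  # drive number of the most recent "drive N" line
    for raw in text.splitlines():
        line = raw.strip()
        m = DRIVE_RE.match(line)
        if m:
            current = int(m.group(1))
            drives[current] = {
                "addr": int(m.group(2)),
                "access": int(m.group(3)),
                "loaded": m.group(4) == "yes",
                "sense": None,
                "scsi_id": None,
            }
            continue
        m = SENSE_RE.match(line)
        if m and current is not None:
            drives[current]["sense"] = (int(m.group(1), 16), int(m.group(2), 16))
            continue
        m = SCSI_ID_RE.match(line)
        if m and int(m.group(1)) in drives:
            drives[int(m.group(1))]["scsi_id"] = int(m.group(2))
    return drives


drives = parse_drive_status(TLDTEST_OUTPUT)
inaccessible = sorted(n for n, d in drives.items() if d["access"] == 0)
print("drives with access = 0:", inaccessible)
for n in inaccessible:
    print("drive", n, "sense:", drives[n]["sense"], "SCSI ID:", drives[n]["scsi_id"])
```

Running the same function over an earlier capture and diffing the two dictionaries is one way to see exactly which drives changed state after a reset.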

Thanks again,

   - Mike

Mike Dunn (veritas-bu) wrote:
> Mike,
> 
> The fact that your L700 configuration was cleared leads me to wonder about
> one very critical setting in the L700 (or STK TLD libraries in general). 
> If you are using SCSI to control your robot, AND your robot is physically
> on the same SCSI bus as another tape drive, make certain that the "On Bus"
> setting is enabled for your library.  For some reason, having this setting
> disabled when the robot and drives share a bus causes very random
> behaviour.  By default, I believe, it is disabled.
> 
>   Cheers
>   Mike
> 
> 
> Message: 1
> Date: Mon, 06 Nov 2006 11:45:20 -0500
> From: Mike Jackson <mike.m.jackson at ca.mci.com>
> Subject: [Veritas-bu] Serious problem - DLT7000 tape drives gone "not
>       functional"
> To: veritas-bu at mailman.eng.auburn.edu
> Message-ID: <454F66A0.3040200 at ca.mci.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Hello all,
> 
> We're running a Solaris w/ NetBackup 5.0 master server environment with 
> a SCSI attached StorageTek L700 library with eight DLT7000 drives.  We 
> ran into an "event" the other night that cleared the L700 configuration 
> and reset all of the drive SCSI IDs to Invalid.  I manually 
> reconfigured the SCSI IDs 00 through 08 (skipping 07, which we cannot 
> use).  sgscan sees the drives, but when I try robtest or run manual 
> backups the environment goes crazy and DOWNs all the drives.  I've got 
> a support ticket opened with StorageTek but at this point they're not 
> sure what the problem could be.  The LCD display on the L700 library 
> says "NOT FUNCTIONAL" for all the drives even after a reboot.
> 
> Here's some information from sgscan, tpconfig, and robtest:
> 
> [nb-master-01:ROOT](~): sgscan
> .
> /dev/sg/c10t5l0: Changer: "STK     L700"
> .
> /dev/sg/c2t0l0: Tape (/dev/rmt/0): "QUANTUM DLT7000"
> /dev/sg/c2t1l0: Tape (/dev/rmt/1): "QUANTUM DLT7000"
> /dev/sg/c4t2l0: Tape (/dev/rmt/2): "QUANTUM DLT7000"
> /dev/sg/c4t3l0: Tape (/dev/rmt/3): "QUANTUM DLT7000"
> /dev/sg/c6t4l0: Tape (/dev/rmt/4): "QUANTUM DLT7000"
> /dev/sg/c6t5l0: Tape (/dev/rmt/5): "QUANTUM DLT7000"
> /dev/sg/c8t6l0: Tape (/dev/rmt/6): "QUANTUM DLT7000"
> [nb-master-01:ROOT](~):
> 
> [nb-master-01:ROOT](~): tpconfig -l
> Device Robot Drive       Robot                    Drive            Device         Second
> Type     Num Index  Type DrNum Status  Comment    Name             Path           Device Path
> robot      0    -    TLD    -       -  -          -                /dev/sg/c10t5l0
>    drive    -    0    dlt    3      UP  -          QUANTUMDLT70003 /dev/rmt/2cbn
>    drive    -    1    dlt    4      UP  -          QUANTUMDLT70004 /dev/rmt/3cbn
>    drive    -    2    dlt    5      UP  -          QUANTUMDLT70005 /dev/rmt/4cbn
>    drive    -    3    dlt    6      UP  -          QUANTUMDLT70006 /dev/rmt/5cbn
>    drive    -    4    dlt    7      UP  -          QUANTUMDLT70007 /dev/rmt/6cbn
>    drive    -    5    dlt    1      UP  -          QUANTUMDLT70001 /dev/rmt/0cbn
>    drive    -    7    dlt    2      UP  -          QUANTUMDLT70002 /dev/rmt/1cbn
> 
> Robot selected: TLD(0)   robotic path = /dev/sg/c10t5l0
> 
> Invoking robotic test utility:
> /usr/openv/volmgr/bin/tldtest -r /dev/sg/c10t5l0 -d1 /dev/rmt/0cbn -d2 
> /dev/rmt/1cbn -d3 /dev/rmt/2cbn -d4 /dev/rmt/3cbn -d5 /dev/rmt/4cbn -d6 
> /dev/rmt/5cbn -d7 /dev/rmt/6cbn
> 
> Opening /dev/sg/c10t5l0
> MODE_SENSE complete
> Enter tld commands (? returns help information)
> s d
> drive 1 (addr 500) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 1 is 0
> drive 2 (addr 501) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 2 is 1
> drive 3 (addr 502) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 3 is 2
> drive 4 (addr 503) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 4 is 3
> drive 5 (addr 504) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 5 is 4
> drive 6 (addr 505) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 6 is 5
> drive 7 (addr 506) access = 1 Contains Cartridge = yes
> Source address = 1510 (slot 511)
> Barcode = 000610
> SCSI ID from drive 7 is 6
> << Press return to continue, or q and return to stop >>
> 
> drive 8 (addr 507) access = 1 Contains Cartridge = no
> SCSI ID from drive 8 is 8
> READ_ELEMENT_STATUS complete
> 
> 
> Here's the logs when a backup is attempted:
> 
> Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) key = 0x4, asc = 0x40, 
> ascq = 0x2, UNKNOWN ERROR, KEY: 0x04, ASC: 0x40, ASCQ: 0x02
> Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) Move_medium error
> Nov  6 11:16:51 nb-master-01 tldcd[7439]: TLD(0) cannot clear drive 4 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:51 nb-master-01 tldcd[7441]: TLD(0) cannot clear drive 3 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:51 nb-master-01 tldcd[7445]: TLD(0) cannot clear 
> drive 5 error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:51 nb-master-01 tldcd[7447]: TLD(0) cannot clear drive 6 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 4 (device 1) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 3 (device 0) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 5 (device 2) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 6 (device 3) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:52 nb-master-01 tldcd[7457]: TLD(0) cannot clear drive 1 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:52 nb-master-01 tldd[7015]: TLD(0) drive 1 (device 5) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:52 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:52 nb-master-01 tldcd[7466]: TLD(0) cannot clear drive 2 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:52 nb-master-01 ltid[6960]: Request for media ID 000610 is 
> being rejected because the media appears to be unmountable
> Nov  6 11:16:52 nb-master-01 tldd[7015]: TLD(0) bad media suspected; 
> configuring device 4 back UP
> Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) key = 0x5, asc = 0x3a, 
> ascq = 0x0, MEDIUM NOT PRESENT
> Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) Move_medium error
> Nov  6 11:16:54 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is 
> being DOWNED, status: Unable to SCSI unload drive
> Nov  6 11:16:54 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:55 nb-master-01 tldcd[7484]: TLD(0) cannot clear drive 2 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:55 nb-master-01 tldd[7015]: TLD(0) drive 2 (device 7) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:55 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> 
> 
> Any help would be GREATLY appreciated!
> 
> Thanks!
> 
>    - Mike
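
The syslog excerpt in the quoted message is easier to read once it is tallied. Below is a hedged Python sketch that counts the asc/ascq pairs and collects the DOWNED reasons per drive. The sample holds five entries from the excerpt with the mail-wrapped lines rejoined; the regular expressions assume the tldcd/tldd message wording shown there, and summarize is an illustrative helper, not a NetBackup tool.

```python
import re
from collections import Counter, defaultdict

# Five entries from the quoted syslog excerpt, with wrapped lines rejoined.
SYSLOG_SAMPLE = '''\
Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) key = 0x4, asc = 0x40, ascq = 0x2, UNKNOWN ERROR, KEY: 0x04, ASC: 0x40, ASCQ: 0x02
Nov  6 11:16:51 nb-master-01 tldcd[7439]: TLD(0) cannot clear drive 4 error, drive asc=0x40, ascq=0x2
Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is being DOWNED, status: Robotic mount failure
Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Nov  6 11:16:54 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is being DOWNED, status: Unable to SCSI unload drive
'''

# Matches both "asc = 0x40, ascq = 0x2" and "asc=0x40, ascq=0x2".
SENSE_RE = re.compile(
    r'\basc\s*=\s*(0x[0-9a-fA-F]+),\s*ascq\s*=\s*(0x[0-9a-fA-F]+)'
)
DOWNED_RE = re.compile(
    r'TLD\((\d+)\) drive (\d+) \(device (\d+)\) is being DOWNED, status: (.+)$'
)


def summarize(log_text):
    """Return (Counter of (asc, ascq) pairs, {drive number: [DOWNED reasons]})."""
    sense_counts = Counter()
    downed = defaultdict(list)
    for line in log_text.splitlines():
        m = SENSE_RE.search(line)
        if m:
            sense_counts[(int(m.group(1), 16), int(m.group(2), 16))] += 1
        m = DOWNED_RE.search(line)
        if m:
            downed[int(m.group(2))].append(m.group(4).strip())
    return sense_counts, dict(downed)


sense_counts, downed = summarize(SYSLOG_SAMPLE)
for (asc, ascq), count in sense_counts.most_common():
    print("asc=0x%02x ascq=0x%02x seen %d time(s)" % (asc, ascq, count))
for drive, reasons in sorted(downed.items()):
    print("drive", drive, "DOWNED:", "; ".join(reasons))
```

Feeding it the full log instead of the sample should show at a glance whether a single sense pair accounts for most of the failures.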
> 

-- 
Mike Jackson          <mike.m.jackson at ca.mci.com>
UNIX Administrator, MCI Canada Hosting Operations
Juniper Networks Certified              JNCIA #85