[Veritas-bu] SUMMARY: Serious problem - DLT7000 tape drives gone "not functional"

From: mike.m.jackson at ca.mci.com (Mike Jackson)
Date: Thu, 09 Nov 2006 16:30:15 -0500
Hello all,

I had StorageTek come in and replace the robotic MPCL card in the L700 
library, but that didn't fix the problem.  It turned out to be a bad 
communications/signalling cable on the drive chassis that houses the 
tape drives.  It's the cable that the library uses to communicate with 
the drives.  They ganked the old one out and put in a new one, and all my 
drives came back up "empty" :)

Cheers,

   - Mike
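
A quick way to confirm a fix like this is to save the robtest "s d" output and check that no drive still reports a sense code. The following is a minimal sketch in Python, assuming the output format matches the tldtest listing quoted below; the function names and the SAMPLE_STATUS text are illustrative only and are not part of NetBackup.

```python
import re

# Patterns follow the "s d" output format shown in the quoted message.
DRIVE_RE = re.compile(
    r"drive (\d+) \(addr (\d+)\) access = (\d+) Contains Cartridge = (yes|no)"
)
SENSE_RE = re.compile(
    r"Sense code = (0x[0-9a-fA-F]+), Code qualifier = (0x[0-9a-fA-F]+)"
)
SCSI_RE = re.compile(r"SCSI ID from drive (\d+) is (\d+)")


def parse_drive_status(text):
    """Return {drive number: details} parsed from tldtest 's d' output."""
    drives = {}
    current = None
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate email quote markers
        m = DRIVE_RE.search(line)
        if m:
            current = int(m.group(1))
            drives[current] = {
                "addr": int(m.group(2)),
                "access": int(m.group(3)),
                "cartridge": m.group(4) == "yes",
                "sense": None,
                "scsi_id": None,
            }
            continue
        m = SENSE_RE.search(line)
        if m and current is not None:
            drives[current]["sense"] = (int(m.group(1), 16), int(m.group(2), 16))
            continue
        m = SCSI_RE.search(line)
        if m and int(m.group(1)) in drives:
            drives[int(m.group(1))]["scsi_id"] = int(m.group(2))
    return drives


def faulted_drives(drives):
    """Drive numbers for which the robot reported a sense code."""
    return sorted(n for n, d in drives.items() if d["sense"] is not None)


# Three entries copied from the listing in the quoted message.
SAMPLE_STATUS = """\
> drive 1 (addr 500) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 1 is 0
> drive 7 (addr 506) access = 1 Contains Cartridge = yes
> Source address = 1510 (slot 511)
> Barcode = 000610
> SCSI ID from drive 7 is 6
> drive 8 (addr 507) access = 1 Contains Cartridge = no
> SCSI ID from drive 8 is 8
"""

if __name__ == "__main__":
    status = parse_drive_status(SAMPLE_STATUS)
    print("drives reporting a sense code:", faulted_drives(status))
```

An empty list from faulted_drives() on fresh output would suggest the robot can see every drive again. The sketch reads only what the listing prints and makes no attempt to interpret the codes.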

Mike Jackson wrote:
> Hello all,
> 
> We're running a NetBackup 5.0 master server on Solaris with a 
> SCSI-attached StorageTek L700 library and eight DLT7000 drives.  We 
> ran into an "event" the other night that cleared the L700 configuration 
> and reset all of the drive SCSI IDs to Invalid.  I manually 
> reconfigured SCSI IDs 00 through 08 (skipping 07, which we cannot 
> use).  sgscan sees the drives, but when I try robtest or run manual 
> backups the environment goes crazy and DOWNs all the drives.  I've got 
> a support ticket open with StorageTek, but at this point they're not 
> sure what the problem could be.  The LCD display on the L700 library 
> says "NOT FUNCTIONAL" for all the drives, even after a reboot.
> 
> Here's some information from sgscan / tpconfig && robtest:
> 
> [nb-master-01:ROOT](~): sgscan
> ..
> /dev/sg/c10t5l0: Changer: "STK     L700"
> ..
> /dev/sg/c2t0l0: Tape (/dev/rmt/0): "QUANTUM DLT7000"
> /dev/sg/c2t1l0: Tape (/dev/rmt/1): "QUANTUM DLT7000"
> /dev/sg/c4t2l0: Tape (/dev/rmt/2): "QUANTUM DLT7000"
> /dev/sg/c4t3l0: Tape (/dev/rmt/3): "QUANTUM DLT7000"
> /dev/sg/c6t4l0: Tape (/dev/rmt/4): "QUANTUM DLT7000"
> /dev/sg/c6t5l0: Tape (/dev/rmt/5): "QUANTUM DLT7000"
> /dev/sg/c8t6l0: Tape (/dev/rmt/6): "QUANTUM DLT7000"
> [nb-master-01:ROOT](~):
> 
> [nb-master-01:ROOT](~): tpconfig -l
> Device Robot Drive       Robot                    Drive            Device         Second
> Type     Num Index  Type DrNum Status  Comment    Name             Path           Device Path
> robot      0    -    TLD    -       -  -          -                /dev/sg/c10t5l0
>   drive    -    0    dlt    3      UP  -          QUANTUMDLT70003  /dev/rmt/2cbn
>   drive    -    1    dlt    4      UP  -          QUANTUMDLT70004  /dev/rmt/3cbn
>   drive    -    2    dlt    5      UP  -          QUANTUMDLT70005  /dev/rmt/4cbn
>   drive    -    3    dlt    6      UP  -          QUANTUMDLT70006  /dev/rmt/5cbn
>   drive    -    4    dlt    7      UP  -          QUANTUMDLT70007  /dev/rmt/6cbn
>   drive    -    5    dlt    1      UP  -          QUANTUMDLT70001  /dev/rmt/0cbn
>   drive    -    7    dlt    2      UP  -          QUANTUMDLT70002  /dev/rmt/1cbn
> 
> Robot selected: TLD(0)   robotic path = /dev/sg/c10t5l0
> 
> Invoking robotic test utility:
> /usr/openv/volmgr/bin/tldtest -r /dev/sg/c10t5l0 -d1 /dev/rmt/0cbn -d2 
> /dev/rmt/1cbn -d3 /dev/rmt/2cbn -d4 /dev/rmt/3cbn -d5 /dev/rmt/4cbn -d6 
> /dev/rmt/5cbn -d7 /dev/rmt/6cbn
> 
> Opening /dev/sg/c10t5l0
> MODE_SENSE complete
> Enter tld commands (? returns help information)
> s d
> drive 1 (addr 500) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 1 is 0
> drive 2 (addr 501) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 2 is 1
> drive 3 (addr 502) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 3 is 2
> drive 4 (addr 503) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 4 is 3
> drive 5 (addr 504) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 5 is 4
> drive 6 (addr 505) access = 0 Contains Cartridge = no
> Sense code = 0x40, Code qualifier = 0x2
> SCSI ID from drive 6 is 5
> drive 7 (addr 506) access = 1 Contains Cartridge = yes
> Source address = 1510 (slot 511)
> Barcode = 000610
> SCSI ID from drive 7 is 6
> << Press return to continue, or q and return to stop >>
> 
> drive 8 (addr 507) access = 1 Contains Cartridge = no
> SCSI ID from drive 8 is 8
> READ_ELEMENT_STATUS complete
> 
> 
> Here are the logs from when a backup is attempted:
> 
> Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) key = 0x4, asc = 0x40, 
> ascq = 0x2, UNKNOWN ERROR, KEY: 0x04, ASC: 0x40, ASCQ: 0x02
> Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) Move_medium error
> Nov  6 11:16:51 nb-master-01 tldcd[7439]: TLD(0) cannot clear drive 4 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:51 nb-master-01 tldcd[7441]: TLD(0) cannot clear drive 3 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:51 nb-master-01 tldcd[7445]: TLD(0) cannot clear 
> drive 5 error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:51 nb-master-01 tldcd[7447]: TLD(0) cannot clear drive 6 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 4 (device 1) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 3 (device 0) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 5 (device 2) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 6 (device 3) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:51 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:52 nb-master-01 tldcd[7457]: TLD(0) cannot clear drive 1 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:52 nb-master-01 tldd[7015]: TLD(0) drive 1 (device 5) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:52 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:52 nb-master-01 tldcd[7466]: TLD(0) cannot clear drive 2 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:52 nb-master-01 ltid[6960]: Request for media ID 000610 is 
> being rejected because the media appears to be unmountable
> Nov  6 11:16:52 nb-master-01 tldd[7015]: TLD(0) bad media suspected; 
> configuring device 4 back UP
> Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) key = 0x5, asc = 0x3a, 
> ascq = 0x0, MEDIUM NOT PRESENT
> Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) Move_medium error
> Nov  6 11:16:54 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is 
> being DOWNED, status: Unable to SCSI unload drive
> Nov  6 11:16:54 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> Nov  6 11:16:55 nb-master-01 tldcd[7484]: TLD(0) cannot clear drive 2 
> error, drive asc=0x40, ascq=0x2
> Nov  6 11:16:55 nb-master-01 tldd[7015]: TLD(0) drive 2 (device 7) is 
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:55 nb-master-01 tldd[7015]: Check integrity of the drive, 
> drive path, and media
> 
> 
> Any help would be GREATLY appreciated!
> 
> Thanks!
> 
>   - Mike
> 
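
The syslog excerpt above is easier to read once the wrapped lines are rejoined and the sense data is pulled out. Below is a minimal sketch in Python, assuming log lines in the format quoted above; the function names and SAMPLE_LOG are illustrative only. The sense key names come from the standard SCSI sense key table. The ASC/ASCQ pair 0x40/0x02 is deliberately left undecoded, since NetBackup itself logs it as UNKNOWN ERROR and its meaning may be vendor-specific.

```python
import re

# Standard SCSI sense key names (SCSI Primary Commands).
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE",
}

STAMP_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d+ \d\d:\d\d:\d\d ")
SENSE_RE = re.compile(
    r"key = (0x[0-9a-fA-F]+), asc = (0x[0-9a-fA-F]+),\s*ascq = (0x[0-9a-fA-F]+)"
)
DOWNED_RE = re.compile(r"drive (\d+) \(device (\d+)\) is\s+being DOWNED, status: (.+)")


def unwrap(text):
    """Strip quote markers and rejoin entries wrapped by the mail client."""
    entries = []
    for raw in text.splitlines():
        line = re.sub(r"^>\s?", "", raw).rstrip()
        if not line:
            continue
        if STAMP_RE.match(line) or not entries:
            entries.append(line)                 # a new timestamped entry
        else:
            entries[-1] += " " + line.strip()    # continuation of the previous one
    return entries


def summarize(text):
    """Return (sense events, DOWNED events) found in the log text."""
    sense, downed = [], []
    for entry in unwrap(text):
        m = SENSE_RE.search(entry)
        if m:
            key, asc, ascq = (int(g, 16) for g in m.groups())
            sense.append((SENSE_KEYS.get(key, "UNKNOWN"), asc, ascq))
        m = DOWNED_RE.search(entry)
        if m:
            downed.append((int(m.group(1)), int(m.group(2)), m.group(3).strip()))
    return sense, downed


# Four entries copied from the log excerpt above, wrapping included.
SAMPLE_LOG = """\
> Nov  6 11:16:51 nb-master-01 tldcd[7433]: TLD(0) key = 0x4, asc = 0x40,
> ascq = 0x2, UNKNOWN ERROR, KEY: 0x04, ASC: 0x40, ASCQ: 0x02
> Nov  6 11:16:51 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is
> being DOWNED, status: Robotic mount failure
> Nov  6 11:16:54 nb-master-01 tldcd[7476]: TLD(0) key = 0x5, asc = 0x3a,
> ascq = 0x0, MEDIUM NOT PRESENT
> Nov  6 11:16:54 nb-master-01 tldd[7015]: TLD(0) drive 7 (device 4) is
> being DOWNED, status: Unable to SCSI unload drive
"""

if __name__ == "__main__":
    sense_events, downed_events = summarize(SAMPLE_LOG)
    for name, asc, ascq in sense_events:
        print(f"{name}: asc=0x{asc:02x} ascq=0x{ascq:02x}")
    for drive, device, status in downed_events:
        print(f"drive {drive} (device {device}) DOWNED: {status}")
```

Sense key 0x4 is HARDWARE ERROR in the standard table, which fits a fault between the library and the drives rather than bad media.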

-- 
Mike Jackson          <mike.m.jackson at ca.mci.com>
UNIX Administrator, MCI Canada Hosting Operations
Juniper Networks Certified              JNCIA #85