[Veritas-bu] Broken Jobs and downed DLTs

2001-02-27 09:37:24
Subject: [Veritas-bu] Broken Jobs and downed DLTs
From: olaf_behnke AT de.schindler DOT com
Date: Tue, 27 Feb 2001 15:37:24 +0100
Hi all,
for the past few days we have had trouble backing up some clients.
We are using NetBackup 3.1.1 on HPUX 10.20 and a DLT7000-Library HP4845A
(based on a STK Lib).
The jobs end with error 41 or 84. I've tested everything and I'm sure it is
neither a network problem nor a media problem. Neither the master server nor
the clients seem to have a problem, and ping is okay. The clients are NT and HPUX.
A typical status sequence in xbpmon for such a job is:

...
02/26/01 22:21:03 - begin writing
02/26/01 22:24:09 - end writing
(84) media write error

and

...
02/26/01 22:28:09 - begin writing
02/26/01 22:33:41 - end writing
(41) network connection timed out


After this job the DLT drive is downed by tldd. In syslog I found the
following:

...
Feb 26 22:22:02 blnhp1 tldcd[14076]: valid = 1, sel = 7, barcode = (000008      )
Feb 26 22:24:29 blnhp1 tldcd[14189]: valid = 1, sel = 454, barcode = (000102    )
Feb 26 22:24:45 blnhp1 tldd[14195]: TLD(5) open failed in io_open, No such device or address
Feb 26 22:24:45 blnhp1 tldd[14195]: TLD(5) unload==TRUE, but no unload, drive 6 (device 0)
Feb 26 22:25:53 blnhp1 tldd[14195]: TLD(5) [12684] waited 0 times for ready, drive 6
Feb 26 22:39:53 blnhp1 tldd[14511]: TLD(5) unload failed in io_open, I/O error[5]
Feb 26 22:39:53 blnhp1 tldd[12684]: TLD(5) drive 6 (device 0) is being DOWNED, status: Unable to SCSI unload drive
Feb 26 22:39:53 blnhp1 tldd[12684]: Check integrity of the drive, drive path, and media
Feb 26 22:40:06 blnhp1 tldd[14842]: TLD(5) open failed in io_open, No such device or address
Feb 26 22:40:06 blnhp1 tldd[14842]: TLD(5) unload==TRUE, but no unload, drive 5 (device 1)
Feb 26 22:40:15 blnhp1 ltid[12666]: Request for EVSN 000102 is being rejected because it is in a DOWN drive
Feb 26 22:40:16 blnhp1 tldd[14850]: TLD(5) open failed in io_open, No such device or address
Feb 26 22:40:16 blnhp1 tldd[14850]: TLD(5) unload==TRUE, but no unload, drive 5 (device 1)
Feb 26 22:41:25 blnhp1 tldd[14850]: TLD(5) [12684] waited 0 times for ready, drive 5
Feb 26 22:42:54 blnhp1 tldcd[14984]: valid = 1, sel = 7, barcode = (000008      )
Feb 26 22:43:08 blnhp1 tldcd[14984]: TLD(5) key = 0x0, asc = 0x0, ascq = 0x0, NO ADDITIONAL SENSE INFORMATION
Feb 26 22:43:15 blnhp1 tldcd[14984]: TLD(5) Move_medium error: CHECK CONDITION
Feb 26 22:43:15 blnhp1 tldd[12684]: TLD(5) drive 5 (device 1) is being DOWNED, status: Robotic dismount failure
Feb 26 22:43:15 blnhp1 tldd[12684]: Check integrity of the drive, drive path, and media
...

This indicates some trouble with the library. The tapes remain in the drive,
but I can manually unload them and move them back from drive to slot with the
HPUX mt and mc commands. After setting the drives to up, the next jobs run
without errors until...
HPUX logs no hardware problems.
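
To line up the downed drives with the failed jobs, the tldd "is being DOWNED" entries can be filtered out of syslog by timestamp. The following is only a minimal sketch in Python. It assumes the message layout seen in the excerpt above, with each entry on a single unwrapped line. The function name and the regular expression are illustrative and not part of NetBackup.

```python
import re

# Assumed layout, taken from the syslog excerpt in this post:
#   <timestamp> <host> tldd[<pid>]: TLD(<robot>) drive <n> (device <m>)
#   is being DOWNED, status: <text>
DOWNED_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+tldd\[\d+\]:\s+"
    r"TLD\((?P<robot>\d+)\)\s+drive\s+(?P<drive>\d+)\s+"
    r"\(device\s+(?P<device>\d+)\)\s+"
    r"is being DOWNED,\s+status:\s+(?P<status>.+)$"
)


def find_downed_drives(lines):
    """Return one dict per tldd 'is being DOWNED' entry found in lines."""
    events = []
    for line in lines:
        match = DOWNED_RE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


# Three entries copied from the excerpt; only two are DOWNED messages.
sample = [
    "Feb 26 22:39:53 blnhp1 tldd[12684]: TLD(5) drive 6 (device 0) "
    "is being DOWNED, status: Unable to SCSI unload drive",
    "Feb 26 22:40:15 blnhp1 ltid[12666]: Request for EVSN 000102 "
    "is being rejected because it is in a DOWN drive",
    "Feb 26 22:43:15 blnhp1 tldd[12684]: TLD(5) drive 5 (device 1) "
    "is being DOWNED, status: Robotic dismount failure",
]

for event in find_downed_drives(sample):
    print(event["ts"], "drive", event["drive"], "->", event["status"])
```

To scan a real log, the same function could be given an open file object in place of the sample list; the location of the syslog file depends on the system configuration.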

All in all, I suspect a resource problem on our master server. We use the
standard HPUX kernel. Are there any kernel parameters to tune? Are there
critical patches we should have installed?
Which NetBackup logs can I monitor to find out what went wrong?

Regards
Olaf Behnke
Schindler Deutschland Holding GmbH



