Hi all,
For the past few days we have had trouble backing up some clients.
We are using NetBackup 3.1.1 on HP-UX 10.20 with a DLT7000 library, HP4845A
(based on an STK library).
The jobs end with error 41 or 84. I have tested everything and I'm sure it is
not a network problem or a media problem. Neither the master server nor the
clients seem to have a problem, and ping is okay. The clients are NT and HP-UX.
A typical status sequence in xbpmon for such a job is:
...
02/26/01 22:21:03 - begin writing
02/26/01 22:24:09 - end writing
(84) media write error
and
...
02/26/01 22:28:09 - begin writing
02/26/01 22:33:41 - end writing
(41) network connection timed out
After this job the DLT drive is downed by tldd. In syslog I found the
following:
...
Feb 26 22:22:02 blnhp1 tldcd[14076]: valid = 1, sel = 7, barcode = (000008 )
Feb 26 22:24:29 blnhp1 tldcd[14189]: valid = 1, sel = 454, barcode = (000102 )
Feb 26 22:24:45 blnhp1 tldd[14195]: TLD(5) open failed in io_open, No such device or address
Feb 26 22:24:45 blnhp1 tldd[14195]: TLD(5) unload==TRUE, but no unload, drive 6 (device 0)
Feb 26 22:25:53 blnhp1 tldd[14195]: TLD(5) [12684] waited 0 times for ready, drive 6
Feb 26 22:39:53 blnhp1 tldd[14511]: TLD(5) unload failed in io_open, I/O error[5]
Feb 26 22:39:53 blnhp1 tldd[12684]: TLD(5) drive 6 (device 0) is being DOWNED, status: Unable to SCSI unload drive
Feb 26 22:39:53 blnhp1 tldd[12684]: Check integrity of the drive, drive path, and media
Feb 26 22:40:06 blnhp1 tldd[14842]: TLD(5) open failed in io_open, No such device or address
Feb 26 22:40:06 blnhp1 tldd[14842]: TLD(5) unload==TRUE, but no unload, drive 5 (device 1)
Feb 26 22:40:15 blnhp1 ltid[12666]: Request for EVSN 000102 is being rejected because it is in a DOWN drive
Feb 26 22:40:16 blnhp1 tldd[14850]: TLD(5) open failed in io_open, No such device or address
Feb 26 22:40:16 blnhp1 tldd[14850]: TLD(5) unload==TRUE, but no unload, drive 5 (device 1)
Feb 26 22:41:25 blnhp1 tldd[14850]: TLD(5) [12684] waited 0 times for ready, drive 5
Feb 26 22:42:54 blnhp1 tldcd[14984]: valid = 1, sel = 7, barcode = (000008 )
Feb 26 22:43:08 blnhp1 tldcd[14984]: TLD(5) key = 0x0, asc = 0x0, ascq = 0x0, NO ADDITIONAL SENSE INFORMATION
Feb 26 22:43:15 blnhp1 tldcd[14984]: TLD(5) Move_medium error: CHECK CONDITION
Feb 26 22:43:15 blnhp1 tldd[12684]: TLD(5) drive 5 (device 1) is being DOWNED, status: Robotic dismount failure
Feb 26 22:43:15 blnhp1 tldd[12684]: Check integrity of the drive, drive path, and media
...
This indicates some trouble with the library. The tapes remain in the drives,
but I can unload them manually and move them back from drive to slot with the
HP-UX mt and mc commands. After setting the drives to up, the next jobs run
without errors until...
HP-UX logs no hardware problems.
All in all, I suspect a resource problem on our master server. We use the
standard HP-UX kernel. Are there any kernel parameters we should tune? Are
there critical patches we need to install?
Which NetBackup logs can I monitor to find out what went wrong?
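In the meantime, something like the following could be used to pull the
"is being DOWNED" events out of syslog, so the times can be compared with the
failed jobs in xbpmon. It is only a minimal sketch: the pattern is derived
purely from the tldd lines quoted above, the names DownedEvent and
find_downed_events are made up for illustration, and it assumes each syslog
entry is on a single line (unlike the wrapped excerpt in this mail).

```python
import re
from typing import Iterable, List, NamedTuple

# Pattern built from the tldd "is being DOWNED" lines quoted in this post.
# Assumption: other NetBackup versions may word these messages differently.
DOWNED_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+tldd\[\d+\]:\s+"
    r"TLD\((?P<robot>\d+)\)\s+drive\s+(?P<drive>\d+)\s+"
    r"\(device\s+(?P<device>\d+)\)\s+is being DOWNED,\s+"
    r"status:\s+(?P<status>.+?)\s*$"
)


class DownedEvent(NamedTuple):
    """One 'drive is being DOWNED' message (hypothetical helper type)."""
    stamp: str
    host: str
    robot: int
    drive: int
    device: int
    status: str


def find_downed_events(lines: Iterable[str]) -> List[DownedEvent]:
    """Return every DOWNED event found in an iterable of syslog lines."""
    events = []
    for line in lines:
        match = DOWNED_RE.match(line)
        if match:
            events.append(
                DownedEvent(
                    stamp=match["stamp"],
                    host=match["host"],
                    robot=int(match["robot"]),
                    drive=int(match["drive"]),
                    device=int(match["device"]),
                    status=match["status"],
                )
            )
    return events


# Demo on three lines taken from the excerpt above (unwrapped).
sample = [
    "Feb 26 22:39:53 blnhp1 tldd[12684]: TLD(5) drive 6 (device 0) "
    "is being DOWNED, status: Unable to SCSI unload drive",
    "Feb 26 22:40:15 blnhp1 ltid[12666]: Request for EVSN 000102 is "
    "being rejected because it is in a DOWN drive",
    "Feb 26 22:43:15 blnhp1 tldd[12684]: TLD(5) drive 5 (device 1) "
    "is being DOWNED, status: Robotic dismount failure",
]
for event in find_downed_events(sample):
    print(event.stamp, "drive", event.drive, "->", event.status)
```

To run it against the real log, pass an open file object instead of the
sample list, for example open("/var/adm/syslog/syslog.log"); that path is the
usual HP-UX location but is an assumption about this system.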
Regards
Olaf Behnke
Schindler Deutschland Holding GmbH