[Veritas-bu] STK Experts, need help

2002-07-01 14:58:56
Subject: [Veritas-bu] STK Experts, need help
From: pawilhelm AT davidson DOT edu (Wilhelm, Patrick)
Date: Mon, 1 Jul 2002 14:58:56 -0400
David - 

Is the autoclean feature enabled on this STK?  We were told that it shouldn't 
be on our L700 using NB 3.4.1.  Someone else can maybe confirm.  Do you have 
syslog on the box controlling the STK?

-----Original Message-----
From: David A. Chapa [mailto:david AT datastaff DOT com]
Sent: Monday, July 01, 2002 11:54 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] STK Experts, need help



Attached is an excerpt from a bptm log for a particular bptm process that was 
running over the weekend.  What eventually happened is one of the drives was 
down'd by NBU during the duplication process.  I was wondering if it was a 
media problem, but then I noticed that the media that was in use was used by 
another duplication stream, then another, etc.  It just finished writing a few 
minutes ago, successfully I might add.  Now I'm wondering if it is the physical 
drive.

The entries in particular that I'm interested in hashing out are the 
tape_error_rec: entries in the bptm log.

What does that mean?

What's with the "delay 3 minutes before next attempt, tries left = 5"?

[that means a total of 18 minutes of possible delays, what do the delays 
constitute?]

And then the entries at 
03:57:16 <2> tape_error_rec: attempting error recovery, delay 3 minutes before 
next attempt, tries left = 3
04:00:16 <2> tape_error_rec: absolute block position after error is 280103
04:00:16 <2> tape_error_rec: locating to absolute block number 280103 for error 
recovery
^^^^^^^
What kind of recovery is it attempting???

04:01:02 <2> tape_error_rec: locate failed in error recovery, locate scsi 
command failed, key = 0x4, asc = 0x44, ascq = 0xb6
^^^^^^^^^^^^^^
Failed the recovery with a SCSI command failure?  Does this point to the 
DRIVE, or to ACS/LS?  (Incidentally, ACS/LS has been installed and working 
great with no problems for quite some time.)
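
For reference, the sense data in that "locate failed" entry can be decoded 
against the standard SCSI sense key table.  The sketch below is only an 
illustration: decode_sense and SENSE_KEYS are hypothetical names, not part of 
NetBackup, and the regular expression assumes the "key = ..., asc = ..., 
ascq = ..." layout seen in this bptm excerpt.  Key 0x4 is HARDWARE ERROR (a 
media fault would normally be 0x3, MEDIUM ERROR), and ASC 0x44 is the 
internal-target-failure family.  ASCQ values of 0x80 and above are vendor 
specific, so 0xb6 would have to be looked up in the drive vendor's 
documentation.

```python
import re

# Sense key names from the SCSI standard (4-bit field in the sense data).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
}

# Assumed layout, taken from the bptm line quoted in this message.
SENSE_RE = re.compile(
    r"key = (0x[0-9a-fA-F]+), asc = (0x[0-9a-fA-F]+), ascq = (0x[0-9a-fA-F]+)"
)


def decode_sense(line):
    """Return the parsed sense fields from a log line, or None if absent."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    key, asc, ascq = (int(group, 16) for group in match.groups())
    return {
        "key": key,
        "asc": asc,
        "ascq": ascq,
        "key_name": SENSE_KEYS.get(key, "other/reserved"),
        # ASCQ 0x80-0xFF is reserved for vendor-specific qualifiers.
        "vendor_specific_ascq": ascq >= 0x80,
    }


line = (
    "04:01:02 <2> tape_error_rec: locate failed in error recovery, locate scsi "
    "command failed, key = 0x4, asc = 0x44, ascq = 0xb6"
)
info = decode_sense(line)
print(info["key_name"], hex(info["asc"]), hex(info["ascq"]))
```

A HARDWARE ERROR key is reported by the target device itself, which leans 
toward the drive rather than the media or the library software, though it does 
not rule out cabling or other transport problems.
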

04:01:02 <2> tape_error_rec: attempting error recovery, delay 3 minutes before 
next attempt, tries left = 2
04:04:02 <2> tape_error_rec: absolute block position after error is 280035
04:04:02 <16> write_data: cannot write image to media id ZA0962, drive index 
104, I/O error
04:04:02 <2> log_media_error: successfully wrote to error file - 07/01/02 
04:04:02 ZA0962 104 WRITE_ERROR
04:04:02 <2> wait_for_sigcld: waiting for child to exit, timeout is 300
04:04:02 <2> check_error_history: called from bptm line 12312, EXIT_Status = 84
04:04:03 <2> check_error_history: drive index = 104, media id = ZA0962, time = 
07/01/02 04:04:02, both_match = 0, media_match = 0, drive_match = 2
04:04:03 <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/ZA0962
04:04:03 <8> check_error_history: DOWN'ing drive index 104, it has had at least 
3 errors in last 12 hour(s)
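
The last entry suggests a threshold rule: the drive is downed once it has 
accumulated at least 3 errors within 12 hours.  The sketch below shows one way 
such a count could be reproduced from error-file lines in the format shown by 
log_media_error above (date, time, media id, drive index, error type).  It is 
an assumption about the general idea, not NetBackup's actual implementation. 
parse_error_line and drive_error_count are hypothetical helpers, and the first 
two sample lines are invented for illustration; only the ZA0962 line comes 
from the log.

```python
from datetime import datetime, timedelta


def parse_error_line(line):
    """Split one error-file line into (timestamp, media id, drive, type)."""
    date, time, media_id, drive_index, kind = line.split()
    when = datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M:%S")
    return when, media_id, int(drive_index), kind


def drive_error_count(lines, drive_index, now, window_hours=12):
    """Count errors recorded against one drive inside the time window."""
    cutoff = now - timedelta(hours=window_hours)
    return sum(
        1
        for when, _media, drive, _kind in map(parse_error_line, lines)
        if drive == drive_index and cutoff <= when <= now
    )


errors = [
    "06/30/02 22:15:40 ZA0411 104 WRITE_ERROR",  # hypothetical earlier error
    "07/01/02 01:32:07 ZA0587 104 WRITE_ERROR",  # hypothetical earlier error
    "07/01/02 04:04:02 ZA0962 104 WRITE_ERROR",  # entry from the log above
]
now = datetime(2002, 7, 1, 4, 4, 3)

count = drive_error_count(errors, 104, now)
should_down = count >= 3  # threshold taken from the log message
print(count, should_down)
```

Note that the log reports media_match = 0 and drive_match = 2, meaning the 
earlier errors involved the same drive but different media, which is 
consistent with suspecting the drive rather than tape ZA0962.
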

I've also attached a small excerpt from the messages file.

Any ideas would be greatly appreciated.

TIA
David
