[Veritas-bu] Worrying bptm log entry and 84 error later on

I just ran into this problem as well, and it turned out to be a bad HBA.
The server had three HBAs for SAN and tape connectivity.  Two were for
the old HP SAN, and one for the new EMC SAN.  The two now only serve the
LTO tape drives.  I yanked one of the two connections and the errors stopped.
I tried replacing the cable, and the errors resumed.  I also moved switch
ports and replaced GBICs.  That narrowed it down to the HBA (thanks Greg).

Try mounting a tape and doing a regular OS tar of a good-sized filesystem
to the tape to see if it barfs.
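
If it helps, here is a minimal Python sketch of that kind of test. It is a
stand-in for the OS tar, not a replacement: it streams a directory tree to a
target path with Python's tarfile module and lets any I/O error from the
device surface. The function name write_tar is made up for illustration, and
the paths in the usage note are assumptions.

```python
import os
import tarfile


def write_tar(source_dir, target):
    """Stream every regular file under source_dir to target as a plain tar.

    Returns the number of bytes of file data handed to the archive.
    Any OSError raised while writing (for example from a tape device)
    propagates to the caller, which is the point of the test.
    """
    total = 0
    with open(target, "wb") as out:
        # "w|" writes an uncompressed tar as a sequential stream of blocks,
        # so the target does not need to be seekable.
        with tarfile.open(fileobj=out, mode="w|") as tar:
            for root, _dirs, files in os.walk(source_dir):
                for name in files:
                    path = os.path.join(root, name)
                    # Skip symlinks and anything that is not a regular file.
                    if os.path.islink(path) or not os.path.isfile(path):
                        continue
                    tar.add(path, recursive=False)
                    total += os.path.getsize(path)
    return total
```

For example, write_tar("/export/home", "/dev/rmt/8cbn") with a scratch tape
already loaded (the directory is hypothetical; the device path is the one in
the bptm log quoted below). An error partway through would point at the
hardware path rather than at NetBackup.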

--
Dan

marshall.a.skare AT accenture DOT com wrote:

> Hi everyone,
> 
>  
> 
> I'm trying to track down the cause of some 84 and 14 failures we've been 
> having lately.  I know the problem is not tape, drive, job, time-of-day 
> or robot specific.  However the problem seems to be happening on the 
> weekends when we run our full backups.  We're on NBU 4.5GA running on 
> Solaris 8 master and media servers with an STK L700 library that has 
> Seagate Viper 200s in it.
> 
>  
> 
> One of the entries I found in a bptm log bothered me a little bit.  Is 
> this normal, or should I perform an inventory on the robot?  The first 
> line says the media is not in the correct storage unit or volume pool.  
> However, it looks like the tape ends up being used anyway, and it also 
> appears that the tape was able to store data.  The job associated with 
> these log entries bombed out roughly 4 hours later with an 84 error.
> 
>  
> 
> Also, during the whole backup attempt, I'd see the db_lock_media error 
> messages about every 30 seconds.  I've read elsewhere that this isn't 
> really a problem.
> 
>  
> 
> 20:19:23.649 [13116] <2> select_media: skipping media id 000309, it is 
> not in correct storage unit or volume pool
> 
> 22:09:35.428 [13711] <2> check_available_drives: checking drives, about 
> to request media id 000309
> 
> 22:09:35.589 [13711] <2> select_media: selected media id 000309 for 
> backup[0], crmmnt52(rl = 4) <----------
> 
> 22:09:35.592 [13711] <2> mount_open_media: Waiting for mount of media id 
> 000309 (copy 1) on server crmmdb17.
> 
> 22:09:58.590 [13725] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:10:24.370 [13739] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:10:27.510 [13746] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:10:31.100 [13754] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:10:50.500 [13761] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:10:52.710 [13768] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:11:00.730 [13775] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:11:05.320 [13783] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:11:24.389 [13711] <2> io_open: file 
> /usr/openv/netbackup/db/media/tpreq/000309 successfully opened
> 
> 22:11:24.389 [13711] <2> write_backup: media id 000309 mounted on drive 
> index 0, drivepath /dev/rmt/8cbn, drivename LTO_crmmdb17_0, copy 1
> 
> 22:11:24.588 [13711] <2> io_position_for_write: position media id 
> 000309, copy 1, current number images = 23
> 
> 22:11:25.220 [13790] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:11:36.810 [13799] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:11:59.039 [13711] <2> io_position_for_write: empty header found on 
> 000309, OK, copy 1
> 
> 22:11:59.039 [13711] <2> io_close: closing 
> /usr/openv/netbackup/db/media/tpreq/000309, from bptm.c.17346
> 
> 22:11:59.046 [13711] <2> io_open: file 
> /usr/openv/netbackup/db/media/tpreq/000309 successfully opened
> 
> 22:12:07.410 [13807] <2> db_lock_media: unable to lock media at offset 
> 63 (000309)
> 
> 22:12:22.924 [13711] <4> write_backup: begin writing backup id 
> crmmnt52_1105157367, copy 1, fragment 1, to media id 000309 on drive index 0
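
To check how often the db_lock_media messages really occur, the timestamps
can be tallied. A rough sketch in Python, assuming unwrapped bptm log lines
in the "HH:MM:SS.mmm [pid] <level> function: message" layout shown above.
lock_failure_gaps is a hypothetical helper, not part of NetBackup, and it
ignores logs that cross midnight.

```python
import re
from datetime import datetime

# One bptm log line: time, [pid], <level>, function name, message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] "
    r"<(?P<level>\d+)> (?P<func>\w+): (?P<msg>.*)$"
)


def lock_failure_gaps(lines, media_id):
    """Return the seconds between consecutive db_lock_media messages
    that mention media_id, in the order they appear in lines."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # wrapped or unrelated line
        if m.group("func") == "db_lock_media" and f"({media_id})" in m.group("msg"):
            stamps.append(datetime.strptime(m.group("ts"), "%H:%M:%S.%f"))
    return [round((b - a).total_seconds(), 3) for a, b in zip(stamps, stamps[1:])]


sample = [
    "22:09:58.590 [13725] <2> db_lock_media: unable to lock media at offset 63 (000309)",
    "22:10:24.370 [13739] <2> db_lock_media: unable to lock media at offset 63 (000309)",
    "22:10:27.510 [13746] <2> db_lock_media: unable to lock media at offset 63 (000309)",
]
gaps = lock_failure_gaps(sample, "000309")
print(gaps)  # [25.78, 3.14]
```

Run against the whole log for the failed job, this would show whether the
messages cluster around particular events or really do tick along at a
steady interval.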
> 
>  
> 
> If our problem is really network-related, are there any unusual causes I 
> should check first?  The clients we're backing up are a mixture of 
> Solaris 7/8 and Windows XP/2000 Server/2003 Server.  I think I've ruled 
> out the drives altogether since I took the time to run TapeRx on all of 
> the drives in this system, and have written and read up to 4GB without 
> an error.  I've had backup jobs fail with error 84 that have transferred 
> less data than that when they failed.
> 
>  
> 
> Thanks for any help!
> 
>  
> 
> Marshall Skare
> 
> ATIS - Unix Engineering
> 
> (612) 277-4434