[Veritas-bu] restore failure.

Hello.

I've having trouble following this log, can someone please give me some
ideas?

12:34:40 [13937] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:35:00 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:35:11 [13937] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:35:35 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:35:42 [13937] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:36:05 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:36:36 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:37:07 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:37:38 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:37:58 [13937] <2> send_brm_msg: MEDIA NOT READY
12:37:58 [13937] <2> mpx_read_data: attempting read error recovery, err = 5
12:37:58 [13937] <2> tape_error_rec: error recovery to block 23609922
requested
12:37:58 [13937] <2> tape_error_rec: attempting error recovery, delay 3
minutes before next attempt, tries left = 5
12:38:09 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:38:39 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:39:12 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:39:42 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:40:12 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:40:42 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:40:58 [13937] <2> tape_error_rec: absolute block position after error is
0
12:40:58 [13937] <2> getsockconnected: host=phido service=bpdbm
address=10.100.1.247 protocol=tcp non-reserved port=13721
12:40:58 [13937] <2> bind_on_port_addr: bound to port 37004
12:40:58 [13937] <2> check_authentication: no authentication required
12:40:58 [13937] <16> mpx_read_data: cannot read image from media id 000023,
drive index 1, I/O error
12:40:58 [13937] <2> log_media_error: successfully wrote to error file -
10/24/01 12:40:58 000023 1 READ_ERROR
12:40:58 [13937] <4> notify: executing -
/usr/openv/netbackup/bin/restore_notify bptm far_1003907531 restore
12:40:58 [13937] <2> check_error_history: called from bptm line 849,
EXIT_Status = 85
12:40:58 [13937] <2> check_error_history: drive index = 1, media id =
000023, time = 10/24/01 12:40:58, both_match = 0, media_match = 0,
drive_match = 0
12:40:58 [13937] <2> io_close: closing
/usr/openv/netbackup/db/media/tpreq/000023, from bptm.c.10708
12:40:58 [13937] <2> tpunmount: tpunmount'ing
/usr/openv/netbackup/db/media/tpreq/000023
12:41:02 [13937] <2> send_brm_msg: ERROR 85
12:41:02 [13937] <2> mpx_terminate_exit: EXITING with status 85
12:41:02 [13937] <2> cleanup: removing shared memory for mpx, shmid = 50698
12:41:12 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:41:43 [13907] <2> mpx_read_data: sent keepalive signal to bpbrm, reset
count to 610
12:41:51 [14209] <2> bptm: INITIATING: -count -cmd -rt 6 -rn 0 -stunit
ADIC_AIT -den 16 -mt 2 -masterversion 340000
12:41:51 [14209] <2> bptm: EXITING with status 0 <----------
12:41:52 [14217] <2> bptm: INITIATING: -delete_expired
12:41:52 [14217] <2> bptm: EXITING with status 0 <----------
12:42:01 [13907] <4> notify: executing -
/usr/openv/netbackup/bin/restore_notify bptm far_1003907530 restore
12:42:01 [13907] <2> getsockconnected: host=phido service=bpdbm
address=10.100.1.247 protocol=tcp non-reserved port=13721
12:42:01 [13907] <2> bind_on_port_addr: bound to port 37013
12:42:01 [13907] <2> check_authentication: no authentication required
12:42:01 [13907] <4> mpx_check_children: successfully restored from backup
id far_1003907530, copy 1, 18552832 kbytes
12:42:01 [13907] <2> send_brm_msg: MEDIA NOT READY
12:42:01 [13907] <2> send_brm_msg: EXIT far_1003907530 0
12:42:01 [13907] <2> bct_clear_active: Set BCTable entry 0 to inactive,
ACTIVE_GC count decremented to 0
12:42:01 [13907] <2> mpx_read_data: waited for empty buffer 6186 times,
delayed 41042 times
12:42:01 [13907] <2> io_close: closing
/usr/openv/netbackup/db/media/tpreq/000024, from mpxrestore.c.2250
12:42:01 [13907] <2> mpx_waiting_term: waiting for TERMINATE or another
START RESTORE
12:42:01 [13907] <2> read_brm_msg: STOP RESTORE far_1003907530
12:42:01 [13907] <2> process_brm_msg: could not find backupid far_1003907530
to terminate
12:42:02 [13907] <2> read_brm_msg: TERMINATE
12:42:02 [13907] <2> getsockconnected: host=phido service=bpdbm
address=10.100.1.247 protocol=tcp non-reserved port=13721
12:42:02 [13907] <2> bind_on_port_addr: bound to port 37014
12:42:02 [13907] <2> check_authentication: no authentication required
12:42:02 [13907] <4> mpx_read_backup: successfully restored 1 of 1 requests,
read total of 18552832 Kbytes at 6388.822 Kbytes/sec
12:42:02 [13907] <2> read_backup_unmount_delay: waiting 180 seconds before
unmounting media after restore
12:45:02 [13907] <2> read_backup_unmount_delay: unmounting
/usr/openv/netbackup/db/media/tpreq/000024 after delay
12:45:03 [13907] <2> bptm: EXITING with status 0 <----------


I'm not sure what exit status 85 is, but it isn't good.  When this happens,
I see the following in the syslog:

Oct 24 12:37:58 phido scsi: [ID 365881 kern.info] /pci@1f,2000/scsi@1
(glm4):
Oct 24 12:37:58 phido   Cmd (0x2586c08) dump for Target 2 Lun 0:
Oct 24 12:37:58 phido scsi: [ID 365881 kern.info] /pci@1f,2000/scsi@1
(glm4):
Oct 24 12:37:58 phido           cdb=[ 0x8 0x1 0x0 0x0 0x40 0x0 ]
Oct 24 12:37:58 phido scsi: [ID 365881 kern.info] /pci@1f,2000/scsi@1
(glm4):
Oct 24 12:37:58 phido   pkt_flags=0x0 pkt_statistics=0x61 pkt_state=0x7
Oct 24 12:37:58 phido scsi: [ID 365881 kern.info] /pci@1f,2000/scsi@1
(glm4):
Oct 24 12:37:58 phido   pkt_scbp=0x0 cmd_flags=0x8e1
Oct 24 12:37:58 phido scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,2000/scsi@1 (glm4):
Oct 24 12:37:58 phido   Disconnected command timeout for Target 2.0
Oct 24 12:37:58 phido glm: [ID 160360 kern.warning] WARNING:
ID[SUNWpd.glm.cmd_timeout.6016]
Oct 24 12:37:58 phido scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,2000/scsi@1/st@2,0 (st30):
Oct 24 12:37:58 phido   SCSI transport failed: reason 'timeout': giving up
Oct 24 12:37:58 phido


I'm not sure if this could indicate a failure in the SCSI Diff card, or more
likely a bad tape?  A bad tape could cause a timeout.  Any ideas would be
great.

Thanx!!

benr AT cuddletech DOT com