[Veritas-bu] Media Error (84)

This is probably the most annoying thing, ever.

Why can't NetBackup just skip the damn tape and move on to another
one, instead of halting the entire friggin' backup? Can I configure this
somewhere?

My problem: Veritas NetBackup 4.5 (Unix/Solaris).
(Packs: NB_45_8_M NB_DMP_45_8_M NB_CLT_45_8_M NB_JAV_45_8_M)

I just put a fresh batch of SDLT tapes in the tape loader (a
Quantum M1500 SDLT tape library), all brand new, straight from the
shrink-wrap.

Here's the relevant bit from the bptm log ...

19:43:57.449 [26527] <2> vmdb_query_scratch_bypool2: server returned:  1 
000179
------ 26 000179 -------- 8 0 dumped 00_000_TLD 13 0 0 0 0 root root 4 
TMP_OFFSITE - 1126894060 1127162982 0 0 0 0 0 0 0 - 0 0 41 0 0 0 0 0 - 0 0 
0 0 0 0 0 0 0
0 0 - 0 0 0 0 0 0 0 0 0 0 0 Added by Media Manager

19:43:57.449 [26527] <2> db_byid: search for media id 000179
19:43:57.451 [26527] <2> db_put: write media id 000179, offset = 104
19:43:57.470 [26527] <2> select_media: selected media id 000179 for 
backup[0], megastore(rl = 8) <----------
19:43:57.470 [26527] <2> mount_open_media: Waiting for mount of media id 
000179
(copy 1) on server dumped.
19:43:57.471 [26527] <2> mount_open_media: setting NDMP_TAPE_MOUNT flag 
for tpreq, SpNum field is 1
19:44:26.877 [11647] <2> bptm: INITIATING (VERBOSE = 5): -count -cmd -rt 8 
-rn 0 -stunit dumped-dlt3-robot-tld-0 -den 21 -mt 2 -masterversion 450000
19:44:26.878 [11647] <2> bptm: EXITING with status 0 <----------
19:44:27.943 [11653] <2> bptm: INITIATING (VERBOSE = 5): -count -cmd -rt 8 
-rn 0 -stunit megastore-dlt3-robot-tld-0 -den 21 -mt 3 -masterversion 
450000 -c megastore
19:44:27.944 [11653] <2> bptm: EXITING with status 0 <----------
19:46:31.939 [26527] <2> io_open: ndmp_drive_name is /ndmp/nrst0a
19:46:31.940 [26527] <2> ndmp_connect_open_and_auth: ndmp_hit_eom is 1
19:46:31.990 [26527] <2> io_open: file 
/usr/openv/netbackup/db/media/tpreq/000179 successfully opened
19:46:31.990 [26527] <2> TpGetDevice: read 2 items from regular file 
/usr/openv/netbackup/db/media/tpreq/000179, ndmp_hostname = megastore, 
devname = /ndmp/nrst0a
19:46:31.990 [26527] <2> write_backup: media id 000179 mounted on drive 
index 1, drivepath /ndmp/nrst0a, drivename MEGASTORE-SDLT320, copy 1
19:46:31.991 [26527] <2> io_read_media_header: drive index 1, reading 
media header, buflen = 32768, buff = 0x302d00, copy 1
19:46:31.991 [26527] <2> io_ioctl: command (5)MTREW 1 from (bptm.c.6356) 
on drive index 1
19:46:32.139 [26527] <2> io_read_media_header: ndmp_tape_read_func 
returned 12
19:46:32.139 [26527] <2> io_read_media_header: block read is not a 
NetBackup media header, len = 0, media id 000179, drive index 1, data is 
unknown
19:46:32.139 [26527] <2> io_write_media_header: drive index 1, writing 
media header
19:46:32.139 [26527] <2> io_close: closing 
/usr/openv/netbackup/db/media/tpreq/000179, from bptm.c.7337
19:46:32.189 [26527] <2> io_open: ndmp_drive_name is /ndmp/nrst0a
19:46:32.189 [26527] <2> ndmp_connect_open_and_auth: ndmp_hit_eom is 1
19:46:32.239 [26527] <2> io_open: file 
/usr/openv/netbackup/db/media/tpreq/000179 successfully opened
19:46:32.239 [26527] <2> io_ioctl: command (5)MTREW 1 from (bptm.c.7341) 
on drive index 1
19:46:32.549 [26527] <2> io_write_block: ndmp_tape_write_func returned 
1024
19:46:32.549 [26527] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.7370) 
on drive index 1
19:49:27.406 [11704] <2> bptm: INITIATING (VERBOSE = 5): -count -cmd -rt 8 
-rn 0 -stunit dumped-dlt3-robot-tld-0 -den 21 -mt 2 -masterversion 450000
19:49:27.407 [11704] <2> bptm: EXITING with status 0 <----------
19:49:28.475 [11710] <2> bptm: INITIATING (VERBOSE = 5): -count -cmd -rt 8 
-rn 0 -stunit megastore-dlt3-robot-tld-0 -den 21 -mt 3 -masterversion 
450000 -c megastore
19:49:28.477 [11710] <2> bptm: EXITING with status 0 <----------
19:52:20.093 [26527] <16> io_ioctl: io_ioctl_ndmp (MTWEOF) failed on media 
id 000179, drive index 1, return code -1 (?) (bptm.c.7370)
19:52:20.123 [26527] <2> log_media_error: successfully wrote to error file 
- 09/19/05 19:52:20 000179 1 WRITE_ERROR
19:52:20.123 [26527] <2> check_error_history: called from bptm line 16964, 
EXIT_Status = 84
19:52:20.174 [26527] <2> check_error_history: drive index = 1, media id = 
000179, time = 09/19/05 19:52:20, both_match = 0, media_match = 0, 
drive_match = 0
19:52:20.174 [26527] <2> io_close: closing 
/usr/openv/netbackup/db/media/tpreq/000179, from bptm.c.13181
19:52:20.221 [26527] <2> tpunmount: tpunmount'ing 
/usr/openv/netbackup/db/media/tpreq/000179


I guess the relevant bit out of that is:

19:52:20.093 [26527] <16> io_ioctl: io_ioctl_ndmp (MTWEOF) failed on media
id 000179, drive index 1, return code -1 (?) (bptm.c.7370)
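
For anyone digging through a similar bptm log, here is a minimal Python
sketch for pulling out the error-level entries and checking how long a
command sat before it failed. It assumes every entry has the shape seen in
the excerpt above (HH:MM:SS.mmm [pid] <severity> function: message) and
that <16> marks an error, as it does for the failing line here. The names
parse_line and find_errors are made up for illustration and are not part
of NetBackup.

```python
import re

# Assumed entry shape, inferred only from the excerpt in this post:
#   HH:MM:SS.mmm [pid] <severity> function: message
LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3}) \[(\d+)\] <(\d+)> (\w+): (.*)$"
)


def parse_line(line):
    """Return a dict for a log entry, or None for anything else
    (for example the tail of a line wrapped by the mail client)."""
    m = LINE_RE.match(line.rstrip())
    if m is None:
        return None
    hh, mm, ss, ms, pid, sev, func, msg = m.groups()
    seconds = int(hh) * 3600 + int(mm) * 60 + int(ss) + int(ms) / 1000.0
    return {
        "seconds": seconds,      # time of day, in seconds since midnight
        "pid": int(pid),
        "severity": int(sev),
        "func": func,
        "message": msg,          # first physical line of the message only
    }


def find_errors(lines, min_severity=16):
    """Keep only entries at or above min_severity (assumed error level)."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e and e["severity"] >= min_severity]


# Two entries copied from the log above, wrapped as they were posted.
sample = [
    "19:46:32.549 [26527] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.7370) ",
    "on drive index 1",
    "19:52:20.093 [26527] <16> io_ioctl: io_ioctl_ndmp (MTWEOF) failed on media ",
    "id 000179, drive index 1, return code -1 (?) (bptm.c.7370)",
]

entries = [e for e in map(parse_line, sample) if e]
gap = entries[-1]["seconds"] - entries[0]["seconds"]

for err in find_errors(sample):
    print(err["func"], err["severity"], err["message"])
print("seconds between MTWEOF issued and failure:", round(gap, 3))
```

On the two entries above, the gap between the MTWEOF being issued and the
failure being logged works out to a little under six minutes. To run this
against a real log, replace sample with the lines read from the log file.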

My shorter, differential backups never have any issues. It's my longer
backups that are having problems: it's usually the 2nd or 3rd tape
mounted that mysteriously has a 'media error'. I'm not sure exactly
what to troubleshoot or look at next. NDMP issues? Problems with my
NetApp? Something with the tape drive? Something with my Solaris
server?

"SunOS dumped 5.9 Generic_112233-08 sun4u sparc SUNW,Sun-Fire-V210"
"NetApp Release 6.5.5: Wed Apr 27 04:39:40 PDT 2005"

Any pointers or things to look at would be great,

Thanks,


-- 
It's always September somewhere on the 'net. | http://angui.sh
Another proud member of Eep's killfile.      | Unix Sys. Admin.
All projects approach the ghetto, some       |
faster than others.                          | matt AT angui DOT sh