Good Morning Rafal
The first thing that you have to do is find the reason for the error:
08:16:21.672 [1443] <2> write_data: absolute block position prior to writing
backup header(s) is 389156, copy 1
08:16:21.672 [1443] <2> write_data: block position check: actual 389156,
expected 504267
08:16:21.695 [1443] <16> write_data: FREEZING media id 000140, External event
caused rewind during write, all data on media is lost
Some other process or program that has access to the tape drive has moved
and/or writen to the tape.
So you see the block number showing the location on the tape with a smaller
value then the value expected. When writting they should only increase.
Are you sharing the tape drives using netbackup's sso option. A problem with
the sso setup, is one method of seeing this problem. Just think of a media
server A writting on a tape and media server B issuing a rewind. After the
rewind, the next write by media server A to the tape will overwrite the tape
label.
Let us know what you find.
len
-----Original Message-----
From: rafal wiosna [mailto:rafamiga AT uucp.polbox DOT pl]
Sent: Sun 5/25/2003 5:37 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Cc:
Subject: [Veritas-bu] NB4.5 Datacenter bad? tape problem + L700 cleaning
This morning I've discovered a third frozen tape in my L700.
That's
strange, the library sits in a comfortable place, nobody is touching, it
works on NB autopilot. Bad tapes?...
This FROZEN tape I got I can read the images from. When greping
logs
I got this:
08:16:21.665 [1443] <4> write_backup: begin writing backup id
tdcf03_1053839725, copy 1, fragment 2, to media id 000140 on drive index 2
08:16:21.666 [1443] <2> signal_parent: sending SIGUSR1 to bpbrm (pid =
1439)
08:16:21.666 [1443] <2> write_data: twin_index: 0 active: 1
dont_process: 0 wrote_backup_hdr: 0 finished_buff: 0 saved_cindex: 0
08:16:21.672 [1443] <2> write_data: absolute block position prior to
writing backup header(s) is 389156, copy 1
08:16:21.672 [1443] <2> write_data: block position check: actual
389156, expected 504267
08:16:21.695 [1443] <16> write_data: FREEZING media id 000140, External
event caused rewind during write, all data on media is lost
08:16:21.731 [1443] <2> log_media_error: successfully wrote to error
file - 05/25/03 08:16:21 000140 2 WRITE_ERROR
08:16:21.743 [1443] <2> check_error_history: called from bptm line
14685, EXIT_Status = 84
08:16:21.752 [1443] <2> check_error_history: drive index = 2, media id
= 000140, time = 05/25/03 08:16:21, both_match = 0, media_match = 0,
drive_match = 0
08:16:21.752 [1443] <2> io_close: closing
/usr/openv/netbackup/db/media/tpreq/000140, from bptm.c.12914
08:16:21.753 [1443] <2> tpunmount: tpunmount'ing
/usr/openv/netbackup/db/media/tpreq/000140
08:16:21.754 [1443] <2> TpUnmountWrapper: SCSI RELEASE
What external event at 8am on sunday?!? Good thing I belive I
can
unfreeze the tape and get some contents out of it anyway. I will
unfreeze
the tape and see what happens.
But with another tape [frozen earlier] the matter is worse.
When I
try to read the catalog off the media I get:
10:31:34.879 [9424] <2> io_open: SCSI RESERVE
10:31:34.936 [9424] <2> io_open: file
/usr/openv/netbackup/db/media/tpreq/000028 successfully opened
10:31:34.937 [9424] <2> load_media: media id 000028 mounted on drive
index 1
10:31:34.965 [9424] <2> io_read_media_header: drive index 1, reading
media header, buflen = 32768, buff = 0x22aaa0, copy 1
10:31:35.222 [9424] <2> io_ioctl: command (5)MTREW 1 from (bptm.c.6182)
on drive index 1
10:31:46.129 [9424] <16> io_read_media_header: tapemark or blank tape
encountered reading media header, drive index 1
10:31:46.474 [9424] <2> check_error_history: called from bptm line
2724, EXIT_Status = 85
10:31:46.483 [9424] <2> check_error_history: drive index = 1, media id
= 000028, time = 05/25/03 10:31:46, both_match = 0, media_match = 0,
drive_match = 0
10:31:46.483 [9424] <2> io_close: closing
/usr/openv/netbackup/db/media/tpreq/000028, from bptm.c.12914
10:31:46.483 [9424] <2> tpunmount: tpunmount'ing
/usr/openv/netbackup/db/media/tpreq/000028
10:31:46.485 [9424] <2> TpUnmountWrapper: SCSI RELEASE
10:31:46.544 [9424] <2> bptm: EXITING with status 85 <----------
Good thing it consists of 8 GB [out of 100 GB] of images
expiring
June 2nd so it's not so relevant to the data consistency in my backup
library. I can live without it. But the question is this: how can I
remove
this tape from the library and re-use it? With the previous tape I had
problem with I tried to expire the tape but in the frozen state it was
impossible [or was I doint something in wrong order]. Only when I got
the
replacement tape [since it was first mount that caused the troubles I
managed to replace the tape for free] I was able to re-claim the tape
as a
clean one, with no contents.
Any suggestions what should I do now? This tape is like 3 months
old. I know, this is magnetic media and problems can happens so is
there any
way to check the tape [don't care about erasing the contents] and/or
re-use
it?
Also for cleaning, I belive my Storagetek L700 is managing the
cleaning process. It has autoclean feature on. Okay, but the cleaning
tapes
were never used for 8 months. I wish I can clean the drives once a 3-4
months JUST IN CASE to prevent an error and not wait for the first
error to
cause cleaning. The errorcount on either of the 3 drives is 0 -- and
that's
probably why the cleaning never happened. And I wish the L700 won't
place
cleaning in the place NB cannot reach them [it does not see it in the
robot].
--
__________________________________________________________________________
rafal wiosna * Å?wiat Internet S.A. * Polbox * In ARP we trust * AR164
V6TB
RAFD-RIPE * Dom som surfar pĺ nättet, fĺr leka med musen * PGPID:
3CDCB7A9
_______________________________________________
Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
|