[Veritas-bu] Drives keep going down

2002-01-06 02:24:59
Subject: [Veritas-bu] Drives keep going down
From: nhunt AT lehman DOT com (Noel Lindsay Hunt)
Date: Sun, 06 Jan 2002 16:24:59 +0900
In light of the dearth of replies to my question about
media timeouts, I thought I'd add some detail. I have now
confirmed that NetBackup uses a tape on the media server
(single DLT drive) and successfully writes an image, but
when the next backup starts immediately afterwards it
cannot find the media online:

        ---
23:09:52 [25367] <2> bptm: INITIATING: -w -c quux -den 15 -rt 0 -rn -1 -stunit
quahog-dlt2 -cl U006 -bt 1009894188 -b quux_1009894188 -st 1 -cj 1 -p Standalone
-ru root -rclnt quux -rclnthostname quux -rl 2 -rp 1814400 -sl incrbkps -ct 0
-tir -v -mediasvr quux -jobid 3083 -jobgrpid 3083 -masterversion 340000 -shm
        . . .
23:09:52 [25367] <2> select_media: media id SA0003 already mounted
23:09:52 [25367] <2> select_media: selected media id SA0003 for backup,
quux(rl = 2) <----------
23:09:52 [25367] <2> io_open: file /usr/openv/netbackup/db/media/tpreq/SA0003
successfully opened
23:09:52 [25367] <2> write_backup: media id SA0003 mounted on drive index 0,
drivepath /dev/rmt/0cbn, drivename SUNDLT40000
        . . .
23:10:17 [25391] <2> bptm: INITIATING: -U
23:10:17 [25391] <2> db_byid: search for media id SA0003
23:10:17 [25391] <2> db_byid: SA0003 found at offset 2
23:10:17 [25391] <2> tpunmount_all: tpunmount'ing /usr/openv/netbackup/db/media
/tpreq/SA0003
23:10:17 [25391] <2> bptm: EXITING with status 0 <----------
        . . .
23:10:19 [25397] <2> bptm: INITIATING: -w -c quux -den 15 -rt 0 -rn -1 -stunit
quahog-dlt2 -cl U006 -bt 1009894216 -b quux_1009894216 -st 1 -cj 1 -p Standalone
-ru root -rclnt quux -rclnthostname quux -rl 2 -rp 1814400 -sl incrbkps -ct 0
-tir -v -mediasvr quux -jobid 3084 -jobgrpid 3083 -masterversion 340000 -shm
        . . .
23:10:20 [25397] <2> select_media: cannot find mounted media in standalone
drive, select media from media database if possible

At this point NetBackup selects another volume from the
media database (SA0001), waits for it to be mounted, and
times out about 15 minutes later:

23:10:20 [25397] <2> select_media: selected media id SA0001 for backup,
quux(rl = 2) <----------
23:10:20 [25397] <2> mount_open_media: Waiting for mount of media id SA0001 on
server quux.
        . . .
23:25:19 [25397] <2> mount_open_media: mount canceled detected in tpreq(),
signo = 1
23:25:20 [25397] <16> mount_open_media: media manager terminated during mount
of media id SA0001, possible media mount timeout
23:25:22 [25397] <16> catch_signal: media manager terminated by parent process
23:25:22 [25397] <2> catch_signal: EXITING with status 82
        ---
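
To put numbers on the sequence above, here is a minimal Python sketch that parses entries of this shape and measures the gap between two events. The entry layout ("HH:MM:SS [pid] <level> function: message") is inferred only from the excerpt above, and the helper names parse_entries and seconds_between are my own, not anything shipped with NetBackup. Wrapped lines, as in this mail, are folded back into the preceding entry with a space.

```python
import re
from datetime import datetime, timedelta

# Assumed entry layout, inferred from the excerpt:
# "HH:MM:SS [pid] <level> function: message"
ENTRY = re.compile(r"^(\d{2}:\d{2}:\d{2}) \[(\d+)\] <(\d+)> (\w+): (.*)$")

def parse_entries(text):
    """Parse log text into dicts, folding wrapped lines into the previous entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        m = ENTRY.match(line)
        if m:
            stamp, pid, level, func, msg = m.groups()
            entries.append({
                "time": datetime.strptime(stamp, "%H:%M:%S"),
                "pid": int(pid),
                "level": int(level),
                "func": func,
                "msg": msg,
            })
        elif entries and line and line not in ("---", ". . ."):
            # Continuation of a wrapped line: append to the previous message.
            entries[-1]["msg"] += " " + line
    return entries

def seconds_between(entries, first_pat, second_pat):
    """Seconds from the first entry containing first_pat to the next containing second_pat."""
    start = None
    for e in entries:
        if start is None and first_pat in e["msg"]:
            start = e["time"]
        elif start is not None and second_pat in e["msg"]:
            delta = e["time"] - start
            if delta < timedelta(0):
                delta += timedelta(days=1)  # the log has no date, so allow for midnight
            return int(delta.total_seconds())
    return None

SAMPLE = """\
23:10:17 [25391] <2> tpunmount_all: tpunmount'ing /usr/openv/netbackup/db/media
/tpreq/SA0003
23:10:20 [25397] <2> select_media: cannot find mounted media in standalone
drive, select media from media database if possible
23:10:20 [25397] <2> mount_open_media: Waiting for mount of media id SA0001 on
server quux.
23:25:19 [25397] <2> mount_open_media: mount canceled detected in tpreq(),
signo = 1
"""

entries = parse_entries(SAMPLE)
print(seconds_between(entries, "tpunmount'ing", "cannot find mounted media"))  # 3
print(seconds_between(entries, "Waiting for mount", "mount canceled"))         # 899
```

On the lines quoted here, the failed media lookup comes 3 seconds after the tpunmount, and the mount wait lasts 899 seconds, just under 15 minutes.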

Shortly afterwards, NetBackup attempts the backup again and
this time finds the tape online:

        ---
23:27:33 [26155] <2> bptm: INITIATING: -w -c quux -den 15 -rt 0 -rn -1 -stunit
quahog-dlt2 -cl U006 -bt 1009895249 -b quux_1009895249 -st 1 -cj 1 -p Standalone
-ru root -rclnt quux -rclnthostname quux -rl 2 -rp 1814400 -sl incrbkps -ct 0
-tir -v -mediasvr quux -jobid 3085 -jobgrpid 3083 -masterversion 340000 -shm
        . . .
23:27:34 [26155] <2> select_media: added 4 media id's to list that matched
robot number/type and media type
23:27:34 [26155] <2> select_media: consider allowing retention level 2 on
media (SA0001) that is 3 since its enabled by bp.conf file
23:27:34 [26155] <2> select_media: consider allowing retention level 2 on
media (SA0003) that is 3 since its enabled by bp.conf file
23:27:34 [26155] <2> standalone_select_media: found RVSN SA0003 in device 0
        ---

I have no idea what could be causing this. It is clearly
not a case of the tape being in a FULL, FROZEN or
SUSPENDED state, nor is it a problem of multiple
retentions; ALLOW_MULTIPLE_RETENTIONS_PER_MEDIA is set, as
is clear from the messages above.

I thought at first there might be interference from some
other process trying to access the drive. That may be so,
but the same thing is happening on another media server as
well, and it seems to be random. It looks more like a
timing problem, where the tape has not finished rewinding
when NetBackup tries to read the label, or something
similar. Does anyone have any experience of this?
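
One way to test the timing theory against a longer bptm log is to list every "cannot find mounted media" entry that follows a tpunmount_all entry within a few seconds. The sketch below does that in Python. The 10-second window is an arbitrary assumption on my part, the function name misses_after_unmount is made up for illustration, and the matching relies only on the message text seen in the excerpts above.

```python
import re

# Leading "HH:MM:SS [pid] " of an entry, as seen in the excerpts above.
STAMP = re.compile(r"^(\d{2}):(\d{2}):(\d{2}) \[(\d+)\] ")

def misses_after_unmount(lines, window=10):
    """Return (unmount_pid, miss_pid, gap_seconds) for each media miss that
    occurs within `window` seconds of the most recent tpunmount_all entry."""
    last_unmount = None  # (seconds since midnight, pid)
    hits = []
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # wrapped continuation line or separator
        hour, minute, second, pid = map(int, m.groups())
        now = hour * 3600 + minute * 60 + second
        if "tpunmount_all:" in line:
            last_unmount = (now, pid)
        elif "cannot find mounted media" in line and last_unmount is not None:
            gap = (now - last_unmount[0]) % 86400  # tolerate a midnight rollover
            if gap <= window:
                hits.append((last_unmount[1], pid, gap))
    return hits

log_lines = [
    "23:10:17 [25391] <2> tpunmount_all: tpunmount'ing /usr/openv/netbackup/db/media",
    "/tpreq/SA0003",
    "23:10:17 [25391] <2> bptm: EXITING with status 0",
    "23:10:20 [25397] <2> select_media: cannot find mounted media in standalone",
    "drive, select media from media database if possible",
]

print(misses_after_unmount(log_lines))  # [(25391, 25397, 3)]
```

If every miss in a full log turns out to sit right behind an unmount, that would point towards a race between the unmount and the next job's label check rather than an unrelated process grabbing the drive. A miss with no recent unmount would argue against it.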
