[Veritas-bu] Drives keep going down

2002-01-06 04:18:02
Subject: [Veritas-bu] Drives keep going down
From: bhalchandra.kelkar AT csfb DOT com (Kelkar, Bhalchandra)
Date: Sun, 6 Jan 2002 17:18:02 +0800
You could check the global attribute Media Mount Timeout to make sure it
is set to a sufficiently large value.

- cheers

-----Original Message-----
From: Noel Lindsay Hunt [mailto:nhunt AT lehman DOT com]
Sent: Sunday, January 06, 2002 3:25 PM
To: Veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Drives keep going down


In light of the dearth of replies to my question
about media timeouts, I thought I'd add some detail. I
have now confirmed that NetBackup uses a tape on the
media server (single DLT drive) and successfully writes an
image, but when the next backup starts immediately
afterwards it cannot find the media online:

        ---
23:09:52 [25367] <2> bptm: INITIATING: -w -c quux -den 15 -rt 0 -rn -1 -stunit quahog-dlt2 -cl U006 -bt 1009894188 -b quux_1009894188 -st 1 -cj 1 -p Standalone -ru root -rclnt quux -rclnthostname quux -rl 2 -rp 1814400 -sl incrbkps -ct 0 -tir -v -mediasvr quux -jobid 3083 -jobgrpid 3083 -masterversion 340000 -shm
        . . .
23:09:52 [25367] <2> select_media: media id SA0003 already mounted
23:09:52 [25367] <2> select_media: selected media id SA0003 for backup, quux(rl = 2) <----------
23:09:52 [25367] <2> io_open: file /usr/openv/netbackup/db/media/tpreq/SA0003 successfully opened
23:09:52 [25367] <2> write_backup: media id SA0003 mounted on drive index 0, drivepath /dev/rmt/0cbn, drivename SUNDLT40000
        . . .
23:10:17 [25391] <2> bptm: INITIATING: -U
23:10:17 [25391] <2> db_byid: search for media id SA0003
23:10:17 [25391] <2> db_byid: SA0003 found at offset 2
23:10:17 [25391] <2> tpunmount_all: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/SA0003
23:10:17 [25391] <2> bptm: EXITING with status 0 <----------
        . . .
23:10:19 [25397] <2> bptm: INITIATING: -w -c quux -den 15 -rt 0 -rn -1 -stunit quahog-dlt2 -cl U006 -bt 1009894216 -b quux_1009894216 -st 1 -cj 1 -p Standalone -ru root -rclnt quux -rclnthostname quux -rl 2 -rp 1814400 -sl incrbkps -ct 0 -tir -v -mediasvr quux -jobid 3084 -jobgrpid 3083 -masterversion 340000 -shm
        . . .
23:10:20 [25397] <2> select_media: cannot find mounted media in standalone drive, select media from media database if possible

At this point NetBackup selects a different volume from the
pool, waits for it to be mounted, and then times out:

23:10:20 [25397] <2> select_media: selected media id SA0001 for backup, quux(rl = 2) <----------
23:10:20 [25397] <2> mount_open_media: Waiting for mount of media id SA0001 on server quux.
        . . .
23:25:19 [25397] <2> mount_open_media: mount canceled detected in tpreq(), signo = 1
23:25:20 [25397] <16> mount_open_media: media manager terminated during mount of media id SA0001, possible media mount timeout
23:25:22 [25397] <16> catch_signal: media manager terminated by parent process
23:25:22 [25397] <2> catch_signal: EXITING with status 82
        ---
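
As a rough cross-check of the timeout theory, the time between the
"Waiting for mount" line and the "mount canceled" line can be worked
out from the two timestamps in the log above. This is only a sketch:
the helper name elapsed_seconds is made up for illustration, and it
assumes both stamps fall on the same day. The result is just under 15
minutes, which would be consistent with a 15-minute media mount
timeout, although the configured value does not appear anywhere in
these logs.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS stamps, assumed to be on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps copied from the bptm log for pid 25397.
wait_started = "23:10:20"    # mount_open_media: Waiting for mount of media id SA0001
mount_canceled = "23:25:19"  # mount_open_media: mount canceled detected in tpreq()

waited = elapsed_seconds(wait_started, mount_canceled)
print(f"waited {waited} s ({waited // 60} min {waited % 60} s)")
# prints: waited 899 s (14 min 59 s)
```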

Shortly afterwards, NetBackup attempts the backup again and this
time finds the tape online:

        ---
23:27:33 [26155] <2> bptm: INITIATING: -w -c quux -den 15 -rt 0 -rn -1 -stunit quahog-dlt2 -cl U006 -bt 1009895249 -b quux_1009895249 -st 1 -cj 1 -p Standalone -ru root -rclnt quux -rclnthostname quux -rl 2 -rp 1814400 -sl incrbkps -ct 0 -tir -v -mediasvr quux -jobid 3085 -jobgrpid 3083 -masterversion 340000 -shm
        . . .
23:27:34 [26155] <2> select_media: added 4 media id's to list that matched robot number/type and media type
23:27:34 [26155] <2> select_media: consider allowing retention level 2 on media (SA0001) that is 3 since its enabled by bp.conf file
23:27:34 [26155] <2> select_media: consider allowing retention level 2 on media (SA0003) that is 3 since its enabled by bp.conf file
23:27:34 [26155] <2> standalone_select_media: found RVSN SA0003 in device 0
        ---

I have no idea what could be causing this. It is clearly
not a case of the tape being in a FULL, FROZEN or
SUSPENDED state, nor is it a problem of multiple
retentions: ALLOW_MULTIPLE_RETENTIONS_PER_MEDIA is set, as
is clear from the messages above.

I thought at first there might be interference from
some other process trying to access the drive. That may be
true, but it is happening on another media server as well,
and it seems to be random. It looks more like a timing
problem in which the tape has not yet been rewound when
NetBackup tries to read the label, or something like that.
Has anyone seen this before?
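
To see whether the failures really do cluster right after an unmount,
one could scan the bptm log and measure the gap between each
tpunmount_all line and the next "cannot find mounted media" line.
What follows is a minimal sketch, not a NetBackup tool: the function
names and the regular expression are my own, it assumes the line
layout shown in the excerpts above (time, [pid], <level>, function,
message) with wrapped lines rejoined, and it ignores logs that cross
midnight.

```python
import re
from datetime import datetime

# time, [pid], <level>, function name, message (layout taken from the excerpts)
LINE = re.compile(r"^(\d\d:\d\d:\d\d) \[(\d+)\] <(\d+)> (\w+): (.*)$")


def parse(line):
    """Return (time, pid, function, message), or None for non-matching lines."""
    m = LINE.match(line)
    if m is None:
        return None
    ts, pid, _level, func, msg = m.groups()
    return datetime.strptime(ts, "%H:%M:%S"), int(pid), func, msg


def unmount_to_miss_gaps(lines):
    """List (pid, seconds since the last tpunmount_all) for each media miss."""
    last_unmount = None
    gaps = []
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # continuation lines and ". . ." markers are skipped
        when, pid, func, msg = rec
        if func == "tpunmount_all":
            last_unmount = when
        elif func == "select_media" and msg.startswith("cannot find mounted media"):
            if last_unmount is not None:
                gaps.append((pid, int((when - last_unmount).total_seconds())))
    return gaps


sample = [
    "23:10:17 [25391] <2> bptm: INITIATING: -U",
    "23:10:17 [25391] <2> tpunmount_all: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/SA0003",
    "23:10:17 [25391] <2> bptm: EXITING with status 0",
    "23:10:20 [25397] <2> select_media: cannot find mounted media in standalone drive, "
    "select media from media database if possible",
]
print(unmount_to_miss_gaps(sample))  # prints: [(25397, 3)]
```

In the excerpt above the miss comes 3 seconds after the unmount. If
the gaps are consistently this small across both media servers, that
would support the timing explanation over the interfering-process
one.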



