Veritas-bu

[Veritas-bu] Update and more information: FROZEN media problems in available_media

2002-06-04 09:25:17
Subject: [Veritas-bu] Update and more information: FROZEN media problems in available_media
From: CJManders AT lbl DOT gov (Christopher Jay Manders)
Date: Tue, 04 Jun 2002 06:25:17 -0700
UPDATE:

The problem is getting a lot worse. We had a lot of 96 errors last night. The
available_media output seems rather cluttered, too. More on that below...

So, I found a doc by Sun Prof Support that indicates that the image database
can get out of sync with the media manager database somehow.

It says that if you can do a vmquery -m mediaid but not bpmedia -unfreeze -ev
mediaid then this is likely the case.
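
That rule of thumb can be written down as a small check. This is only a sketch in Python: it assumes you have already captured the text printed by the two commands, the function name is made up for illustration, and the error string is the one bpmedia prints in the transcripts further down.

```python
# Error text as printed by bpmedia/bpexpdate in the transcripts in this message.
NOT_FOUND = "requested media id was not found in NB media database"


def likely_out_of_sync(vmquery_output: str, bpmedia_output: str) -> bool:
    """Return True when vmquery lists the volume but bpmedia cannot find it."""
    # A volume that vmquery knows about is printed with a "media ID:" field.
    volume_known = any(
        line.strip().lower().startswith("media id:")
        for line in vmquery_output.splitlines()
    )
    # The bpmedia error wraps across two lines, so collapse whitespace first.
    bpmedia_text = " ".join(bpmedia_output.split())
    return volume_known and NOT_FOUND in bpmedia_text
```

Feeding it the vmquery and bpmedia output for the same media ID gives True only for the "visible to vmquery, unknown to bpmedia" combination described in the doc.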

How do I fix this? I have an L180 and an L3500, each on a separate media host.
Each has about 100 tapes in the FROZEN state.

There are no hardware issues that I can find. We have scripts that report
offline and down drives, and monitor /var/adm/messages with swatch looking for
h/w errors and stuff.

I'll just list all the quirks here to see if a bigger pattern than I can see
is developing...

Another interesting quirk is that we had a lot of DBBACKUP tapes in
the available_media output until I put a 'sleep 5' in front of the main
bpimagelist command being run in there. Now we only get a couple of DBBACKUP
tapes. This DBBACKUP tape thing happened shortly after adding another media
host to our NetBackup server cluster.

Another 'symptom' is that we have a lot of AVAILABLE tapes in the
available_media output that have a robotic type of NONE and no robnum or
robslotnum, but have a media type (DLT) and the barcode/media ID are listed.
Why are these in here? They seem to be cluttering things up, and I wonder if
there is a problem with

We also get a number of tapes whose slots appear skewed, no matter how many
times we inventory the robot and then update the software (or run vmupdate).
By that I mean, available_media shows a slot of 25 for a mediaid that is not
really even still in the robot. Again, we have updated the robot in the
inventory.

We were operating fine for a very long while (7 months, at least) doing
exactly what we have been doing, without variance, and then suddenly a lot of
these 96 errors started showing up along with DBBACKUP and FROZEN tapes.
Nothing appears to be able to get the FROZEN tapes to unfreeze, either.

The FROZEN tapes are ALL fresh, new tapes. But, just so you know, we have
tried OLD Legato tapes and OLD Veritas tapes with the same effect. ALL freeze
up after a single try in a drive.

Something else that is weird is that we had a situation where a restore was
calling for a tape, but the barcode label on the tape did not match the
contents at all. We had assumed this was what patch 110539 fixed, as we also
have 3 ether drops to each box (each on a separate subnet, but round-robin DNS
to the same hostname) and that was mentioned as part of the fix for that patch.

I trace the problem to around the time we switched the contents of one
robot (L1800) with another (L3500). That is when the DBBACKUP tapes started to
show up.

It was shortly thereafter that FROZEN media started, I think.

So, we have 3 media hosts, one of which is the master: servback, getback and
flashback. Each has 3 network interfaces and at least 8 Diff SCSI channels. We
use only a few of the SCSI channels, so I have a bunch extra.

Here is an example of the discrepancy. Note that vmquery shows the mediaid,
but nothing in the bp* commands sees the media:
# vmquery -m F00132
================================================================================

media ID:              F00132
media type:            DLT cartridge tape (11)
barcode:               F00132
description:           Fulls
volume pool:           Fulls (2)
robot type:            TLD - Tape Library DLT (8)
robot number:          2
robot slot:            100
robot host:            getback
volume group:          00_002_TLD
created:               Mon Jun 03 14:25:40 2002
assigned:              ---
last mounted:          ---
first mount:           ---
expiration date:       ---
number of mounts:      0
max mounts allowed:    ---
================================================================================

So, it is in the Media Manager volume database (that is what vmquery reads).
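
For anyone who wants to script against this, the vmquery record above is a plain "name: value" layout, so it can be turned into a dictionary. A minimal sketch in Python follows; the function names are my own, and the check for an unused tape simply reads the "assigned" and "number of mounts" fields shown in the record.

```python
SAMPLE = """\
================================================================================

media ID:              F00132
media type:            DLT cartridge tape (11)
robot type:            TLD - Tape Library DLT (8)
robot number:          2
robot slot:            100
robot host:            getback
created:               Mon Jun 03 14:25:40 2002
assigned:              ---
number of mounts:      0
================================================================================
"""


def parse_vmquery(text: str) -> dict:
    """Parse one 'vmquery -m' record into a {field name: value} dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the rows of '=' that frame the record.
        if not line or set(line) == {"="}:
            continue
        # Split on the first colon only, so timestamps keep their colons.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def never_used(fields: dict) -> bool:
    """True if the record shows no assignment and zero mounts."""
    return fields.get("assigned") == "---" and fields.get("number of mounts") == "0"


record = parse_vmquery(SAMPLE)
print(record["media ID"], record["robot host"], record["robot slot"])
print("never used:", never_used(record))
```

The second line of output is the interesting part here: the volume record says the tape has never been assigned or mounted, which fits with these being brand new tapes.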

But, not the NB media database:

# bpexpdate -ev F00132 -d 0
Are you SURE you want to delete F00132 y/n (n)? y
requested media id was not found in NB media database and/or MM volume
database

OR:

# bpmedia -ev F00133 -unfreeze
requested media id was not found in NB media database and/or MM volume
database

So, I note that vmquery -pn Fulls (for example) does show all the media, but
this is not carried into the NB database, so the media ids have no STATUS
line.
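
To get the full list of affected tapes rather than checking one ID at a time, the pool listing can be compared against whatever the NB media database knows. A rough sketch in Python, with two assumptions: the 'vmquery -pn' output repeats the same "media ID:" field once per volume, and you already have the media IDs known to the NB media database from some other listing (bpmedialist output, for instance). The function names and the F00001 entry are made up for illustration.

```python
def media_ids(vmquery_output: str) -> list:
    """Collect every 'media ID:' value from vmquery output, in order."""
    ids = []
    for line in vmquery_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "media id":
            ids.append(value.strip())
    return ids


def missing_from_media_db(volume_ids, media_db_ids) -> list:
    """Volumes the volume database lists that the NB media database lacks."""
    known = set(media_db_ids)
    return [m for m in volume_ids if m not in known]


# Two volume records, trimmed to the fields that matter here.
pool_output = """\
========================================
media ID:              F00132
volume pool:           Fulls (2)
========================================
media ID:              F00133
volume pool:           Fulls (2)
========================================
"""

in_pool = media_ids(pool_output)
# Hypothetical: the only ID the NB media database reports.
in_media_db = ["F00001"]
print(missing_from_media_db(in_pool, in_media_db))
```

Every ID this prints is a candidate for the "found by vmquery, not found by bpmedia" problem shown above.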


Any thoughts or pointers would be excellent. I am stumped. We have had no
big issues until this...


Thanks!

Chris

>
> Hi,
>
> We have now had a rash of problems relating to about 100 tapes becoming
> FROZEN. There appears to be nothing to unfreeze them. Some do unfreeze,
> only to re-freeze again during our nightly runs. Most do not unfreeze,
> ever, giving an error.
>
> Here is what those with errors get:
> # bpmedia -unfreeze -ev F00067 -h getback
> requested media id was not found in NB media database and/or MM volume
> database
>
> Note that the data is in the database in some way, though, as here is what
> vmquery gives on this same volume:
> # vmquery -m F00067
> ======================================================
> media ID:              F00067
> media type:            DLT cartridge tape (11)
> barcode:               F00067
> description:           Fulls
> volume pool:           Fulls (2)
> robot type:            TLD - Tape Library DLT (8)
> robot number:          2
> robot slot:            126
> robot host:            getback
> volume group:          00_002_TLD
> created:               Tue May 21 15:27:03 2002
> assigned:              ---
> last mounted:          ---
> first mount:           ---
> expiration date:       ---
> number of mounts:      0
> max mounts allowed:    ---
> =======================================================
>
> It appears there is some problem with the media IDs. The weird part is that
> ALL of the tapes just came out of shrink-wrap, so are blank and brand new.
>
> What advice and knowledge is out there for this issue? Any help
> appreciated.
>
> Thanks in advance!
>
> Chris
>
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu