Veritas-bu

[Veritas-bu] SOLUTION: FROZEN media problems in available_media

2002-06-24 12:39:28
Subject: [Veritas-bu] SOLUTION: FROZEN media problems in available_media
From: CJManders AT LBL DOT GOV (Christopher Jay Manders)
Date: Mon, 24 Jun 2002 09:39:28 -0700
Hi,

For those interested, it appears that we found the solution. It could have
been any number of things, but here is how we fixed it so the FROZEN count
went down and new media stopped getting FROZEN.

1) We took out all FROZEN media and separated them into those with no data
count and those with one.
We filed the ones with a data count away in case they are called for.

2) We took out all blank media from the unit.

3) We replaced all tapes with brand new tapes (virgin media).

The problems went away.
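
One way to check that the FROZEN count really is going down is to tally the
status column of a saved available_media report from each run. This is only
a rough sketch: the function names count_status and status_tally are
hypothetical, and it assumes the status (FROZEN, DBBACKUP, AVAILABLE, ...) is
the last whitespace-separated field on each media line, as the report's
output line quoted in this post suggests. Header or pool lines in a real
report would simply show up as extra entries in the tally.

```shell
#!/bin/sh
# Hypothetical helpers; not part of NetBackup.
# Assumption: the media status is the last field on each report line.

# count_status FILE STATUS
# Print how many lines of FILE end with the given status word.
count_status() {
    awk -v want="$2" '$NF == want { n++ } END { print n + 0 }' "$1"
}

# status_tally FILE
# Print "STATUS COUNT" for every distinct last field, sorted by status.
status_tally() {
    awk 'NF > 0 { count[$NF]++ }
         END { for (s in count) print s, count[s] }' "$1" | sort
}
```

For example, running count_status against two saved reports from different
days with FROZEN as the second argument gives two numbers to compare.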

Once everything was out of the robot, we did note that a cleaning tape with
a bar code label had been in there. Having had some discussions with others
offline, I now hypothesize that this caused each of the drives to be brought
to a DOWN state, and that the rest of the tapes that had been available
slowly FROZE up as well.

We do have a lot of media, and one of the biggest problems in seeing the
actual errors was getting past all our media appearing as DBBACKUP in the
available_media report (which already took 20 minutes to run). I had to add
a sleep 5 after each bpmedialist call in order for these to show up in the
report correctly. The change was:

                        /bin/echo "$vmediaid${TAB}$vmediatype $vrobottype${TAB}  $vrobotnum${TAB}  $vrobotslot ${TAB} $vside${TAB} $bpretlev     $bpkbytes${TAB}$bpstatus" >>/tmp/avail_media_output

                else
                        sleep 5

                  /usr/openv/netbackup/bin/admincmd/bpmedialist -mlist -l -h ServBack -h GetBack -h FlashBack -h servback -h getback -h flashback  -ev $vmediaid 2>/dev/null |
                        while read bpmediaid bppartner bpver bpden bpalloc bplwrite bpexp bplrest bpkbytes bpnimages bpvimages bpretlev bpunused1 bpnumrest bpstat bprest


That change was made to the available_media script. It does cause the
process to take a very long time, so I set up a script that allows it to be
run from cron, and also manually, with each run leaving its own file
(numbered so as not to overwrite an earlier run). Here is that script:

#!/bin/ksh

DATE=`date +%m-%d-%Y`
OUTFILE="/opt/openv/netbackup/logs/media/available_media-$DATE.txt"
echo "Running on $DATE"

# The first run of the day uses the plain name; a second run gets "-1".
if [ -f "$OUTFILE" ]
then
   echo "File exists, so trying to add another unique label."
   OUTFILE="/opt/openv/netbackup/logs/media/available_media-$DATE-1.txt"
fi

# If "-1" is taken as well, pull the number out of the name and add one.
# The awk field number assumes the only hyphens in the path are the ones in
# "available_media-MM-DD-YYYY-N.txt". This check runs once, so only "-1" and
# "-2" are ever tried.
if [ -f "$OUTFILE" ]
then
   NUM=`echo "$OUTFILE" | awk -F\- '{print $5}' | awk -F. '{print $1}'`
   let NEWNUM=${NUM}+1
   OUTFILE="/opt/openv/netbackup/logs/media/available_media-$DATE-$NEWNUM.txt"
fi

# Generate the report.
/opt/openv/netbackup/bin/goodies/available_media > "$OUTFILE"

# Mail tool that sends attachments via perl's MIME::Lite
/opt/local/bin/MailTool -t backupmaster AT lbl DOT gov -f backupmaster AT lbl DOT gov \
   -s "Servback available_media for $DATE" -m "File left in $OUTFILE" -a "$OUTFILE"
exit
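
The script above tries the plain file name plus the -1 and -2 suffixes, which
is enough for a cron run and a couple of manual runs per day. As a rough
sketch of how the same numbering idea extends to any number of runs, the
suffix search can be written as a loop. The function name next_free_name and
its three arguments are hypothetical, not something from NetBackup or from
the script above.

```shell
#!/bin/sh
# Hypothetical helper: print the first unused file name in the series
#   DIR/BASE.EXT, DIR/BASE-1.EXT, DIR/BASE-2.EXT, ...
# Usage: next_free_name DIR BASE EXT
next_free_name() {
    dir=$1
    base=$2
    ext=$3
    candidate="$dir/$base.$ext"
    n=0
    # Keep bumping the suffix until a name that does not exist is found.
    while [ -e "$candidate" ]; do
        n=$((n + 1))
        candidate="$dir/$base-$n.$ext"
    done
    printf '%s\n' "$candidate"
}
```

In the script it could be used as
OUTFILE=`next_free_name /opt/openv/netbackup/logs/media "available_media-$DATE" txt`
in place of the two if blocks. Because the loop checks for existing files
directly, it does not need to parse the number back out of the path with awk.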


Cheers!


Chris






>
> I noted my time was off, so this is a repost to be close to the correct
> time... Sorry if this is duplicated for anyone.
>
> Thanks again.
>
> Chris
>
>
> UPDATE:
>
> The problem is getting a lot worse. We had a lot of 96 errors last night.
> The available_media output seems rather cluttered, too. More on that
> below...
>
> So, I found a doc by Sun Prof Support that indicates that the image
> database can get out of sync with the media manager database somehow.
>
> It says that if you can do a vmquery -m mediaid but not bpmedia -unfreeze
> -ev mediaid then this is likely the case.
>
> How do I fix this? I have an L180 and an L3500, each on a separate media
> host. Each has about 100 tapes in the FROZEN state.
>
> There are no hardware issues that I can find. We have scripts that report
> offline and down drives, and monitor /var/adm/messages with swatch looking
> for h/w errors and stuff.
>
> I'll just list all the quirks here to see if a bigger pattern than I can
> see is developing...
>
> Another caveat that is interesting is that we had a lot of DBBACKUP tapes
> in the available_media output until I put a 'sleep 5' in front of the main
> bpmedialist command being run in there. Now we only get a couple of
> DBBACKUP tapes. This DBBACKUP tape thing happened shortly after adding
> another media host to our NetBackup server cluster.
>
> Another 'symptom' is that we have a lot of AVAILABLE tapes in the
> available_media output that have a robotic type of NONE and no robnum or
> robslotnum, but have a media type (DLT) and the barcode/media ID are
> listed. Why are these in here? It seems to be cluttering things up, and I
> wonder if there is a problem with
>
> We do also get a number of tapes whose slots appear skewed, no matter how
> many times you inventory the robot and then in the software (or via
> vmupdate). By that I mean, available_media shows a slot of 25 for a mediaid
> that is not really even still in the robot. Again, we have updated the
> robot in the inventory.
>
> We were operating fine for a very long while (7 months, at least) doing
> exactly what we have been doing, without variance, and then suddenly a lot
> of these 96 errors started showing up along with DBBACKUP and FROZEN tapes.
> Nothing appears to be able to get the FROZEN tapes to unfreeze, either.
>
> The FROZEN tapes are ALL fresh, new tapes. But, just so you know, we have
> tried OLD Legato tapes and OLD Veritas tapes with the same effect. ALL
> freeze up after a single try in a drive.
>
> Something else that is weird is that we had a situation where a restore
> was calling for a tape, but the barcode label on the tape did not match the
> contents at all. We had assumed this was what patch 110539 fixed... as we
> also have 3 ether drops to each box (each on a separate subnet, but
> round-robin DNS to the same hostname) and that was mentioned as part of the
> fix for that patch.
>
> I trace the problem to around when we switched the contents of one robot
> (L1800) with another (L3500). That is when the DBBACKUP tapes started to
> show up.
>
> It was shortly thereafter that FROZEN media started, I think.
>
> So, we have 3 media hosts, one of which is the master. Servback, getback
> and flashback. Each has 3 network interfaces and at least 8 Diff scsi
> channels. We use only a few of the scsi channels, so I have a bunch extra.
>
> Here is an example of the discrepancy. Note that vmquery shows the mediaid,
> but nothing in the bp* commands sees the media:
> # vmquery -m F00132
>
> ================================================================================
>
> media ID:              F00132
> media type:            DLT cartridge tape (11)
> barcode:               F00132
> description:           Fulls
> volume pool:           Fulls (2)
> robot type:            TLD - Tape Library DLT (8)
> robot number:          2
> robot slot:            100
> robot host:            getback
> volume group:          00_002_TLD
> created:               Mon Jun 03 14:25:40 2002
> assigned:              ---
> last mounted:          ---
> first mount:           ---
> expiration date:       ---
> number of mounts:      0
> max mounts allowed:    ---
>
> ================================================================================
>
> So, it is in the image database.
>
> But, not the NB media database:
>
> # bpexpdate -ev F00132 -d 0
> Are you SURE you want to delete F00132 y/n (n)? y
> requested media id was not found in NB media database and/or MM volume database
>
> OR:
>
> # bpmedia -ev F00133 -unfreeze
> requested media id was not found in NB media database and/or MM volume database
>
> So, I note that vmquery -pn Fulls (for example) does show all the media,
> but this is not carried into the NB database, so the media ids have no
> STATUS line.
>
>
> Any thoughts or pointers would be excellent. I am stumped. We have had no
> big issues until this...
>
>
> Thanks!
>
> Chris
>
>
> >
> > Hi,
> >
> > We have now had a rash of problems relating to about 100 tapes becoming
> > FROZEN. There appears to be nothing to unfreeze them. Some do unfreeze,
> > only to re-freeze again during our nightly runs. Most do not unfreeze,
> > ever, giving an error.
> >
> > Here is what those with errors get:
> > # bpmedia -unfreeze -ev F00067 -h getback
> > requested media id was not found in NB media database and/or MM volume database
> >
> > Note that the data is in the database in some way, though, as here is what
> > vmquery gives on this same volume:
> > # vmquery -m F00067
> > ======================================================
> > media ID:              F00067
> > media type:            DLT cartridge tape (11)
> > barcode:               F00067
> > description:           Fulls
> > volume pool:           Fulls (2)
> > robot type:            TLD - Tape Library DLT (8)
> > robot number:          2
> > robot slot:            126
> > robot host:            getback
> > volume group:          00_002_TLD
> > created:               Tue May 21 15:27:03 2002
> > assigned:              ---
> > last mounted:          ---
> > first mount:           ---
> > expiration date:       ---
> > number of mounts:      0
> > max mounts allowed:    ---
> > =======================================================
> >
> > It appears there is some problem with the media IDs. The weird part is
> > that ALL of the tapes just came out of shrink-wrap, so are blank and
> > brand new.
> >
> > What advice and knowledge is out there for this issue? Any help
> > appreciated.
> >
> > Thanks in advance!
> >
> > Chris
> >
> > _______________________________________________
> > Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> > http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
> >
>

