Veritas-bu

[Veritas-bu] RE: Veritas-bu digest, Vol 1 #1990 - 3 msgs

2003-03-03 14:52:53
Subject: [Veritas-bu] RE: Veritas-bu digest, Vol 1 #1990 - 3 msgs
From: "Bleimeyer, Paul W." <paulb AT mayo DOT edu> (Paul Bleimeyer)
Date: Mon, 3 Mar 2003 13:52:53 -0600
> From: "vidit kohli" <vidit_k AT hotmail DOT com>
> Subject: [Veritas-bu] 9840 drive cleaning in L700

> I have set up NetBackup 3.4 to control 9840 drive cleaning in the
> L700, but the problem is that it doesn't work as required.
> [My understanding was that whenever any drive needs cleaning,
> NetBackup would perform it before the next backup starts on that
> 9840 drive.]
>
> I have two cleaning media in the slots, detected by Media Manager,
> and the tpclean -c option does a manual clean fine.
>
> Please advise if I'm missing any settings.
>
>


Vidit,

I am going to reply here since I faced the same thing as you about a year
ago, and this might help someone else out as well if they read the archives
later on.

You mentioned in an email to me that you had SSO (Shared Storage Option)
enabled on your media and master servers. I asked because we did the same
thing, and had the same problems too!

In an SSO environment with an L700, you want the library to clean the tape
drives rather than having Veritas handle it. The reason is that
frequency-based tape cleaning and TapeAlert fail in environments using SSO
with Veritas, so the library should be made responsible for cleaning the
drives. The Veritas documentation is not entirely clear about this for SSO
environments; at least the last time I looked, it was a little sparse.

So what the heck is going on then? Why does it fail under SSO and not
otherwise? It has to do with the drive's usage/cleaning counter getting
reset.

When your first media server mounts a tape and starts using the drive, the
drive's usage counter starts running. For example, say you run about 10
tapes through this same drive via the same media server all weekend long
during a full backup. The last tape unmounts, the Veritas media server
notifies your master that it is done with the drive, and the drive is
released. At this point the drive's usage counter reflects the amount of
time it has been in use. Veritas is set up to detect this and will attempt
to clean the drive. Everything looks great, right?

Here is where SSO gets into the mix. Along comes media server number 2.

It has a number of policies/classes of its own to run and starts
allocating drives, including the one your first media server just finished
using. As soon as it connects to the drive that media server 1 was just
using, it resets the usage/cleaning counter on the drive, and the Veritas
cleaning job never fires. Not good.
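To make the counter-reset sequence above concrete, here is a toy simulation
in Python. It is a hypothetical model, not how NetBackup or the L700 actually
track drive usage: it assumes cleaning fires only when a drive's accumulated
usage reaches a fixed threshold, and that a different media server attaching
to the drive resets that counter, as described above. The names Drive and
run, and the 20-hour threshold, are made up for illustration.

```python
# Hypothetical toy model of frequency-based drive cleaning under SSO.
# Assumption: cleaning fires only when accumulated usage reaches a threshold,
# and (per the behaviour described above) a different media server attaching
# to the drive resets that usage counter.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Drive:
    cleaning_frequency_hours: float  # hypothetical cleaning threshold
    usage_hours: float = 0.0
    last_server: Optional[str] = None
    cleanings: int = 0

    def mount(self, server: str, hours: float, sso_resets_counter: bool) -> None:
        # Modelled SSO side effect: a new server attaching zeroes the counter.
        if sso_resets_counter and self.last_server not in (None, server):
            self.usage_hours = 0.0
        self.last_server = server
        self.usage_hours += hours
        # Frequency-based cleaning: clean once the threshold is reached.
        if self.usage_hours >= self.cleaning_frequency_hours:
            self.cleanings += 1
            self.usage_hours = 0.0


def run(schedule: List[Tuple[str, float]], sso_resets_counter: bool,
        frequency_hours: float = 20.0) -> Drive:
    """Replay (server, hours) sessions against a single drive."""
    drive = Drive(cleaning_frequency_hours=frequency_hours)
    for server, hours in schedule:
        drive.mount(server, hours, sso_resets_counter)
    return drive


if __name__ == "__main__":
    # Two media servers alternate 15-hour sessions on the same drive.
    schedule = [("media1", 15.0), ("media2", 15.0)] * 2
    print(run(schedule, sso_resets_counter=False).cleanings)  # 2
    print(run(schedule, sso_resets_counter=True).cleanings)   # 0
```

With two servers alternating 15-hour sessions against a 20-hour threshold,
a shared counter triggers two cleanings, while a counter that resets on every
server change never reaches the threshold, so no cleaning ever happens. That
is the motivation for letting the library, which sees the drive's own "clean
drive" indication, handle cleaning instead.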

So how do we fix this behavior? With SSO in the mix, you want to make sure
that the library itself handles this process for you. This gets a little
interesting as well, since some earlier firmware releases on the L700 didn't
always handle this properly either.

Here is a link from support about the version issues:
http://seer.support.veritas.com/docs/240940.htm

Once you have the Veritas cleaning jobs disabled, go to the front panel of
your L700 (or do this remotely, if you have the Horizon code installed for
remote support). Select the Menu button, find the main library section, and
enable library cleaning. This way, when the "clean drive" light comes on,
your library will pull a cleaning tape and insert it into the drive at the
next available interval. You also want to keep at least a couple of
cleaning tapes in your normal silo space for running a manual cleaning at
times. Every once in a while Veritas may freeze a tape because it looks like
a bad tape. Go ahead and manually clean the drive, unfreeze the tape, and
load it again. In some cases, the drive is simply very close to needing a
cleaning and has not yet tripped the clean drive light. Just a suggestion.

I can tell you that we are running Horizon load level 3.01.02 and 130.112E
on our drives, and it's working like a champ. I would also recommend you
consider upgrading to the 341_4A patch if you can, since it also has some
corrections for releasing the drives properly. Just a suggestion. As always,
take your fabric rev levels, tape drivers, drive firmware, library firmware,
and switch rev levels into account before changing anything in your SAN.

Here is the readme for 4A:
http://seer.support.veritas.com/docs/240940.htm


Hope this helps you out.

Regards,

Paul Bleimeyer
Research Computing Facility
Mayo Foundation, Rochester MN