Hi Len,
Bad tapes vs. bad drives is always an interesting discussion. Drives,
like media, age and fail. Often we see an error involving a drive and a piece
of media where neither one is actually broken; something bad just happened
the one time they tried to work together.
As an example, take a job that is running when there is a pause in the
data transfer (network bandwidth), so the drive has to reposition and fails
at that function. This is not a sure sign that either the drive or the tape is
bad. It could just be a passing failure. Re-run the job and everything is fine.
What we do here is suggest that customers monitor failures closely and
investigate the errors they see. If a tape fails repeatedly across multiple
drives, then get rid of it. If a drive fails with lots of different pieces of
media, then call your HW vendor and get it swapped.
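To make that rule of thumb concrete, here is a minimal sketch in Python. It
assumes you already have a list of (tape, drive) pairs, one per failed job; the
function name, the IDs, and the threshold of three are hypothetical and are not
part of NetBackup or any vendor tool, so tune them to your own environment.

```python
from collections import defaultdict

def find_suspects(failures, min_partners=3):
    """Flag tapes and drives that fail with many different partners.

    failures: iterable of (tape_id, drive_id) pairs, one per failed job.
    min_partners: hypothetical threshold; a tape is suspect once it has
    failed in this many distinct drives, and a drive once it has failed
    with this many distinct tapes.
    """
    drives_per_tape = defaultdict(set)
    tapes_per_drive = defaultdict(set)
    for tape, drive in failures:
        drives_per_tape[tape].add(drive)
        tapes_per_drive[drive].add(tape)

    suspect_tapes = sorted(
        t for t, drives in drives_per_tape.items() if len(drives) >= min_partners
    )
    suspect_drives = sorted(
        d for d, tapes in tapes_per_drive.items() if len(tapes) >= min_partners
    )
    return suspect_tapes, suspect_drives


if __name__ == "__main__":
    # Made-up example data: T1 fails in three drives, D4 fails with three tapes.
    sample = [
        ("T1", "D1"), ("T1", "D2"), ("T1", "D3"),
        ("T2", "D4"), ("T3", "D4"), ("T4", "D4"),
    ]
    tapes, drives = find_suspects(sample)
    print("retire tapes:", tapes)
    print("call vendor about drives:", drives)
```

A single failure of one tape in one drive never trips the threshold, which
matches the point above that a one-off error is not proof of a bad component.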
One thing I can say is that not doing this kind of tracking can cause
enormous problems later on when multiple bad tapes are in the library and there
may be one or two drives that are also failing. At this point, things have spun
out of control and finding the bad tapes/drives is quite a chore.
Keeping a log of what tapes fail in what drives over time is the best
defense against finding yourself backed up against the wall with multiple bad
items in the environment.
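One possible way to keep such a log is a plain CSV file that gets a row every
time a job fails. The sketch below is only an illustration: the file layout,
the field names, and the function names are my own assumptions, and how you
extract the tape and drive from your backup software's error reports is left
out entirely.

```python
import csv
import os
from datetime import datetime

# Hypothetical column layout for the failure log.
FIELDS = ["timestamp", "tape", "drive", "error"]

def log_failure(path, tape, drive, error, when=None):
    """Append one failure record to the CSV log, writing a header if new."""
    when = when or datetime.now()
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(FIELDS)
        writer.writerow([when.isoformat(timespec="seconds"), tape, drive, error])

def load_failures(path):
    """Read the log back as a list of (tape, drive) pairs, oldest first."""
    with open(path, newline="") as f:
        return [(row["tape"], row["drive"]) for row in csv.DictReader(f)]
```

Because the log is append-only and timestamped, you can look back over weeks
or months and see which tapes and drives keep turning up together.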
I find this much easier than testing a suspect tape because it is
proactive instead of reactive. It also gives you the opportunity to set up
proposed testing scenarios. If the same job uses the same tape and the same
drive every night and fails every night, there is no way to tell based on that
info if it is the tape or the drive. If you are monitoring and figure that out,
then you can move the job to a different drive with the same tape, or remove
the tape and use the same drive.
A standard troubleshooting technique is to change one thing and see if
the system behaves differently (read: fixes the problem). If it does, you
removed the bad component; if it doesn't, then you probably still have the bad
component in the system.
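For a job that keeps failing with one particular tape and drive, the
change-one-thing approach boils down to two tests and a small decision table.
The sketch below spells that table out; the function name and the wording of
the results are hypothetical, and it assumes each test changed only the one
item named.

```python
def isolate(fails_with_other_drive, fails_with_other_tape):
    """Interpret two one-change tests for a job that fails with one tape/drive pair.

    fails_with_other_drive: True if the same tape still failed in a different drive.
    fails_with_other_tape:  True if a different tape still failed in the same drive.
    """
    if fails_with_other_drive and not fails_with_other_tape:
        # The failure followed the tape to a new drive.
        return "tape"
    if fails_with_other_tape and not fails_with_other_drive:
        # The failure stayed with the drive after the tape was swapped.
        return "drive"
    if fails_with_other_drive and fails_with_other_tape:
        return "both, or something else (software, driver, network)"
    # Neither changed setup failed; the original error may have been transient.
    return "inconclusive"


if __name__ == "__main__":
    print(isolate(fails_with_other_drive=True, fails_with_other_tape=False))
```

One run of each test is thin evidence, so treat the answer as a lead to follow
up on rather than a verdict.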
Testing a tape requires writing from end to end. That takes a lot of
time and effort. Proactively managing your environment and monitoring failures
requires much less effort and also gives you better insight into how to improve
your backups because it can show you other things that could use some tweaking.
I've sent you, off-list, a doc on troubleshooting tape drives in a UNIX
environment. It's not comprehensive, but it can give you some ideas on how to
isolate a failing component in your system. If anyone else in the group is
interested, they can send me an email request and I'll forward it along.
Mark Pinder : Systems Engineer:
Spectra Logic : www.spectralogic.com
-----Original Message-----
From: len boyle [mailto:len.boyle AT sas DOT com]
Sent: Sunday, November 06, 2005 9:35 PM
To: Mark Pinder; veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] RE: Number of mounts before getting rid of a tape?
Hello Mark and others,
This is an interesting topic.
In the past I had thought that our DLT tape carts would last about 200-300
mounts before getting into trouble. Most tapes never got that many mounts, but
the pool of tapes used for NetBackup catalog tapes would reach these numbers
without too much trouble. We also saw this with tapes used for RMAN archive log
backups when the DBAs had the archive logs set to small quick jobs.
Of course it is always hard to determine if a problem is with the tape cart,
tape drive, software (includes driver) or all three. After all, if a tape
drive is bad, it is not a good source for knowing if a tape is bad or not.
With the LTO-2 tape carts, someone on this list posted a ref to a Fuji tape
site which listed many thousands of mounts. So it would seem that if they are
correct, it would almost never be the tape.
And our tape library CE would almost always think that it was the tape and
not the drives.
Does anyone know of tools (Unix or Windows) other than tcopy that are useful
for testing tapes?
Something like the fdr product for the ex-mvs folks out there. Or the
tapemap program for ex-vm users. Or Mozart/Qtip for the ex-Sperry Univac
folks.
How about a tool for dumping the error data from a tape drive?
Even so, it does seem that tapes are getting better.
----- Original Message -----
From: "Mark Pinder" <MarkP AT spectralogic DOT com>
To: <veritas-bu AT mailman.eng.auburn DOT edu>
Sent: Tuesday, November 01, 2005 2:49 PM
Subject: [Veritas-bu] RE: Number of mounts before getting rid of a tape?
> Tape life is measured differently by different drive/media vendors. Most
> of them have numbers that appear to be inordinately high based on actual
> failure rates.
>
> Sony says that AIT tapes are good for 10,000 loads and/or 30,000 end to
> end passes. I've never seen one actually last this long, but I've also
> never seen them used this much either.
>
> LTO-2/3 media also has an anticipated life of several thousand loads and
> many more end to end passes.
>
> As a HW vendor for this stuff, I have to say that I personally think these
> are simply marketing numbers invented through ideal testing (like MPG for
> cars), and I would not trust them.
>
> At the same time, we have customers who have been using the same media for
> over 3 years without ill effect.
>
> The best advice I can give you about it is to use it until it fails, but
> anticipate that failure and have some amount of new media (~5%) brought in
> every quarter or so. This will give you a surplus stock to replace tapes
> that die and give you the opportunity to remove some of the most used
> tapes. These can be "filed" somewhere and used in case of emergency.
>
> I hope this helps.
>
>
> Mark Pinder
> Systems Engineer
> Spectra Logic
> www.spectralogic.com
>
>
>
> _______________________________________________
> Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>