[Veritas-bu] Re: Help! All Tapes From Scratch

Subject: [Veritas-bu] Re: Help! All Tapes From Scratch
From: bob944 AT attglobal DOT net (bob944)
Date: Thu, 4 May 2006 19:53:09 -0400
> I agree with your comments about throwing the tape away.
> However, can we assume this scenario:
> 
> Full Backup Friday of critical Server. Completed all ok, but 
> during the job a status appeared (Media Write Error, or
> Media Position Error) although only once.
> 
> Sunday Server Dies
> 
> Come in Monday, and you can ONLY restore from Fridays Backup! Are you
> telling me that its VERY unlikely Netbackup will restore ANY 
> Data from the tapes it used?

Simon, I wasn't addressing your fried-robot situation earlier, so let me
clarify.  

Backups that were successful (status 0 or 1) are restorable unless the
tapes were damaged.  If a backup ended with any other status, there is
no data to restore.

(The rest of this note is probably too long, too pedantic and too boring
for anyone without a masochistic streak.  You've been warned.  :-)

If you were referring to "losing data" in

> > Amazing.  So, in your environment, it's considered more
> > cost-effective to risk losing data or fail another backup
> > than to throw out a $50 tape?

my point concerned the nature of magnetic media[1].  A successful write
doesn't guarantee that you can read that block later (maybe a bit of
oxide flakes off, for instance).  And that's the best case.  Now, when a
tape has already demonstrated that it has flaws (the write error you
mentioned above, for instance), we _know_ it has at least one problem
spot, and that tape is much more likely to cost you another backup than
a tape without known problems.  Or, worse, the drive's retry logic gets
a successful write on the 17th automatic retry, you think the tape is
now "good," and next month you're trying to recover that payroll master
file and that block just isn't quite good enough to read any more.
Nobody's happy.

So, an uncorrectable write error (the only kind you're going to see,
since the drive/driver hide the self-corrected ones) means the backup
job is a failure and no data is retained.  Do I want to use tapes with
problem histories for that payroll master server?  Or anything else?
Since the _only_ reason to do backups in the first place is to be able
to restore data, and if we're restoring, we must _need_ that data, I
think not.  -  bob

p.s.  In another lifetime, I was involved on the vendor side with a
customer who cooked his mainframe.  Two different times.  Wound up
replacing the entire room full of equipment both times--the boxes that
weren't flat dead on Monday were flaky and intermittent so everything
got tossed.  I don't know the specifics of your weekend incident, but I
wouldn't trust anything in the room, hardware or media, without at least
thoroughly testing it.

1.  Especially sequential media (tape versus disk).  There are many
mechanicals involved--head alignment, tracking, wear, drag, dirt--and
these change from drive to drive.  The same type of drive may be built
by different companies, or with different mechanical revisions, or
firmware changes.  The tape can stretch, wrinkle or get its edges
damaged by use, temperature/humidity and improper storage.  But the
oxide... that's where the variables really are.  It can flake, get
scratched, pick up a thickness of crud, be worn down and just plain be
manufactured with flaws.  That has always been true, which is why mag
media devices/controllers/drivers all have error compensation measures,
including retry logic.

You can write the same data to the same tape in the same drive and get
different results.  With a disk (and all start conditions identical),
data X gets written to sector Y, every single time.  With tape... all
you can guarantee is the order of the data.  There are gaps between
blocks and between tape marks.  A common drive/ctrlr/driver response to
write errors is to back up, wipe out the failed block, erase a bit of
tape and try again--now everything downstream is being written to a
different place than before.  Gaps are not all consistent.  Some drives
vary the tape transport speed to match the data rate...  Lots of
variables.  Lots of possibilities for a bad tape to pass a subsequent
test--or a "good" one to fail tomorrow.  Some reasons to replace flaky
hardware and media at the flaky stage--not waiting until it is flat
down.  And good reasons to make duplicates.
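
For anyone who wants to see the "everything downstream moves" effect, here
is a toy sketch in Python.  It is not how any real drive or driver works;
the block length, gap length, erase length and retry limit are all made-up
numbers, and write_blocks() is a hypothetical helper.  It only models the
idea above: a failed write makes the drive skip a stretch of tape and try
again, so later blocks land in different places while their order stays
the same.

```python
from typing import List, Set

# All sizes are hypothetical, in arbitrary "tape units".
BLOCK_LEN = 10   # length of one data block
GAP_LEN = 2      # normal gap between blocks
ERASE_LEN = 5    # stretch of tape erased/skipped after a failed write


def write_blocks(n_blocks: int, bad_spots: Set[int],
                 max_retries: int = 16) -> List[int]:
    """Return the start position of each block on the toy tape.

    Raises IOError when a block still fails after max_retries retries,
    which stands in for an uncorrectable write error.
    """
    positions: List[int] = []
    pos = 0
    for block in range(n_blocks):
        retries = 0
        # The write fails if any tape unit under the block is flawed.
        while any(p in bad_spots for p in range(pos, pos + BLOCK_LEN)):
            retries += 1
            if retries > max_retries:
                raise IOError(f"uncorrectable write error on block {block}")
            pos += ERASE_LEN  # skip a stretch of tape and try again
        positions.append(pos)
        pos += BLOCK_LEN + GAP_LEN
    return positions


clean = write_blocks(4, bad_spots=set())
flawed = write_blocks(4, bad_spots={14})
print(clean)   # [0, 12, 24, 36]
print(flawed)  # [0, 17, 29, 41]
```

One flawed spot under the second block moves it and every block after it,
yet the job "succeeds" and the caller never hears about the retry.  Only
when the retries run out does an error surface, which matches the point
that the errors you do see are the uncorrectable ones.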


