Veritas-bu

[Veritas-bu] EOM and stat 84 (was: Space left on Tape)

2006-07-19 20:48:02
Subject: [Veritas-bu] EOM and stat 84 (was: Space left on Tape)
From: bob944 at attglobal.net (bob944)
Date: Wed, 19 Jul 2006 20:48:02 -0400
> Hi Bob
> Re: the EOM - End Of Media - any ideas why then, that 
> Netbackup is reporting status 84 for them? Just a bit
> confused why it only seems to be happening on
> the Media Servers and nothing else

Simon - found your posting in v3#62.  Can't tell from your description,
but I'll guess that your master is also a media and doesn't have this
problem, but you have 2+ media servers which do.  I'd first be
suspicious of what's different between mast and med--drivers, drives,
connections, hardware platform, OS, patch levels, et cetera.  Were I
convinced that all these things were the same (or can't be changed--Unix
master and Win media, for instance), I'd have to prove to myself that
the problem is repeatable on the medias and never happens on the master,
even when the master becomes the media server for the backups in
question.  'Cuz, unless you've uncovered a bug, NetBackup really isn't
involved here--it's at a driver/hardware/path level.  Oh, and I'd have
to prove to myself that it's not media-related.  (Proving these things
is another good reason for following the Bob's Best Practices of having
a Test pool and TEST- policies where you can back up, abort, use, abuse,
expire, modify, change media servers or the media to your heart's
content and not risk screwing up a "real" backup.)

Now.  Caveat:  my tape programming and firmware work was in proprietary
Honeywell and GE days and precedes the SCSI era.  To check what I
_think_ I know, I'm using a Quantum DLT8000 manual for reference.  And,
to get a big chunk of definition out of the way, it's below under
"supporting information."

There are three bits in byte 2 of SCSI sense info (filemark, end of
medium and incorrect length indicator), all of which can appear with
sense data (sent when a driver asks for status after an operation).
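For anyone who wants to pick those three bits out of a sense dump by
hand, here is a minimal sketch.  It assumes fixed-format sense data
(response code 70h/71h), where byte 2 carries Filemark in bit 7, EOM in
bit 6, ILI in bit 5 and the Sense Key in the low four bits.  The function
name is mine, not from any driver or from NetBackup.

```python
# Bit masks for byte 2 of fixed-format SCSI sense data (assumed layout).
FILEMARK = 0x80        # bit 7
EOM = 0x40             # bit 6, end of medium
ILI = 0x20             # bit 5, incorrect length indicator
SENSE_KEY_MASK = 0x0F  # low nibble holds the Sense Key


def decode_sense_byte2(value):
    """Split byte 2 of fixed-format sense data into its flags and key."""
    return {
        "filemark": bool(value & FILEMARK),
        "eom": bool(value & EOM),
        "ili": bool(value & ILI),
        "sense_key": value & SENSE_KEY_MASK,
    }


if __name__ == "__main__":
    # 0x40: EOM bit set, Sense Key 0 (NO SENSE)
    print(decode_sense_byte2(0x40))
```

Descriptor-format sense data lays things out differently, so check the
response code in byte 0 before trusting this.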

Most Unices' drivers let you use tape drives in two ways, ATT-based or
BSD-based, which differ in how things like read-at-filemark are handled.
NetBackup uses BSD behavior in Unix.  Windows drivers may behave like
BSD, or maybe NetBackup accepts different behaviors from a Win device.
Everything here is from SCSI standards or BSD-behavior docs.

When writing, an app should receive EOT (end of tape) status when (in
the olden days) the drive senses a little reflective strip on the
non-oxide side of the tape several feet from the physical end.  Behavior
(and this way precedes SCSI) should be for the drive to finish the
write, return the condition (in SCSI, return CHECK CONDITION status, set
the EOM bit, set the Sense Key to NO SENSE, and set the ASC/ASCQ (don't
ask) fields to EOM/P Detected).  If more writes are received, keep doing
the above.  The app should respond to the first CHECK it can by writing
two filemarks (tape marks), rewind/unload/new tape.  (Two TMs is a
convention for 1/2-inch tape that I believe is universal these days, and
reading two TMs in a row signals EOT/EOM on read.)

When the drive senses physical EOM on a write (don't know how this is
done; tape used to run off the end of the reel), it should return CHECK
CONDITION status, set the EOM and Valid bits, set the Sense Key to
Volume Overflow, put the residue (unwritten byte count) in the
Information field, set ASC/ASCQ to EOM/P Detected, and leave the tape at
EOM/P.
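The two write-side cases above (early warning versus physical end) can
be told apart from the sense bytes alone.  A rough sketch, again
assuming fixed-format sense data: Sense Key 0h is NO SENSE, Dh is
VOLUME OVERFLOW, ASC/ASCQ 00h/02h is end-of-partition/medium detected,
the Valid bit is bit 7 of byte 0, and the Information field is bytes
3-6, big-endian.  Function names and return strings are made up for
illustration; I have not checked them against any particular drive.

```python
NO_SENSE = 0x0
VOLUME_OVERFLOW = 0xD
EOM_P_DETECTED = (0x00, 0x02)  # ASC, ASCQ


def classify_write_sense(sense):
    """Classify sense data returned after a write CHECK CONDITION."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of fixed-format sense data")
    eom = bool(sense[2] & 0x40)
    key = sense[2] & 0x0F
    if not eom or (sense[12], sense[13]) != EOM_P_DETECTED:
        return "other"
    if key == NO_SENSE:
        return "early-warning"   # write finished; wrap up this tape
    if key == VOLUME_OVERFLOW:
        return "physical-eom"    # data left unwritten; see residue()
    return "other"


def residue(sense):
    """Return the Information field if the Valid bit is set, else None."""
    if not sense[0] & 0x80:
        return None
    return int.from_bytes(sense[3:7], "big")
```

Either result coming back on the very first write after a rewind is the
"something is hosed" case, since neither should be possible at BOT.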

Now, in your log info, it looks like a new tape is being used, 0218L3,
on drive index 2, at 03:12:xx
:04.849 checks the header and rewinds
:08.302 rewrites the header and rewinds
:12.911 receives error status from the write, "EOM encountered writing
header block"

So, either you have found a NetBackup bug where it goofs up the write at
a high level (not likely), or the OS/driver/HBA/fibre/drive/drive
firmware/media is bad.  Get your testing done to determine if it's
drive- or media- or server-related and go from there.  Detecting EOM on
a write at BOT... something is hosed.  

I've seen some error summaries for IBM Ultriums which involved bogus EOT
detection, so firmware is definitely in play.

Don't I remember you had some flaky tape connection problems months ago?

BTW, I noticed the tape s/n was 0218L3, which likely means it is really
something like 000218L3 and you're losing the two most-significant
digits due to NetBackup's default of using the last six characters.
Another BBP is to set up the barcode reader rules to use the first six
characters, as they are the serial number--the trailing tape-type L3 is
not of much value to you.  You may not plan on having more than a
boatload of tapes, but the first time you have to deal with a serial
number collision (say, EA0218L3), you'll wish the tape namespace were
six characters rather than four.
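To make the collision concrete, here is a small sketch using the two
example barcodes from above.  It is plain string slicing to show the
effect, not NetBackup's barcode-rule configuration; the helper names are
hypothetical.

```python
def media_id_last6(barcode):
    """Media ID as formed from the last six barcode characters."""
    return barcode[-6:]


def media_id_first6(barcode):
    """Media ID as formed from the first six barcode characters."""
    return barcode[:6]


barcodes = ["000218L3", "EA0218L3"]

# Last six: both tapes collapse to the same ID.
last6 = [media_id_last6(b) for b in barcodes]

# First six: the IDs stay distinct.
first6 = [media_id_first6(b) for b in barcodes]

if __name__ == "__main__":
    print(last6, len(set(last6)))
    print(first6, len(set(first6)))
```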


Supporting info
---------------
Note, the acronyms are confusing, and different vendor docs aren't
consistent:
EOT     End Of Tape
        o  might mean Early-Warning
        o  might mean PEOT
        o  might mean EOD
LEOT    Logical EOT
        o  usually means Early-Warning
EWEOM   Early-warning End Of Media
        o  found on some media.  see Early-Warning
Early-Warning
        o  the drive mechanism (the tinfoil, hole in the tape, counter,
...) which says "time to wrap up and start using another tape"
PEOT    Physical EOT, or Hard EOT
        o  we're trying to pull the tape off the hub.  There ain't no
more.
EOM     End Of Medium (EOM/P End Of Medium/Partition--there's a
2-partition standard which I know nothing about)
        o  same as Hard EOT, sometimes
        o  same as LEOT in mt command usage
        o  end-of-recorded-medium, sometimes (in middle of tape, found
two TMs)
EOD     End Of Data

The only three conditions we need are
1.  End-of-data.  The tape is half-full.  There should be two TMs after
the last record.  This is EOD, but sometimes called EOT or EOM.
2.  Early-warning.  The point where the tape drive detects we're close
to the end (reflector, hole, whatever).  This is "early warning," LEOT
or EOT.
3.  End-of-medium.  The physical end of the tape.  PEOT or EOM.

And for anyone bored enough to read this far:  I'm looking for detailed
technical descriptions of current tape drives as all my hardware
knowledge is out of date.  How is BOT/EOT detected these days, for
example?  Google and I are _not_ getting along.


