[Networker] Re: [Networker] mt status command aborting a running backup

Yes, we did :-((
When you run "mt", you actually do an "open" on the device. In most cases 
this invokes a "reset" on the device and, if a tape is loaded, a "rewind" to 
"BOT" occurs in the middle of the running operation. 
In almost all cases this means DATA LOSS!
You're lucky, because NetWorker recognized the interruption and reported an 
error. Most of the time the running application won't notice that there was a 
reset and will overwrite the data on the tape from the beginning.

Our own experience has taught us that many things can disturb a tape that is 
in use by NetWorker:
*       reboots
*       a running RMS under Windows
*       under Linux (SuSE) with kernels earlier than 2.4.19: due to an 
inappropriate timeout value, the tape driver is quite likely to invoke a bus 
reset after a SCSI command timeout. The bug is located in libscsi.so
*       products like ServerView, scsidev, SUN Management Center, Web System 
Administration, CST Agent Package SUNWcst
*       HBA firmware and drivers which are not up to date

These are only examples that we came across in our backup environment. Search 
the web for the term "reset on san".
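
One way to reduce the risk of an accidental "mt" against a busy drive is to 
put a small guard in front of it. The following is only a sketch under 
assumptions, not something NetWorker provides: the function name, the 
exception, and the device paths are hypothetical, and the set of busy devices 
has to come from somewhere that knows about all nodes (on a shared SAN a 
local check cannot see that another host is writing to the drive).

```python
import subprocess


class DeviceBusyError(RuntimeError):
    """Raised when a tape device is reported as in use (hypothetical helper)."""


def guarded_mt_status(device, busy_devices, runner=subprocess.run):
    """Run `mt -f <device> status` only if the device is not marked busy.

    `busy_devices` is an assumption: a set of device paths that the backup
    software currently has open, gathered by whatever means your site has.
    `runner` is injectable so the guard can be tested without a tape drive.
    """
    if device in busy_devices:
        # Refuse to open the device at all: the open itself may trigger
        # the reset/rewind described above.
        raise DeviceBusyError(f"{device} is in use, not running mt")
    return runner(
        ["mt", "-f", device, "status"],
        capture_output=True,
        text=True,
        check=False,
    )
```

The guard only helps if everybody uses it instead of calling "mt" directly, 
and it does nothing against resets caused by reboots, drivers or management 
agents.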


regards
Klaus

> -----Original Message-----
> From: Legato NetWorker discussion 
> [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On Behalf Of Clb
> Sent: Thursday, 3 February 2005 17:56
> To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
> Subject: [Networker] mt status command aborting a running backup
> 
> 
> I have a question for the group.  Has anyone ever issued an
> mt -f device status command while a backup was running?  If so,
> what happened?
> 
> Here is what happened to us.  We have several Tru64 5.1b, dedicated
> storage nodes attached to a Solaris 8 Networker server.  We have 8
> tape drives in an Stk L700 tape library, fibre-attached to each.
> 
> On the first node, we had a backup running.
> 
> On the second node, someone issued an mt -f status command on one of
> the drives in the library which was currently being written to by the
> first node.
> 
> Immediately afterwards, Networker reported I/O errors on the tape drive
> and the backup aborted.
> 
> Has anyone ever seen this happen in their environment, and if so, what
> if anything was done to prevent it from occurring in the future?
> 
> --
> Note: To sign off this list, send a "signoff networker" 
> command via email
> to listserv AT listserv.temple DOT edu or visit the list's Web site at
> http://listserv.temple.edu/archives/networker.html where you can
> also view and post messages to the list. Questions regarding this list
> should be sent to stan AT temple DOT edu
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
> 
