Although it will generate lots of output, have you tried turning on
debugging on the DIR and SD to see if anything shows up there?
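
In case it helps, here is a minimal sketch of how that could be switched on
from bconsole. The debug level (200) and the storage name are assumptions;
substitute the Storage resource name from your bacula-dir.conf. With trace=1
the daemons should write to a .trace file in their working directory instead
of flooding the console.

```shell
# Print the bconsole commands that raise the debug level on the Director
# and on one Storage daemon. Nothing is sent to Bacula by this function.
# $1 = debug level, $2 = Storage resource name (placeholder, site-specific)
debug_cmds() {
    level=$1
    storage=$2
    printf 'setdebug level=%s trace=1 dir\n' "$level"
    printf 'setdebug level=%s trace=1 storage=%s\n' "$level" "$storage"
}

# Usage (not run here; "TapeLibrary" is a hypothetical storage name):
#   debug_cmds 200 TapeLibrary | bconsole
# Turn it off again afterwards:
#   debug_cmds 0 TapeLibrary | bconsole
```
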
On 2/6/2012 8:15 PM, mark.bergman AT uphs.upenn DOT edu wrote:
> In the message dated: Mon, 06 Feb 2012 12:43:41 GMT,
> The pithy ruminations from Martin Simmons on
> <Re: [Bacula-users] critical error -- tape labels get corrupted, previous
> backups unreadable> were:
>
>
> Martin,
>
> Thanks again for continuing to respond...I appreciate the feedback and
> troubleshooting help.
>
>
> => >>>>> On Fri, 03 Feb 2012 20:04:44 -0500, mark bergman said:
> => >
> => > I've added more logging to /etc/init.d/bacula-sd to confirm when tapes are
> => > ejected and to timestamp the SCSI release commands.
> => >
> => > Is it possible that bacula flagged tapes 003231 and 000312 as being in
> => > the drives because they were loaded when the server crashed, even though
> => > they were later ejected (outside of bacula's control)? Could this cause
> => > bacula to believe that the tapes were at EOT when they do get loaded, and
> => > bacula then immediately begins writing (corrupting the label)? [Unlikely
> => > that bacula would try to write before reading the label, and would then
> => > read the label after corrupting the tapes.]
> =>
> => I don't see how this could happen. Bacula issues a rewind command when it
>
> I don't see how it could happen either....but I'm searching for any
> explanation.
>
> => mounts a tape and should then know that the tape is at the start.
>
> That's what I'd expect too.
>
>
> =>
> =>
> => > When the current backup is finished, I'll extract the beginning data
> => > on each of 003231 and 000312. Is there anything you recommend in terms
> => > of checking the data on tape to determine whether the tape begins with
> => > random garbage (possibly caused by the shutdown, startup, scsi reset,
> => > etc.) or if it begins with valid bacula data that happened to overwrite
> => > the label instead of being appended?
> =>
> => Do you have a File device defined in the SD? If so, label a new File volume
>
> No.
>
> => and then append the data from the start of the tape to the end of the file
> => volume using dd and cat. You can then examine the file volume using bls -v -j
> => (the File label will allow bls to read it).
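
A rough sketch of the dd-and-cat step Martin describes, assuming a File
volume that has already been labelled. The function name, the 64k block size
and the default block count are my assumptions, not anything Bacula
prescribes. It keeps a raw copy of the tape data as well, so the original
dump is still available if the File volume needs to be rebuilt.

```shell
# append_tape_start SRC VOLUME RAWCOPY [BLOCKS]
#   SRC      tape device (or a file, for testing) positioned at the start
#   VOLUME   existing, already-labelled File volume to append to
#   RAWCOPY  where to keep an untouched copy of what was read
#   BLOCKS   number of 64k reads to take (default 16, an arbitrary choice)
append_tape_start() {
    src=$1
    vol=$2
    raw=$3
    blocks=${4:-16}
    # Read the first blocks into a plain file.
    dd if="$src" of="$raw" bs=64k count="$blocks" 2>/dev/null || return 1
    # Append them after the File label so bls has a label to start from.
    cat "$raw" >> "$vol"
}

# Usage (not run here; the device and paths are hypothetical):
#   mt -f /dev/nst0 rewind
#   append_tape_start /dev/nst0 /backup/FileVol001 /tmp/000312.raw 16
#   bls -v -j -V FileVol001 <file-device-name>
```
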
>
>
> Can I do this against a tape directly?
>
> =>
> =>
> => > Does anyone have suggestions of how to troubleshoot this further,
> => > or how to make the daemon startup process more resistant to causing
> => > any corruption?
> =>
> => The important information missing is whether 000312 was already corrupted at
> => 01-Feb 20:11. You could add some commands to the startup part of
>
>
> Hmmm....The only way that I could imagine that happening is if:
>
> bacula loads the tape as needed
>
> bacula reads the volume label
>
> {somehow the tape is rewound, either when the tape is first loaded, or
> after some backups are written}
>
> bacula writes to tape
>
> The only thing outside of bacula that touches the tape drive in any way is the
> /etc/init.d/bacula-sd script, which unloads any tapes before starting the
> daemon and after shutting down the daemon.
>
> => /etc/init.d/bacula-sd script before it unloads all tapes. E.g. do mt status,
> => mt rewind and grab a copy of the first few blocks on any loaded tapes.
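
Something along these lines could be dropped into the init script for that
purpose. It is only a sketch: the function name, the output file names, the
64k block size and the MT override (there so the function can be exercised
against a plain file) are all assumptions of mine.

```shell
# snapshot_tape_start TAPE OUTDIR [BLOCKS]
# Records "mt status", rewinds, and copies the first BLOCKS 64k blocks of
# TAPE into OUTDIR. Prints the path of the dump file on success.
# Set MT to override the mt binary (hypothetical hook, used for testing).
snapshot_tape_start() {
    tape=$1
    outdir=$2
    blocks=${3:-16}
    mt_cmd=${MT:-mt}
    stamp=$(date '+%Y-%m-%d_%H%M%S')
    log="$outdir/tape_status.$stamp.log"
    dump="$outdir/tape_start.$stamp.bin"
    # The status output is informational only; a failure here is not fatal.
    "$mt_cmd" -f "$tape" status > "$log" 2>&1
    "$mt_cmd" -f "$tape" rewind >> "$log" 2>&1 || return 1
    dd if="$tape" of="$dump" bs=64k count="$blocks" 2>> "$log" || return 1
    printf '%s\n' "$dump"
}

# Usage (not run here; the device and directory are hypothetical):
#   snapshot_tape_start /dev/nst0 /opt/bacula/working 16
```

The log file sits next to the dump, so each snapshot carries the drive
status that was reported just before the blocks were read.
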
>
> Sure. I'm thinking that I may modify /opt/bacula/scripts/mtx-changer to
> replace the "unload" operation with:
>
> mt rewind
> dd if=$TAPE of=/opt/bacula/working/dump_$VOLUMEID.`date '+%Y-%m-%d_%T'` ibs=64k count=1024
> mtx -f $ctl load $slot $drive
>
> Is that a suitable number of blocks to dump? I've got the dumps from 5
> corrupted tapes, and I'm trying to see if they have anything in common (for
> example, maybe the first 128k is corrupted, followed by valid data from dumps
> that should have been appended to the tape).
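
One quick way to check whether the dumps share a common prefix is to checksum
just the first N KiB of each and see which ones match. A minimal sketch,
assuming the dumps are ordinary files; the function name is made up, and
cksum is only a CRC, so treat a match as a hint rather than proof.

```shell
# prefix_sums KIB FILE...
# Prints "<crc>  <file>" for the first KIB kibibytes of each FILE, so dumps
# with an identical leading region show the same checksum.
prefix_sums() {
    kib=$1
    shift
    for f in "$@"; do
        sum=$(dd if="$f" bs=1024 count="$kib" 2>/dev/null | cksum | cut -d' ' -f1)
        printf '%s  %s\n' "$sum" "$f"
    done
}

# Usage (not run here; the glob is a hypothetical example):
#   prefix_sums 128 /opt/bacula/working/dump_* | sort
```

If two dumps match over the first 128k, "cmp dumpA dumpB" will report the
offset of the first byte where they diverge, which should show how far the
common region extends.
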
>
> =>
> => Also, you say that infrastructure1 server crashes. Maybe the crash caused the
> => tape to be rewound and some buffer flushed to start of the tape?
>
> I can't see how...
>
> if there was unwritten data in a buffer within the memory of the
> server infrastructure1, then when the server crashes it wouldn't
> get written to tape. The 'infrastructure' machines are part of
> an HA cluster...in this crash, the other nodes determined that
> infrastructure1 had lost communication with the quorum disk,
> and they powered off the node...even if that action reset the
> fibre loop and caused the tape library to rewind both tapes
> (unlikely), I don't know how any buffers on the infrastructure1
> server could be written when the power was out.
>
> if there was unwritten data in a buffer within the memory of
> the tape library, then I believe it must be written before any
> rewind command will be honored. If infrastructure1 sends
> data to the tape drive, that data is buffered, infrastructure1 then
> crashes, infrastructure2 runs /etc/init.d/bacula-sd (which ejects tapes,
> thereby rewinding them)...the data within the buffer in the tape
> drive would still be written before the rewind/eject command was
> executed.
>
> Thanks again for your help,
>
> Mark
>
> =>
> => __Martin
> =>
>
> ------------------------------------------------------------------------------
> Keep Your Developer Skills Current with LearnDevNow!
> The most comprehensive online learning library for Microsoft developers
> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
> Metro Style Apps, more. Free future releases when you subscribe now!
> http://p.sf.net/sfu/learndevnow-d2d
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users