Bacula-users

Re: [Bacula-users] critical error -- tape labels get corrupted, previous backups unreadable

2012-02-06 07:45:23
From: Martin Simmons <martin AT lispworks DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 6 Feb 2012 12:43:41 GMT

>>>>> On Fri, 03 Feb 2012 20:04:44 -0500, mark bergman said:
> 
> I've added more logging to /etc/init.d/bacula-sd to confirm when tapes are
> ejected and to timestamp the SCSI release commands.
> 
> Is it possible that bacula flagged tapes 003231 and 000312 as being in
> the drives because they were loaded when the server crashed, even though
> they were later ejected (outside of bacula's control)? Could this cause
> bacula to believe that the tapes were at EOT when they do get loaded, and
> bacula then immediately begins writing (corrupting the label)? [Unlikely
> that bacula would try to write before reading the label, and would then
> read the label after corrupting the tapes.]

I don't see how this could happen.  Bacula issues a rewind command when it
mounts a tape and should then know that the tape is at the start.


> When the current backup is finished, I'll extract the beginning data
> on each of 003231 and 000312. Is there anything you recommend in terms
> of checking the data on tape to determine whether the tape begins with
> random garbage (possibly caused by the shutdown, startup, scsi reset,
> etc.) or if it begins with valid bacula data that happened to overwrite
> the label instead of being appended?

Do you have a File device defined in the SD?  If so, label a new File volume
and then append the data from the start of the tape to the end of the file
volume using dd and cat.  You can then examine the file volume using bls -v -j
(the File label will allow bls to read it).
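
As a rough sketch of the dd and cat steps (not a tested procedure): the
function names, the device path /dev/nst0, the block count and the volume
path below are all assumptions for illustration. 64512 is Bacula's default
block size; use a larger value if the SD is configured with a bigger
Maximum Block Size.

```shell
# Hypothetical helpers; every name and default here is an assumption.

# Copy the first COUNT blocks from the tape device into a dump file.
# Note that dd stops reading when it reaches the first filemark.
dump_tape_start() {
    tape_dev=$1                # e.g. /dev/nst0 (non-rewinding device)
    dump_file=$2               # where to store the raw blocks
    block_size=${3:-64512}     # Bacula's default block size
    count=${4:-100}            # number of blocks to copy
    dd if="$tape_dev" of="$dump_file" bs="$block_size" count="$count"
}

# Append the dumped blocks to the end of a freshly labelled File volume.
append_to_file_volume() {
    dump_file=$1
    file_volume=$2             # the new File volume created with "label"
    cat "$dump_file" >> "$file_volume"
}

# Example use, with the suspect tape loaded and nothing else using the drive:
#   mt -f /dev/nst0 rewind
#   dump_tape_start /dev/nst0 /tmp/000312.start
#   append_to_file_volume /tmp/000312.start /var/bacula/file/TestVol
#   bls -v -j -V TestVol /var/bacula/file
```

Work on a scratch File volume only, since the append changes the volume
file outside of Bacula's control.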


> Does anyone have suggestions of how to troubleshoot this further,
> or how to make the daemon startup process more resistant to causing
> any corruption?

The important information missing is whether 000312 was already corrupted at
01-Feb 20:11.  You could add some commands to the startup part of the
/etc/init.d/bacula-sd script, before it unloads all tapes.  E.g. run mt status
and mt rewind, then grab a copy of the first few blocks of any loaded tapes.
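
Something along these lines could go into the init script. This is only a
sketch: the function name, the output directory and the device path are
assumptions, and MT is a hypothetical override for the mt command so that
the function can be tried without a drive.

```shell
# Hypothetical helper for the start section of /etc/init.d/bacula-sd.
# Records the drive status and the first few blocks of a loaded tape.
snapshot_tape() {
    tape_dev=$1                  # e.g. /dev/nst0 (assumed device path)
    out_dir=$2                   # e.g. /var/log/bacula/tape-snapshots (assumed)
    blocks=${3:-4}               # how many blocks to keep
    block_size=${4:-64512}       # Bacula's default block size
    stamp=$(date +%Y%m%d-%H%M%S)
    prefix="$out_dir/$(basename "$tape_dev")-$stamp"

    mkdir -p "$out_dir" || return 1

    # Save the drive state as seen before anything else touches the tape.
    "${MT:-mt}" -f "$tape_dev" status > "$prefix.status" 2>&1

    # Rewind; this fails if no tape is loaded, in which case stop here.
    "${MT:-mt}" -f "$tape_dev" rewind || return 1

    # Copy the first few blocks; dd's own messages go to the status file.
    dd if="$tape_dev" of="$prefix.blocks" bs="$block_size" \
        count="$blocks" 2>> "$prefix.status"

    # Leave the tape at the start again.
    "${MT:-mt}" -f "$tape_dev" rewind
}

# Example call, once per drive, before the tapes are unloaded:
#   snapshot_tape /dev/nst0 /var/log/bacula/tape-snapshots
```

Comparing the saved blocks from before and after a crash should show
whether the label was already damaged when the SD started.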

Also, you say that the infrastructure1 server crashed.  Maybe the crash caused
the tape to be rewound and some buffer to be flushed to the start of the tape?

__Martin
