[Bacula-users] Corrupt block on disk volume leads to unrestorable backup

2009-02-09 02:40:41
From: Craig Ringer <craig AT postnewspapers.com DOT au>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 09 Feb 2009 16:36:55 +0900

Hi all,

I just encountered a corrupt block in one of my on-disk volumes on the 
backup server. That's an issue in and of itself, but what I wanted to 
raise was a problem it created when restoring from the damaged volume.

After starting the storage daemon with the -p option so that it wouldn't 
abort completely after detecting the checksum error, attempts to restore 
from the volume failed with:


02-Feb 14:23 HOSTNAME-sd JobId 6746: Error: block.c:318 Volume data 
error at 6:669857422! Block checksum mismatch in block=409842 len=64512: 
calc=c044b2ce blk=5df394c1

02-Feb 14:23 HOSTNAME-fd JobId 6746: Error: attribs.c:421 File size of 
restored file 
/var/spool/cyrus/restore2/mnt/cyrus_mail_snap/mail/user/USERNAME/Sent/1733. 
not correct. Original 5886048, restored 2359296.

02-Feb 14:23 HOSTNAME-dir JobId 6746: Error: Bacula backup-dir 2.4.4 
(28Dec08): 02-Feb-2009 14:23:57


It appears that the director or fd was aborting the job completely if 
one file failed to restore. I was able to prevent that with some 
butchery of attribs.c so I could restore my backup sans the file 
containing the damaged block, but I thought this issue was worth raising 
on the list since one damaged block REALLY must not prevent a backup 
from being restored. Perhaps the restore job should have an additional 
configurable parameter "errors" with options "abort" or "continue"?
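
To make the suggestion concrete, here is a minimal sketch of an abort/continue restore policy. It is written in Python rather than Bacula's C and is not Bacula's code: the names `OnError`, `Block`, `verify` and `restore` are hypothetical, and CRC32 via `zlib` merely stands in for whatever per-block checksum the storage daemon computes.

```python
import zlib
from dataclasses import dataclass
from enum import Enum


class OnError(Enum):
    """Hypothetical values for the proposed "errors" parameter."""
    ABORT = "abort"
    CONTINUE = "continue"


@dataclass
class Block:
    file_name: str   # file this block belongs to
    payload: bytes   # data as read back from the volume
    stored_crc: int  # checksum recorded when the block was written


class BlockChecksumError(Exception):
    """Raised when a block's stored and calculated checksums differ."""


def verify(block):
    # CRC32 is an assumption made for illustration only.
    calc = zlib.crc32(block.payload) & 0xFFFFFFFF
    if calc != block.stored_crc:
        raise BlockChecksumError(
            f"checksum mismatch in {block.file_name}: "
            f"calc={calc:08x} blk={block.stored_crc:08x}"
        )


def restore(files, on_error=OnError.ABORT):
    """Restore every file in `files` (a dict of name -> list of Block).

    Returns (restored, failed): `restored` maps file names to their data,
    `failed` lists (name, reason) pairs for files that were skipped.
    """
    restored, failed = {}, []
    for name, blocks in files.items():
        try:
            for block in blocks:
                verify(block)
            restored[name] = b"".join(block.payload for block in blocks)
        except BlockChecksumError as exc:
            if on_error is OnError.ABORT:
                raise  # current behaviour: one bad block ends the job
            failed.append((name, str(exc)))  # proposed: skip and report
    return restored, failed
```

With `on_error=OnError.CONTINUE`, a file containing a damaged block is reported in `failed` while every other file is still restored; with `OnError.ABORT`, the first checksum error ends the whole job.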

The volume in question contains files that were stored with the Options 
{ compression = gzip; signature = MD5; }.


I also think that the error message from the bacula-sd needs to point 
out the "-p" option, e.g.:


02-Feb 14:23 HOSTNAME-sd JobId 6746: Error: block.c:318 Volume data 
error at 6:669857422! Block checksum mismatch in block=409842 len=64512: 
calc=c044b2ce blk=5df394c1. Fatal; restart HOSTNAME-sd with the -p flag 
to attempt to continue after errors.


... especially since "-p" isn't documented in the man page, only in the 
bacula-sd usage summary. You have to know it's the sd responsible for 
aborting the job, and that the option to tell it to behave differently 
exists. That's more research than should need to be done when one's 
trying to get a server back up and running!


In closing, I'd like to note that despite this recent frustrating 
experience, I've been delighted with Bacula, and really appreciate the 
time and effort that's been put into it from the spare time of kind 
people. Having done my own fair share of OSS dev work, I know how much 
difference it can make to have people notice and appreciate your work - 
and trust me, yours has made a WORLD of difference to my sanity when 
managing a complex network of machines with several different OSes, 
absurd volumes of data, and clumsy users. For example, being able to 
effortlessly restore the newspaper's production files after a user 
accidentally deleted them on deadline morning was a lifesaver.

--
Craig Ringer
