Hi all,
I just encountered a corrupt block in one of my on-disk volumes on the
backup server. That's an issue in and of itself, but what I wanted to
raise was a problem it created when restoring from the damaged volume.
After starting the storage daemon with the -p option so that it wouldn't
abort completely after detecting the checksum error, attempts to restore
from the volume failed with:
02-Feb 14:23 HOSTNAME-sd JobId 6746: Error: block.c:318 Volume data
error at 6:669857422! Block checksum mismatch in block=409842 len=64512:
calc=c044b2ce blk=5df394c1
02-Feb 14:23 HOSTNAME-fd JobId 6746: Error: attribs.c:421 File size of
restored file
/var/spool/cyrus/restore2/mnt/cyrus_mail_snap/mail/user/USERNAME/Sent/1733.
not correct. Original 5886048, restored 2359296.
02-Feb 14:23 HOSTNAME-dir JobId 6746: Error: Bacula backup-dir 2.4.4
(28Dec08): 02-Feb-2009 14:23:57
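For anyone who wants to pull the damaged block details out of a job log, here is a rough Python sketch that parses the storage daemon's checksum message quoted above. The regular expression is based only on the one message shown in this post, so treat the pattern as an assumption rather than a description of every form the message can take. The function name parse_checksum_error and the key names "position" and "length" are my own, made up for illustration.

```python
import re

# Pattern derived from the single log message quoted in this post.
# \s+ is used between words because the mail client wrapped the line.
CHECKSUM_ERROR = re.compile(
    r"Volume data\s+error at (?P<position>\d+:\d+)!\s+"
    r"Block checksum mismatch in block=(?P<block>\d+)\s+"
    r"len=(?P<length>\d+):\s+"
    r"calc=(?P<calc>[0-9a-f]+)\s+blk=(?P<blk>[0-9a-f]+)"
)


def parse_checksum_error(message):
    """Return the fields of a checksum-mismatch message, or None if absent.

    "position" is kept as the raw "N:M" string from the log; "calc" and
    "blk" are kept as the hex strings the daemon printed.
    """
    match = CHECKSUM_ERROR.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["block"] = int(fields["block"])
    fields["length"] = int(fields["length"])
    return fields
```

Feeding it the error text from job 6746 gives back the block number and length as integers, which is enough to tell which block on the volume is the damaged one.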
It appears that the director or fd was aborting the job completely if
one file failed to restore. I was able to prevent that with some
butchery of attribs.c so I could restore my backup sans the file
containing the damaged block, but I thought this issue was worth raising
on the list since one damaged block REALLY must not prevent a backup
from being restored. Perhaps the restore job should have an additional
configurable parameter "errors" with options "abort" or "continue"?
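To make the suggested "errors" parameter concrete, here is a minimal Python sketch of the two behaviours. This is not Bacula code and says nothing about how the director or fd are actually structured. The names restore_files, restore_one, RestoreError and RestoreReport are all hypothetical, and the only thing taken from the proposal itself is the "abort" / "continue" choice.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


class RestoreError(Exception):
    """Raised by the hypothetical per-file restore step on a bad file."""


@dataclass
class RestoreReport:
    restored: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def restore_files(
    paths: Iterable[str],
    restore_one: Callable[[str], None],
    errors: str = "abort",
) -> RestoreReport:
    """Restore each path, honouring the proposed errors = abort|continue.

    With "abort", the first failing file ends the whole job. With
    "continue", the failing file is recorded and the job carries on.
    """
    if errors not in ("abort", "continue"):
        raise ValueError("errors must be 'abort' or 'continue'")
    report = RestoreReport()
    for path in paths:
        try:
            restore_one(path)
        except RestoreError:
            if errors == "abort":
                raise  # one bad file stops the whole job
            report.failed.append(path)  # skip the bad file, keep going
        else:
            report.restored.append(path)
    return report
```

With errors="continue", a job over three files where the second one hits the damaged block would finish with two files restored and one listed as failed, which is the outcome I had to patch attribs.c to get.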
The volume in question contains files that were stored with the Options
{ compression = gzip; signature = MD5; }.
I also think that the error message from the bacula-sd needs to point
out the "-p" option, e.g.:
02-Feb 14:23 HOSTNAME-sd JobId 6746: Error: block.c:318 Volume data
error at 6:669857422! Block checksum mismatch in block=409842 len=64512:
calc=c044b2ce blk=5df394c1. Fatal; restart HOSTNAME-sd with the -p flag
to attempt to continue after errors.
... especially since "-p" isn't documented in the man page, only in the
bacula-sd usage summary. You have to know it's the sd responsible for
aborting the job, and that the option to tell it to behave differently
exists. That's more research than should need to be done when one's
trying to get a server back up and running!
In closing, I'd like to note that despite this recent frustrating
experience, I've been delighted with Bacula, and really appreciate the
time and effort that kind people have put into it in their spare
time. Having done my own fair share of OSS dev work, I know how much
difference it can make to have people notice and appreciate your work -
and trust me, yours has made a WORLD of difference to my sanity when
managing a complex network of machines with several different OSes,
absurd volumes of data, and clumsy users. For example, being able to
effortlessly restore the newspaper's production files after a user
accidentally deleted them on deadline morning was a lifesaver.
--
Craig Ringer
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users