Bacula-users

Re: [Bacula-users] [Bacula-devel] Idea/suggestion for dedicated disk-based sd

2010-04-08 10:51:10
Subject: Re: [Bacula-users] [Bacula-devel] Idea/suggestion for dedicated disk-based sd
From: Robert LeBlanc <robert AT leblancnet DOT us>
To: Kern Sibbald <kern AT sibbald DOT com>
Date: Thu, 8 Apr 2010 08:48:52 -0600
On Thu, Apr 8, 2010 at 12:39 AM, Kern Sibbald <kern AT sibbald DOT com> wrote:
> Hello,
>
> I haven't seen the original messages, so I am not sure if I understand the
> full concept here so my remarks may not be pertinent.
>
> However, from what I see, this is basically similar to what BackupPC does.  The
> big problem I have with it is that it does not scale well to thousands of
> machines.
>
> If I were thinking about changing the disk Volume format, I would start by
> looking at how git handles storing objects, and whether git can scale to
> handle a machine with 40 million file entries.
>
> One thing is sure: unless some new way of implementing hardlinks is
> developed, you will never see Bacula using hard links in the volumes. That
> is a sure way to make your machine unbootable if you scale large enough.  Just
> back up enough clients with BackupPC and one day you will find that fsck no
> longer works.  I suspect that it will require only a couple hundred million
> hardlinks before a Linux machine will no longer boot.
>

It wasn't my intention that Bacula try to create hard links the way
BackupPC does; I figured that if someone wanted that, they could run
a script outside of Bacula. I'm thinking of the ability to offload
data compression to the file system in general, or alternatively to
have Bacula compress it. The reason is that with Bacula's current
tape format, dedup technologies cannot dedup the data very well. From
what I can tell of the tape format, every 64K of duplicate data has a
unique header, which makes the block unique and therefore not a
candidate for dedup.
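
To illustrate the effect, here is a toy model in Python. It is only a
sketch under loud assumptions: the 24-byte header is made up and is not
Bacula's real block layout, and the dedup engine is modeled as plain
fixed-size chunk hashing with a chunk size equal to the 64K block. It
backs up the same payload in two jobs and counts how many chunks the
dedup store would have to keep.

```python
import hashlib

BLOCK = 64 * 1024   # 64K block; also the assumed dedup chunk size
HEADER_LEN = 24     # hypothetical header size, NOT Bacula's real layout


def header(job_id: int, block_no: int) -> bytes:
    """Build a made-up 24-byte header that is unique per job and block."""
    return f"{job_id:08d}{block_no:016d}".encode()


def interleaved(payload: bytes, job_id: int) -> bytes:
    """Put a unique header in front of every block of data."""
    step = BLOCK - HEADER_LEN
    out = bytearray()
    for n, off in enumerate(range(0, len(payload), step)):
        out += header(job_id, n) + payload[off:off + step]
    return bytes(out)


def headers_first(payload: bytes, job_id: int) -> bytes:
    """Collect all headers at the front, pad to a chunk boundary, then data."""
    step = BLOCK - HEADER_LEN
    n_blocks = -(-len(payload) // step)  # ceiling division
    headers = b"".join(header(job_id, n) for n in range(n_blocks))
    pad = -len(headers) % BLOCK
    return headers + b"\0" * pad + payload


def stored_chunks(*streams: bytes) -> int:
    """Count distinct fixed-size chunks across all streams."""
    seen = set()
    for s in streams:
        for i in range(0, len(s), BLOCK):
            seen.add(hashlib.sha256(s[i:i + BLOCK]).digest())
    return len(seen)


# Deterministic 1 MiB payload in which every 64K chunk is different.
payload = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest()
                   for i in range(16 * BLOCK // 32))

print(stored_chunks(interleaved(payload, 1), interleaved(payload, 2)))      # 34
print(stored_chunks(headers_first(payload, 1), headers_first(payload, 2)))  # 18
```

With a header in every block, the second job shares nothing with the
first (34 chunks stored for 2 x 17). With the headers moved out of the
data, only the two header chunks differ and the 16 data chunks are
stored once.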

I had two ideas for overcoming this problem. One was to have a
slightly modified Bacula tape format for disks that would move the
unique header information to the front or the back of the job stream
and create a sparse file with job files starting at a user-defined
blocksize. The other was to store tier 3 data on the same dedup device
or file system so that, done a certain way, we could get 'free'
backups. If Bacula backed up to the same device with a hierarchical
file system approach, then the original files and the files that
Bacula backed up would look the same. Plus it would be easy to recover
in the case of total failure of Bacula-dir and sd (I'm thinking
disaster recovery).
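
As a rough picture of "job files starting at a user-defined blocksize",
here is a small Python sketch. The function names, the 4096-byte
blocksize and the volume file name are all hypothetical; nothing here
is an existing Bacula interface. Each file is written at the next
offset that is a multiple of the blocksize, and the gaps are simply
never written, so a file system that supports sparse files can leave
them as holes.

```python
import os
import tempfile


def align_up(offset: int, blocksize: int) -> int:
    """Round offset up to the next multiple of blocksize."""
    return -(-offset // blocksize) * blocksize


def layout(sizes, blocksize):
    """Return the aligned start offset for each file, in order."""
    offsets, pos = [], 0
    for size in sizes:
        pos = align_up(pos, blocksize)
        offsets.append(pos)
        pos += size
    return offsets


def write_volume(path, files, blocksize):
    """Write each file's bytes at its aligned offset; gaps stay unwritten."""
    offsets = layout([len(data) for data in files], blocksize)
    with open(path, "wb") as vol:
        for off, data in zip(offsets, files):
            vol.seek(off)   # seeking past the end leaves a gap in the file
            vol.write(data)
    return offsets


files = [b"a" * 10, b"b" * 5000, b"c" * 4096]
with tempfile.TemporaryDirectory() as tmp:
    vol_path = os.path.join(tmp, "volume.img")  # hypothetical volume name
    offsets = write_volume(vol_path, files, blocksize=4096)
    size = os.path.getsize(vol_path)

print(offsets)  # [0, 4096, 12288]
print(size)     # 16384
```

Because every file starts on a block boundary, the same file backed up
twice lands on the same chunk boundaries for a fixed-block dedup
engine, regardless of what was written before it. Whether the gaps
actually consume no disk space depends on the file system.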

I've been running Bacula backups on a dedup box for almost a year and
can't get better than 4x, when I believe the data that we have should
dedup at about 10x. With dedup becoming more popular, I'm just trying
to make Bacula even more appealing for those who want to dedup. If
people are using straight disk, then compression could be enabled by
Bacula and the format might be a little different (like tar bz2
archives), but most newer file systems are starting to support
on-the-fly compression, so I don't know how critical that is.

These are all ideas to start some discussion about what, if a
file-aware SD is implemented, would be good to offer for maximum
flexibility and to leverage features that are being implemented in
current and future file systems.

Thanks,

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University
