
Re: [Bacula-users] Idea/suggestion for dedicated disk-based sd

2010-04-07 12:16:56
Subject: Re: [Bacula-users] Idea/suggestion for dedicated disk-based sd
From: Robert LeBlanc <robert AT leblancnet DOT us>
To: "bacula-users (anglais)" <bacula-users AT lists.sourceforge DOT net>
Date: Wed, 7 Apr 2010 10:06:26 -0600
On Tue, Apr 6, 2010 at 5:19 PM, Robert LeBlanc <robert AT leblancnet DOT us> 
wrote:
> On Tue, Apr 6, 2010 at 12:37 AM, Craig Ringer
> <craig AT postnewspapers.com DOT au> wrote:
>
> [snip]
>
>>
>> Is this insane? Or a viable approach to tackling some of the
>> complexities of faking tape backup on disk as Bacula currently tries to do?
>>
>
> I love Bacula and have been working hard to promote it to people I
> know. The biggest problem with Bacula is its disk management. We have
> a DataDomain box that is getting a horrible dedup rate, and after
> looking at the Bacula tape stream format, I can understand why. So
> much extra data, very helpful for tape drives, is inserted into the
> stream that it makes deduping the data nearly impossible.
>
> I would love to see the stream simplified for disk-based storage.
> I'd also like an option to specify a block size and start each file
> on a block boundary; sparse files could be used to skip the padding
> without taking up space. This would let dedup algorithms compress
> Bacula data much better. It would be awesome if a file stored in the
> Bacula stream looked exactly as it does on the file system, so that
> if you run any tier 3 storage with dedup and send your Bacula backups
> to the same storage, you get free backups.
>
> Dedup is gaining a lot of traction: name your favorite vendor, or, as
> I'm doing, look at lessfs. All of these would benefit hugely from a
> smart SD that knows how to handle disk storage better, and that would
> make Bacula much more attractive. With the types of backups we are
> doing, we should easily be getting 10x on our DataDomain, but we are
> lucky to get 4x, and I think that mostly comes from compression.
>
> Thanks,
>
> Robert LeBlanc
> Life Sciences & Undergraduate Education Computer Support
> Brigham Young University
>
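To make the block-alignment argument in the quoted message concrete, here is a small Python sketch; it is not Bacula code. It assumes a fixed 4 KiB dedup block (`BLOCK`) and a made-up 16-byte per-file header standing in for the tape-oriented record data, and `interleaved_stream`, `aligned_stream` and `write_aligned` are hypothetical names. It simulates two jobs over the same files, where one file grows slightly between jobs, and compares how many blocks a fixed-block deduplicator could share. `write_aligned` shows the sparse-file idea: it seeks over the padding instead of writing zeros.

```python
import hashlib
import os
import random
import struct
import tempfile

BLOCK = 4096  # assumed fixed dedup block size (hypothetical)


def chunk_hashes(data: bytes) -> list[str]:
    """Split data into fixed-size blocks and hash each block."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]


def dedup_ratio(*streams: bytes) -> float:
    """Blocks written across all streams divided by unique blocks kept."""
    hashes = [h for s in streams for h in chunk_hashes(s)]
    return len(hashes) / len(set(hashes))


def interleaved_stream(files: list[bytes], job_id: int) -> bytes:
    """Tape-style stream: a job-specific header precedes each file's data.
    The 16-byte header layout is made up, not Bacula's real record format."""
    out = bytearray()
    for index, data in enumerate(files):
        out += struct.pack("<IIQ", job_id, index, len(data))
        out += data
    return bytes(out)


def aligned_stream(files: list[bytes]) -> bytes:
    """Each file starts on a block boundary; metadata is kept elsewhere."""
    out = bytearray()
    for data in files:
        out += data
        out += b"\0" * (-len(out) % BLOCK)  # zero padding up to the boundary
    return bytes(out)


def write_aligned(path: str, files: list[bytes]) -> None:
    """Write the aligned layout, seeking over the padding instead of writing
    zeros; on file systems with sparse-file support this leaves holes."""
    with open(path, "wb") as f:
        for data in files:
            f.write(data)
            f.seek(-f.tell() % BLOCK, os.SEEK_CUR)
        f.truncate()  # extend the file if it ends in a skipped gap


# Two jobs over the same 20 files; before job 2, file 0 grew by 100 bytes.
rng = random.Random(42)
job1_files = [rng.randbytes(10_000 + 1_234 * i) for i in range(20)]
job2_files = [job1_files[0] + b"\x01" * 100] + job1_files[1:]

interleaved = dedup_ratio(interleaved_stream(job1_files, 1),
                          interleaved_stream(job2_files, 2))
aligned = dedup_ratio(aligned_stream(job1_files), aligned_stream(job2_files))
print(f"interleaved: {interleaved:.2f}x  aligned: {aligned:.2f}x")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "job1.data")
    write_aligned(path, job1_files)
    print("aligned file size:", os.path.getsize(path))
```

On this synthetic data the interleaved stream dedups to roughly 1.00x while the aligned layout reaches about 1.98x: the 100 extra bytes shift everything after them in the stream off the old block boundaries, whereas the aligned layout only loses the last block of the file that changed. Many dedup systems use variable-size, content-defined chunking, which tolerates shifts better than this fixed-block model, so treat the numbers as an illustration of the mechanism rather than a prediction for any particular appliance.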

So, still thinking about this: is there any reason not to have a
hierarchical file structure for disk-based backup rather than a
serialized stream? Here are my thoughts; any comments are welcome, as
I'd like to have a good discussion about this.

SD_Base_Dir
    +- PoolA
    +- PoolB
        +- JobID1
        +- JobID2
            +- Clientinfo.bacula
            |      (Bacula serial file that holds information similar to
            |       the block header)
            +- Original File Structure
                |  (the client's file structure is maintained and repeated
                |   here, which allows files to be browsed outside of Bacula)
                +- ClientFileA
                +- ClientFileA.bacula
                |      (Bacula serial file that holds information similar
                |       to the Unix file attribute package)
                +- ClientFileB
                +- ClientFileB.bacula
                +- ClientDirA
                +- ClientDirA.bacula
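A minimal sketch of how an SD might map a client path onto the tree above, assuming POSIX client paths, a `JobID<n>` directory name, and the `.bacula` sidecar suffix from the proposal. `layout_paths`, `job_dir` and `sidecar_for` are hypothetical helpers, not part of Bacula.

```python
from pathlib import Path, PurePosixPath

SIDECAR_SUFFIX = ".bacula"        # suffix taken from the proposed layout
CLIENT_INFO = "Clientinfo.bacula"


def job_dir(base: Path, pool: str, job_id: int) -> Path:
    """Directory for one job: SD_Base_Dir/<pool>/JobID<n> (naming assumed)."""
    return base / pool / f"JobID{job_id}"


def sidecar_for(path: Path) -> Path:
    """Attribute file that sits next to a data file or directory."""
    return path.with_name(path.name + SIDECAR_SUFFIX)


def layout_paths(base: Path, pool: str, job_id: int,
                 client_path: str) -> tuple[Path, Path]:
    """Map an absolute POSIX path on the client to (data path, sidecar path)."""
    p = PurePosixPath(client_path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute client path: {client_path!r}")
    rel = p.relative_to("/")
    # Reject the root itself and any ".." that could escape the job directory.
    if not rel.parts or ".." in rel.parts:
        raise ValueError(f"refusing unsafe client path: {client_path!r}")
    data = job_dir(base, pool, job_id).joinpath(*rel.parts)
    return data, sidecar_for(data)


base = Path("/srv/bacula-sd")
data, meta = layout_paths(base, "PoolB", 2, "/home/alice/report.txt")
print(data)
print(meta)
print(job_dir(base, "PoolB", 2) / CLIENT_INFO)
```

One wrinkle this exposes: a client that already has both `foo` and `foo.bacula` in the same directory would collide with the sidecar for `foo`, so a real implementation would need to escape names or keep the attribute files in a parallel tree.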

Although it's great to reuse code, I think something like this would
be very beneficial for disk-based backups. It would help increase dedup
rates, and some file systems like btrfs and ZFS may be able to take
advantage of linked files (there has been some discussion on the btrfs
list about things like this). It would also allow the store to reside
on any file system, since all the ACLs and other attribute information
are serialized into separate files, which keeps unique data out of the
blocks of potentially duplicated data. I think we could even reuse a
lot of the serialization code, so the only difference would be in how
the stream of data is written.
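The Python sketch below illustrates the storage side of this idea under stated assumptions: file data is copied byte for byte, attributes go into a JSON sidecar (a stand-in for Bacula's real attribute serialization, which this does not reproduce), and an unchanged file is hard-linked to the previous job's copy as a simple form of the linked files mentioned above. `store_file` and `file_digest` are hypothetical names.

```python
import hashlib
import json
import os
import shutil
import stat
import tempfile
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def store_file(src: Path, data_path: Path,
               prev_data_path: Optional[Path] = None) -> bool:
    """Store src's bytes verbatim at data_path and its attributes in a
    sidecar; hard-link the previous job's copy if the content is unchanged.
    Returns True when a link was made instead of a copy."""
    data_path.parent.mkdir(parents=True, exist_ok=True)
    linked = (prev_data_path is not None and prev_data_path.exists()
              and file_digest(prev_data_path) == file_digest(src))
    if linked:
        os.link(prev_data_path, data_path)  # share the existing data
    else:
        shutil.copyfile(src, data_path)     # byte-for-byte, no headers
    st = src.stat()
    attrs = {  # hypothetical stand-in for Bacula's attribute encoding
        "mode": stat.S_IMODE(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
    }
    sidecar = data_path.with_name(data_path.name + ".bacula")
    sidecar.write_text(json.dumps(attrs, sort_keys=True))
    return linked


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "client" / "notes.txt"
    src.parent.mkdir()
    src.write_text("same content every night\n")
    job1 = root / "PoolB" / "JobID1" / "client" / "notes.txt"
    job2 = root / "PoolB" / "JobID2" / "client" / "notes.txt"
    print(store_file(src, job1))        # first job: copied
    print(store_file(src, job2, job1))  # unchanged: hard-linked
    print(os.path.samefile(job1, job2))
```

Hard links only work within one file system and make every job share a single inode, so an in-place change to one copy would show up in all of them; on btrfs a copy-on-write reflink (as created by `cp --reflink`) would be a safer way to share blocks. The sidecar here also omits ACLs and extended attributes, which a real implementation would have to serialize too.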

Please excuse me if I'm way off here; I'm just trying to think outside
the box a little.

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
