Bacula-users

Re: [Bacula-users] Plans for support block-based dedupe?

2013-01-03 05:24:58
From: Silver Salonen <silver AT serverock DOT ee>
To: Radosław Korzeniewski <radoslaw AT korzeniewski DOT net>
Date: Thu, 03 Jan 2013 12:22:37 +0200
On 01/03/2013 11:59 AM, Radosław Korzeniewski wrote:
Hello,

2013/1/3 Silver Salonen <silver AT serverock DOT ee>
On 01/03/2013 09:48 AM, Sven Tegethoff wrote:
> On 03.01.2013 08:19, Gary R, Schmidt wrote:
>>> Does anyone know whether bacula can be made to work on a block-level
>>> dedupe storage system? Are there any plans to support this
>>> technology?

Bacula Systems is working on a new volume format which should allow a better deduplication ratio on deduplication-enabled filesystems or storage arrays.
 
>> What magic are you expecting?
> If I had to guess I'd say he wants to back up de-duplicated data only
> once instead of re-duplicating it. You're correct if you say "of course
> it's going to work" - after all, it's transparent at the filesystem level.
> But depending on how redundant the data on the drive actually is, you
> might end up with a backup several times the original disk size - unless
> there is some mechanism to make bacula aware of which parts of the data
> are redundant. Backing up redundant data kinda defeats the purpose of
> deduping.

Can't it be summarized simply as "client-side global de-duplication"? I.e.
if file A from Client1 is already backed up, the same file from
Client2 is backed up as a pointer only. If this is not possible, we
don't have the "true" and "best" form of de-duplication :)
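
To make the "pointer only" idea concrete, here is a minimal sketch of file-level global de-duplication. It is not how Bacula works; the DedupStore class and its methods are hypothetical names, and the client and the storage side are collapsed into one process for brevity (in a real client-side scheme the client would send the digest first and the data only if the server does not have it yet).

```python
import hashlib


class DedupStore:
    """Toy global store: one copy per unique file content, shared by all clients."""

    def __init__(self):
        self.blobs = {}    # digest -> file content, stored once
        self.catalog = {}  # (client, path) -> digest, i.e. the "pointer"

    def backup(self, client, path, data):
        """Record a file; return True only if its content had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.catalog[(client, path)] = digest
        return is_new

    def restore(self, client, path):
        """Follow the pointer in the catalog back to the stored content."""
        return self.blobs[self.catalog[(client, path)]]


store = DedupStore()
first = store.backup("Client1", "/etc/hosts", b"127.0.0.1 localhost\n")
second = store.backup("Client2", "/etc/hosts", b"127.0.0.1 localhost\n")
```

Here the second backup only adds a catalog entry; the content itself is kept once and both clients can still restore it.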

There is a project sponsored by Inteos Sp. z o.o. from Poland which implements "Variable Block Level Global Data Deduplication". It is an already working solution which additionally supports replication of deduplicated data (currently synchronous; async is on the to-do list). Global deduplication means we store only one copy of each unique block of data in the whole Bacula environment, even across multiple SDs. We can "connect" any number of Storage Daemons together and use data already available on any of them to reference a block. Variable means there is no fixed deduplication block size. We store deduplication data on disk only. Deduplication is performed online during a backup, on the client side with SD support.
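
For readers unfamiliar with variable-size blocks, the general technique is usually called content-defined chunking. The sketch below is only an illustration of that general idea, not the Inteos implementation: the gear-style rolling hash, the MIN_CHUNK/MAX_CHUNK/MASK values and the function names are all assumptions chosen for the example.

```python
import hashlib

# Hypothetical parameters, not taken from the project described above.
MIN_CHUNK = 64
MAX_CHUNK = 1024
MASK = 0xFF  # a boundary is declared when the low 8 bits of the hash are zero

# Deterministic pseudo-random table with one 32-bit value per byte value.
GEAR = [
    int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
    for i in range(256)
]


def chunks(data):
    """Split data at content-defined boundaries using a gear rolling hash."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # trailing chunk, may be shorter than MIN_CHUNK


def store_blocks(data, block_store):
    """Add unseen chunks to block_store; return the list of chunk digests."""
    recipe = []
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        block_store.setdefault(digest, chunk)  # keep only the first copy
        recipe.append(digest)
    return recipe
```

Because boundaries are chosen from the data itself rather than from fixed offsets, the intent is that an insertion near the start of a file changes only the nearby chunks instead of shifting every block. A shared block_store dictionary stands in for the "global" part: any number of backups can reference the same stored block by its digest.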
 
You can check a presentation of this project at: http://www.youtube.com/watch?v=Vgxo8eKazKs&playnext=1&list=PL6_mbzQQJyM9hEGKK1L_5kpX6rv_ngO0N&feature=results_main (4 parts, in Polish only, but the slides are in English, so you should be able to follow the key points).

This functionality will be available for Bacula Enterprise as a Storage Daemon Plugin.

Excellent news! Any ETA when the plugin will be available?

--
Silver