Subject: Re: [Bacula-users] FreeBSD 9 and ZFS with compression - should be fine?
From: "Gary R. Schmidt" <grs AT mcleod-schmidt.id DOT au>
To: bacula-users AT lists.sourceforge DOT net
Date: Sat, 11 Feb 2012 00:11:21 +1100
On 02/10/12 06:58, Paul Mather wrote:
> On Feb 9, 2012, at 2:21 PM, Steven Schlansker wrote:
>
>>
>> On Feb 9, 2012, at 11:05 AM, Mark wrote:
>>> Steven, out of curiosity, do you see any benefit with dedup (assuming that 
>>> bacula volumes are the only thing on a given zfs volume).  I did some 
>>> initial trials and it appeared that bacula savesets don't dedup much, if at 
>>> all, and some searching around pointed to the bacula volume format writing 
>>> a unique value (was it jobid?) to every block, so no two blocks are ever 
>>> the same.  I'd back up hundreds of gigs of data and the dedupratio always 
>>> remained 1.00x.
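
For anyone who wants to check a pool of their own, the dedup ratio is an 
ordinary pool property; the pool name "tank" below is just a placeholder:

    # report the pool-wide dedup ratio ("tank" is a placeholder name)
    zpool get dedupratio tank
    # the same figure appears as a column in the pool listing
    zpool list -o name,size,allocated,dedupratio tank
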
>>
>> I didn't do any research, but can confirm that it seems to be useless to 
>> turn dedup on.  My pool has always been at 1.00x.
>> I'm going to turn it off because from what I hear dedup is pretty expensive 
>> to run, especially if you don't actually save anything by it.
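
Turning it off is a one-liner, but note that it only affects new writes: 
blocks already recorded in the DDT stay deduplicated until they are freed 
or rewritten.  The dataset name "tank/bacula" below is hypothetical:

    # disable dedup for the dataset holding the Bacula volumes
    # ("tank/bacula" is a hypothetical dataset name)
    zfs set dedup=off tank/bacula
    # confirm the property took effect
    zfs get dedup tank/bacula
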
>
>
> Some time ago, I enabled dedup on a fileset with ~8 TB of data (about 4 
> million files) on a FreeBSD 8-STABLE system.  Bad move!  The machine has 16 
> GB of RAM but enabling dedup utterly killed it.  I discovered, through 
> further research, that dedup requires either a lot of RAM or a read-optimised 
> SSD to hold the dedup table (DDT).  Small filesets may work fine, but 
> anything else will quickly eat up RAM.  Worse still, the DDT is considered 
> ZFS metadata, and so is limited to 25% of the ARC, so you need huge 
> amounts of ARC to hold a large DDT.  I've read that a rule of thumb is that 
> for every 1 TB of data you should expect 5 GB of DDT, assuming an average 
> block size of 64 KB.  For large sizes, therefore, it's not feasible to store 
> the entire DDT in RAM and thus you'd be looking at a low-latency L2ARC 
> solution instead (e.g., SSD).
>
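The arithmetic behind that rule of thumb, assuming the commonly quoted 
figure of roughly 320 bytes of core per DDT entry, works out like this:

    1 TB / 64 KB average block size    = ~16.8 million blocks
    16.8 million entries * ~320 bytes  = ~5 GB of DDT

If you want an estimate for an existing pool before committing, zdb can 
simulate dedup, and on FreeBSD the ARC metadata ceiling is visible via 
sysctl (again, "tank" is a placeholder pool name):

    # simulate dedup and print a projected DDT histogram and ratio
    zdb -S tank
    # FreeBSD: how much of the ARC may hold metadata (the DDT is metadata)
    sysctl vfs.zfs.arc_meta_limit vfs.zfs.arc_meta_used
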
Dedup is for systems with lots and lots of RAM and enough CPUs.

On a Fujitsu RX300 running Solaris 10 or 11, with 72 GB of RAM, 4 
dual-core Xeon CPUs, a 1 TB mirrored rpool (i.e. 2 * 1 TB), and an 8 TB 
RAIDZ2 storage pool, you just don't notice, performance-wise, whether 
dedup is on or off.

You do notice that you have more available disk space, however.

Of course, this depends on your data set; I wouldn't turn dedup on for 
a Bacula spool area, for example.

        Cheers,
                Gary    B-)
