Subject: Re: [Bacula-users] FreeBSD 9 and ZFS with compression - should be fine?
From: Paul Mather <paul AT gromit.dlib.vt DOT edu>
To: Steven Schlansker <steven AT likeness DOT com>
Date: Thu, 9 Feb 2012 14:58:33 -0500
On Feb 9, 2012, at 2:21 PM, Steven Schlansker wrote:

> 
> On Feb 9, 2012, at 11:05 AM, Mark wrote:
>> Steven, out of curiosity, do you see any benefit with dedup (assuming that
>> Bacula volumes are the only thing on a given ZFS volume)?  I did some
>> initial trials and it appeared that Bacula savesets don't dedup much, if at
>> all, and some searching around pointed to the Bacula volume format writing a
>> unique value (was it the jobid?) to every block, so no two blocks are ever
>> the same.  I'd back up hundreds of gigs of data and the dedupratio always
>> remained 1.00x.
> 
> I didn't do any research, but I can confirm that it seems to be useless to
> turn dedup on.  My pool has always been at 1.00x.  I'm going to turn it off
> because, from what I hear, dedup is pretty expensive to run, especially if
> you don't actually save anything by it.


Some time ago, I enabled dedup on a fileset with ~8 TB of data (about 4 million
files) on a FreeBSD 8-STABLE system.  Bad move!  The machine has 16 GB of RAM,
but enabling dedup utterly killed it.  I discovered, through further research,
that dedup requires either a lot of RAM or a read-optimised SSD to hold the
dedup table (DDT).  Small filesets may work fine, but larger ones will quickly
eat up RAM.  Worse still, the DDT is considered ZFS metadata, and so is limited
to 25% of the ARC, so you need a huge ARC for a large DDT.  I've read that a
rule of thumb is that for every 1 TB of data you should expect about 5 GB of
DDT, assuming an average block size of 64 KB.  For large filesets, therefore,
it's not feasible to hold the entire DDT in RAM, and you'd be looking at a
low-latency L2ARC device instead (e.g., an SSD).
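
To put that rule of thumb into numbers, here's a back-of-the-envelope sketch in
Python.  The ~320 bytes per DDT entry is my assumption, implied by the
5 GB-per-TB figure; actual entry sizes vary with pool layout and ZFS version:

    # Rough DDT sizing sketch based on the rule of thumb above.
    # One DDT entry per unique block; ~320 bytes/entry is an assumption
    # implied by "5 GB of DDT per 1 TB of data at 64 KB blocks".
    def estimate_ddt_bytes(data_bytes, avg_block_bytes=64 * 1024,
                           bytes_per_entry=320):
        n_blocks = data_bytes / avg_block_bytes
        return n_blocks * bytes_per_entry

    TB, GB = 1024 ** 4, 1024 ** 3
    ddt = estimate_ddt_bytes(8 * TB)   # the ~8 TB fileset above
    print("Estimated DDT: %.1f GB" % (ddt / GB))
    # -> Estimated DDT: 40.0 GB, far more than the ~4 GB metadata
    #    limit (25% of ARC) on a 16 GB machine.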


> On the flip side, compression seems to be a very big win.  I'm seeing
> compression ratios from 1.7x to 2.5x, and the CPU cost is claimed to be
> relatively low.


That's what I'm seeing, too.  On the fileset I tried to dedup, I'm currently
seeing a compressratio of 1.51x, which I'm happy with for that data.  Enabling
ZFS compression appears to have negligible overhead, so turning it on has been
a big win for me.
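
In case it helps to see what that ratio means in saved space, a trivial Python
sketch (compressratio is the logical size divided by the on-disk size; the 8 TB
figure is just the fileset size mentioned above):

    # compressratio = uncompressed (logical) size / compressed (on-disk) size
    logical_tb = 8.0          # the ~8 TB fileset mentioned earlier
    compressratio = 1.51      # as reported by `zfs get compressratio`
    physical_tb = logical_tb / compressratio
    print("~%.1f TB on disk, ~%.1f TB saved"
          % (physical_tb, logical_tb - physical_tb))
    # -> ~5.3 TB on disk, ~2.7 TB saved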

Cheers,

Paul.


