On Feb 9, 2012, at 2:21 PM, Steven Schlansker wrote:
>
> On Feb 9, 2012, at 11:05 AM, Mark wrote:
>> Steven, out of curiosity, do you see any benefit with dedup (assuming that
>> bacula volumes are the only thing on a given zfs volume). I did some
>> initial trials and it appeared that bacula savesets don't dedup much, if at
>> all, and some searching around pointed to the bacula volume format writing a
>> unique value (was it jobid?) to every block, so no two blocks are ever the
>> same. I'd backup hundreds of gigs of data and the dedupratio always
>> remained 1.00x.
>
> I didn't do any research, but can confirm that it seems to be useless to turn
> dedup on. My pool has always been at 1.00x
> I'm going to turn it off because from what I hear dedup is pretty expensive
> to run, especially if you don't actually save anything by it.
Some time ago, I enabled dedup on a fileset with ~8 TB of data (about 4 million
files) on a FreeBSD 8-STABLE system. Bad move! The machine has 16 GB of RAM,
but enabling dedup utterly killed it. I discovered, through further research,
that dedup requires either a lot of RAM or a read-optimised SSD to hold the
dedup table (DDT). Small filesets may work fine, but anything larger will
quickly eat up RAM. Worse still, the DDT is considered ZFS metadata, and so is
limited to 25% of the ARC, so you need a huge ARC to hold a large DDT. I've
read that a rule of thumb is that for every 1 TB of data you should expect
5 GB of DDT, assuming an average block size of 64 KB. For large pools,
therefore, it's not feasible to keep the entire DDT in RAM, and you'd be
looking at a low-latency L2ARC device instead (e.g., an SSD).
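
To put numbers on that rule of thumb, here is a rough back-of-the-envelope sketch in Python. It assumes about 320 bytes per DDT entry, which is simply the figure implied by 5 GB per 1 TB at 64 KB blocks, not something I measured. It also assumes the worst case where every block is unique, as seems to be the case with bacula volumes. The function names are made up for illustration.

```python
# Rough DDT sizing sketch. The ~320 bytes per entry is an assumption
# implied by the "5 GB per 1 TB at 64 KB blocks" rule of thumb.
DDT_ENTRY_BYTES = 320          # assumed in-core size of one DDT entry
ARC_METADATA_FRACTION = 0.25   # metadata share of the ARC, as described above

KIB, GIB, TIB = 2**10, 2**30, 2**40


def ddt_bytes(data_bytes, avg_block_bytes=64 * KIB):
    """Estimate DDT size: one entry per block (worst case, no duplicates)."""
    blocks = data_bytes // avg_block_bytes
    return blocks * DDT_ENTRY_BYTES


def arc_bytes_needed(data_bytes, avg_block_bytes=64 * KIB):
    """ARC size at which the whole DDT fits inside the metadata share."""
    return ddt_bytes(data_bytes, avg_block_bytes) / ARC_METADATA_FRACTION


for tib in (1, 8):
    ddt = ddt_bytes(tib * TIB) / GIB
    arc = arc_bytes_needed(tib * TIB) / GIB
    print(f"{tib} TiB of data: ~{ddt:.0f} GiB DDT, ~{arc:.0f} GiB ARC")
```

Under those assumptions, my ~8 TB fileset would want on the order of 40 GB of DDT, and far more ARC than that if the 25% metadata limit applies, so it is no surprise that a 16 GB machine fell over.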
> On the flip side, compression seems to be a very big win. I'm seeing ratios
> from 1.7 to 2.5x savings and the CPU usage is claimed to be relatively cheap.
That's what I am seeing, too. On the fileset I tried to dedup, I'm currently
seeing a compressratio of 1.51x, which I'm happy with for that data. Enabling
ZFS compression appears to have negligible overhead, so turning it on has been
a big win for me.
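
For anyone wondering what those ratios mean in disk terms, here is a small Python sketch. It assumes compressratio is the uncompressed size divided by the compressed size; the helper name is made up for illustration.

```python
def space_saved(ratio):
    """Fraction of the uncompressed size saved at a given compressratio."""
    return 1 - 1 / ratio


# Ratios mentioned in this thread
for ratio in (1.51, 1.7, 2.5):
    print(f"{ratio:.2f}x -> {space_saved(ratio):.0%} saved")
```

So 1.51x works out to roughly a third less disk used, and 2.5x to 60% less.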
Cheers,
Paul.
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users