Subject: Re: [Networker] ZFS deduplication.
From: David Magda <dmagda AT EE.RYERSON DOT CA>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 18 Mar 2010 18:14:17 -0400
On Mar 18, 2010, at 17:33, Yaron Zabary wrote:

> . Make ZFS aware of how NetWorker (or NetBackup) works, so that it can look for duplicate data (not just duplicate blocks).
>
> . Make NetWorker write the save sets in such a way that files always start at the beginning of a block.

> I don't see why EMC would want to solve this problem, since it would provide a low-cost alternative to its Data Domain and VTL product lines.

The OP may want to look at ZFS' "recordsize" property:

        Specifies a suggested block size for files in the file system. This property is designed solely for use with database workloads that access files in fixed-size records. ZFS automatically tunes block sizes according to internal algorithms optimized for typical access patterns.

        For databases that create very large files but access them in small random chunks, these algorithms may be suboptimal. Specifying a recordsize greater than or equal to the record size of the database can result in significant performance gains. Use of this property for general purpose file systems is strongly discouraged, and may adversely affect performance.

        http://docs.sun.com/app/docs/doc/819-2240/zfs-1m
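
For example, to pin the record size (a minimal sketch, assuming a pool named "tank" with a dataset backing the AFTD; substitute your own names):

        # check the current value (128K is the default)
        zfs get recordsize tank/networker

        # force full 128 KB records; this only affects files
        # written after the change
        zfs set recordsize=128K tank/networker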

The way ZFS works is that it tries to guess what the application is doing and determine the optimal block size to write at (and to do things like calculate parity for RAID-Z). So if you have multiple writers, one file may be given a block size of (say) 2 KB while another gets 64 KB, etc.
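
If you're curious what block size ZFS actually picked for a given file, zdb can show you (a sketch; the object number is the file's inode number from ls -i, and "tank/networker" and the file name are again made up):

        # find the file's object (inode) number
        ls -i /tank/networker/some-saveset

        # dump the dnode, which includes the data block size
        zdb -ddddd tank/networker <object-number>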

When there's 128 KB worth of data to write to disk, ZFS creates a "transaction group" (txg) and writes all 128 KB at once sequentially. Any data that doesn't fit into that txg will be put into another txg and will go out in the next 128 KB write. (There's also a timer that fires every 5-30s to make sure things don't just sit around.)
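
If I remember correctly that timer is tunable. On recent OpenSolaris builds something like this in /etc/system should shorten it, though the tunable's name has changed between builds, so treat this as a sketch:

        * sync a txg out at least every 5 seconds
        set zfs:zfs_txg_timeout = 5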

If there's 1 MB of writes to do, it gets broken up into 128 KB chunks and written out (each chunk getting its own txg number). So if the AFTD has a blocking factor, you can just set it to something big (and perhaps a multiple of 128 KB) and ZFS will try to stream things as they come in.
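
Once things line up, you can check whether dedup is actually getting any traction (a sketch, assuming a build recent enough to have dedup at all, which IIRC landed in snv_128, and the same made-up "tank" names):

        # per-dataset switch
        zfs set dedup=on tank/networker

        # pool-wide ratio; 1.00x means nothing is deduping
        zpool get dedupratio tank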

I think there's also a distinction between the data being coalesced on the NW client and sent to the media server, and the media server then organizing the data to be sent to the back-end storage media. Basically there are two places where you have to worry about the data being "aligned" properly.

> I also don't see Sun fixing this problem.

At least file a bug or RFE if you have a support contract. Larry Ellison wants to get into the systems business, and dedupe targets could sell a decent number of storage appliances. It's in their interest to fix it. :)

Another forum to look at may be the ZFS-discuss list:

        http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
        http://opensolaris.org/jive/forum.jspa?forumID=80

ZFS also offers compression, which may be an alternative (though perhaps more CPU-intensive).
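
Something like (same caveat about the made-up dataset name; lzjb is cheap, gzip compresses better but eats more CPU):

        # lzjb is the fast default; gzip-1 through gzip-9
        # trade CPU time for a better ratio
        zfs set compression=lzjb tank/networker

        # see what you're actually getting
        zfs get compressratio tank/networker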

