Subject: Re: [Networker] ZFS deduplication.
From: David Magda <dmagda AT EE.RYERSON DOT CA>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 18 Mar 2010 18:14:17 -0400
On Mar 18, 2010, at 17:33, Yaron Zabary wrote:

> . Make ZFS aware of how NetWorker (or NetBackup) works, so that it can look for duplicate data (not just duplicate blocks).
>
> . Make NetWorker write the save sets in such a way that files always start at the beginning of a block.

> I don't see why EMC would want to solve this problem, since it would provide a low-cost alternative to its Data Domain and VTL product lines.

The OP may want to look at ZFS' "recordsize" property:

        Specifies a suggested block size for files in the file system. This property is designed solely for use with database workloads that access files in fixed-size records. ZFS automatically tunes block sizes according to internal algorithms optimized for typical access patterns.

        For databases that create very large files but access them in small random chunks, these algorithms may be suboptimal. Specifying a recordsize greater than or equal to the record size of the database can result in significant performance gains. Use of this property for general purpose file systems is strongly discouraged, and may adversely affect performance.

        http://docs.sun.com/app/docs/doc/819-2240/zfs-1m
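
For example, to pin the record size (a minimal sketch, assuming a pool named "tank" with a dataset backing the AFTD; substitute your own names):

        # check the current value (128K is the default)
        zfs get recordsize tank/networker

        # force full 128 KB records; this only affects files
        # written after the change
        zfs set recordsize=128K tank/networker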

The way ZFS works is that it tries to guess what the application is doing and determine the optimal block size to write at (and to do things like calculate parity for RAID-Z). So if you have multiple writers, one file may be given a block size of (say) 2 KB while another gets 64 KB, etc.
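
If you're curious what block size ZFS actually picked for a given file, zdb can show you (a sketch; the object number is the file's inode number from ls -i, and "tank/networker" and the file name are again made up):

        # find the file's object (inode) number
        ls -i /tank/networker/some-saveset

        # dump the dnode, which includes the data block size
        zdb -ddddd tank/networker <object-number>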

When there's 128 KB worth of data to write to disk, ZFS creates a "transaction group" (txg) and writes all 128 KB at once sequentially. Any data that doesn't fit into that txg will be put into another txg and will go out in the next 128 KB write. (There's also a timer that fires every 5-30s to make sure things don't just sit around.)
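
If I remember correctly that timer is tunable. On recent OpenSolaris builds something like this in /etc/system should shorten it, though the tunable's name has changed between builds, so treat this as a sketch:

        * sync a txg out at least every 5 seconds
        set zfs:zfs_txg_timeout = 5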

If there's 1 MB of writes to do, it gets broken up into 128 KB chunks and written out (each chunk getting its own txg number). So if the AFTD has a blocking factor, you can just set it to something big (and perhaps a multiple of 128 KB) and ZFS will try to stream things as they come in.
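
Once things line up, you can check whether dedup is actually getting any traction (a sketch, assuming a build recent enough to have dedup at all, which IIRC landed in snv_128, and the same made-up "tank" names):

        # per-dataset switch
        zfs set dedup=on tank/networker

        # pool-wide ratio; 1.00x means nothing is deduping
        zpool get dedupratio tank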

I think there's also a distinction between the data being coalesced on the NW client and sent to the media server, and the media server then organizing the data to be sent to the back-end storage media. Basically there are two places where you have to worry about the data being "aligned" properly.

> I also don't see Sun fixing this problem.

At least file a bug or RFE if you have a support contract. Larry Ellison wants to get into the systems business, and dedupe targets could sell a decent number of storage appliances. It's in their interest to fix it. :)

Another forum to look at may be the ZFS-discuss list:

        http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
        http://opensolaris.org/jive/forum.jspa?forumID=80

ZFS also offers compression, which may be an alternative (though perhaps more CPU-intensive).
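
Something like (same caveat about the made-up dataset name; lzjb is cheap, gzip compresses better but eats more CPU):

        # lzjb is the fast default; gzip-1 through gzip-9
        # trade CPU time for a better ratio
        zfs set compression=lzjb tank/networker

        # see what you're actually getting
        zfs get compressratio tank/networker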

