2007-08-27 23:39:55
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
Date: Mon, 27 Aug 2007 23:37:51 -0400
I use "subfile" to differentiate from file-level de-dupe, which is
really only CAS.  (A subfile de-dupe product will, of course, notice two
files that are exactly the same as well -- just like a file-level CAS
product will.)

Subfile to me means that it looks inside the file, and looks for
duplicated information inside that file.  Consider two versions of a
file stored inside TSM, for example.  A subfile de-dupe product would
notice that most of the information between those two files is the same
and store that info once.  Then it would also store any info that is
unique to each file.  
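
To make the two-versions example concrete, here is a minimal sketch in Python. It is not how TSM or any particular de-dupe product works. The fixed piece size, the use of SHA-256, and the names PIECE_SIZE, store_file, and restore_file are all assumptions made for illustration.

```python
import hashlib

PIECE_SIZE = 8  # hypothetical piece size, kept tiny so the example is readable


def store_file(data: bytes, store: dict) -> list:
    """Split data into pieces, keep each unique piece once, return the recipe."""
    recipe = []
    for i in range(0, len(data), PIECE_SIZE):
        piece = data[i:i + PIECE_SIZE]
        digest = hashlib.sha256(piece).hexdigest()
        store.setdefault(digest, piece)  # only stored the first time it is seen
        recipe.append(digest)
    return recipe


def restore_file(recipe: list, store: dict) -> bytes:
    """Rebuild a file from its list of piece digests."""
    return b"".join(store[d] for d in recipe)


store = {}
v1 = b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD"  # first version of the file
v2 = b"AAAAAAAABBBBBBBBXXXXXXXXDDDDDDDD"  # second version, one piece changed

r1 = store_file(v1, store)
r2 = store_file(v2, store)

# Eight pieces were presented in total, but only five distinct ones are kept:
# three shared by both versions, plus one unique piece per version.
print(len(r1) + len(r2), "pieces presented,", len(store), "pieces stored")
```

Both versions can still be rebuilt in full from their recipes, which is the point: the shared information is kept once and only the unique information is kept per file.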

I stay away from terms like block, chunk, and fragment in this context
because they mean different things to different people, and mean other
things historically outside of de-dupe.

Curtis - I'm unclear on your terminology.  Are you equating "subfile"
to "block" level deduping?  To me, block level means block
boundaries, whereas subfile doesn't have the boundary
restriction.  Perhaps I interpret these words this way because of my
history.  To me, a block is a 4K chunk (or 1K or some fixed
amount).  But I suspect that this is not what you mean.
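
The boundary distinction drawn above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not any vendor's method: the "cut after a newline" rule stands in for whatever content-based rule a product might use, and the names fixed_pieces, content_pieces, and shared are hypothetical.

```python
def fixed_pieces(data: bytes, size: int) -> list:
    """Cut at fixed offsets, i.e. on block boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_pieces(data: bytes, marker: bytes = b"\n") -> list:
    """Cut wherever the content says so (here: after each marker byte)."""
    pieces, start = [], 0
    for i in range(len(data)):
        if data[i:i + 1] == marker:
            pieces.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        pieces.append(data[start:])  # trailing piece with no marker
    return pieces


def shared(a: list, b: list) -> int:
    """Count distinct pieces that appear in both lists."""
    return len(set(a) & set(b))


original = b"alpha\nbravo\ncharlie\ndelta\n"
shifted = b"X" + original  # same data with one byte inserted at the front

shared_fixed = shared(fixed_pieces(original, 4), fixed_pieces(shifted, 4))
shared_content = shared(content_pieces(original), content_pieces(shifted))

print("fixed boundaries:  ", shared_fixed, "pieces in common")
print("content boundaries:", shared_content, "pieces in common")
```

With fixed boundaries the one-byte insert shifts every later block, so nothing matches. With boundaries derived from the content, only the first piece differs and the rest are still recognized as duplicates.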

In fact, my impression was that some vendors deduped at a block level
(my definition) and others at a subfile level, which to me is probably
more valuable but also probably more performance-costly to implement.

I've read lots of articles about this and talked with many
vendors.  I'll take a look at your article.  Thanks.

