Subject: [ADSM-L] FW: [ADSM-L] Data Deduplication
From: Kelly Lipp <lipp AT STORSERVER DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 29 Aug 2007 16:45:30 -0600
 
Wanda,

Thanks for your cogent analysis.  Always appreciated.

We're trying to decide if we need to offer a Data Domain sort of thing
to our customers.  In the very specific case you describe, perhaps.

I am 100% with you on the "why replicate backup when you can more
easily replicate the data?" point.  We're offering Compellent as our
active data repository, and they have a very nice replication bit that
is very bandwidth-friendly.  I think money is better spent there than
on replicating backup data.  But try convincing a customer that's had
the Kool-Aid that they don't want de-duplication!

Your comment about management classes is right on!  If you limit the
number of versions of a db backup that you keep to something reasonable,
say seven, then with a 1TB database (which is big!) you have 7TB of
duplicate data worst case.  Let's see: that breaks down to about 7 LTO4
tapes, or 10 750GB SATA drives.  That's 7 x $100 = $700 for tape, plus
slots of course, so let's say $2000.  For disk, depending on your
vendor, that could cost between $3K and $8K (and if you're paying more
than that for SATA drives you perhaps ought to seek counseling!).  So
how much would you be willing to spend to reduce this cost?  No more
than $8K.  Does a DD cost less than that?  I'm not thinking so.  And
unless my math is way off, you can make a reasonable argument against
de-dup even for more db data!
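
For anyone who wants to plug in their own numbers, here is that
back-of-the-envelope math as a small Python sketch.  Prices and
capacities are the rough figures above, not quotes; call an LTO4
roughly 1TB per cartridge once compression is in play.

```python
db_size_tb = 1.0        # one full database backup
versions_kept = 7       # reasonable retention: seven versions

duplicate_tb = db_size_tb * versions_kept    # 7 TB worst case

tapes = round(duplicate_tb / 1.0)            # ~7 LTO4 cartridges
tape_cost = tapes * 100 + 1300               # $100/tape plus rough slot cost

drives = -(-duplicate_tb // 0.75)            # 750 GB SATA, ceiling division
disk_cost_low, disk_cost_high = 3000, 8000   # vendor-dependent range

print(f"{duplicate_tb:.0f} TB of worst-case duplicate data")
print(f"tape: {tapes} x LTO4, ~${tape_cost} with slots")
print(f"disk: {drives:.0f} x 750 GB SATA, ~${disk_cost_low}-${disk_cost_high}")
```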

It's all about mind share, isn't it?  Today, de-duplication is hot...



Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
lipp AT storserver DOT com

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Wanda Prather
Sent: Wednesday, August 29, 2007 3:12 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Data Deduplication

Kelly,

I have more than 1 customer considering a de-dup VTL product.

It's true that for regular file systems, TSM doesn't redump unchanged
files, so people aren't getting AS LARGE a reduction in data stored (of
that type) as would a user of an old-style full dump - incremental -
incremental - full dump product.

OTOH, even with TSM, your DB dumps (Exchange, SQL, most Oracle
implementations) are still for the most part full dumps, followed by
incrementals, then full dumps.  The larger the database, in most cases,
the less the contents change.  And you can't use subfile backup on
anything larger than 2 GB.
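
To put rough numbers on that contrast, here is an illustrative sketch;
the 2% daily change rate and the weekly-full schedule are assumptions
made up for the example, not measurements:

```python
# Monthly backup volume for 1 TB of data under (a) TSM progressive
# incremental vs (b) a weekly-full-plus-daily-incremental DB agent.
data_tb, days, daily_change = 1.0, 30, 0.02

# (a) Filesystem under TSM: only changed files go, every day.
tsm_tb = days * data_tb * daily_change

# (b) DB agent: a full dump weekly, incrementals the other days.
fulls = days // 7 + 1                                    # ~5 fulls a month
db_tb = fulls * data_tb + (days - fulls) * data_tb * daily_change

print(f"TSM progressive incremental: ~{tsm_tb:.1f} TB/month")
print(f"weekly fulls + incrementals: ~{db_tb:.1f} TB/month, mostly duplicate")
```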

I have several customers that have a relatively small number of clients
(say 50 or less), but the bulk of their daily backup data is 1 or 2 very
large databases.  And the bulk of the CONTENTS of those databases
doesn't change all that much.  Send that DB full dump to a de-dup VTL
that can identify duplicate "blobs" (I'm using that as a generic term
because I don't mean "block" in the sense of a disk block or sector and
different vendors can identify larger or smaller duplicate blobs), and
you get a very large impact that TSM can't provide.  The only thing that
gets stored each day is the delta bits.  Even if it's an
Exchange/SQL/Oracle full-dump day, the amount of new data to be stored
may be 10% or less of what it used to be.
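
No vendor publishes its exact algorithm, but the "blob" idea can be
sketched in a few lines of Python: fingerprint each chunk and store a
chunk only the first time it is seen.  Concept sketch only; real
products use variable-size, content-defined chunking and disk-backed
fingerprint indexes rather than fixed 4K pieces and a dict.

```python
import hashlib

def dedup_store(stream: bytes, store: dict, chunk: int = 4096) -> int:
    """Add a backup stream to a chunk store; return how many bytes were new."""
    new = 0
    for i in range(0, len(stream), chunk):
        piece = stream[i:i + chunk]
        fp = hashlib.sha1(piece).hexdigest()   # the chunk's fingerprint
        if fp not in store:
            store[fp] = piece                  # first time seen: keep it
            new += len(piece)
    return new                                 # duplicates cost nothing

def page(n: int) -> bytes:
    return n.to_bytes(4, "big") * 1024         # one distinct 4 KB "page"

store = {}
full_1 = b"".join(page(i) for i in range(100))           # day-1 full dump
full_2 = (b"".join(page(i) for i in range(90)) +         # 90 pages unchanged
          b"".join(page(i + 1000) for i in range(10)))   # 10 pages changed

print(dedup_store(full_1, store))   # 409600: everything is new on day 1
print(dedup_store(full_2, store))   # 40960: only the 10% delta is stored
```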

And I have more than 1 customer looking at a de-dup VTL as a way to make
managing their own DR sites practical, because those VTL's can replicate
to EACH OTHER across the WAN.  The huge cost in transmitting your data
to a DR site is the cost of the pipe.  If, however, you can get the
amount of data per day down to 10% of what it used to be by having the
VTL compress and dedup, and you have another corporate location where
you can put the other VTL, it starts looking close to cost-effective in
$$ terms.  In fact, IBM recovery services is offering Data Domain
equipment on the floor in at least 1 of their recovery sites for that
purpose.  (The customer installs a DD box on their site, leases the DD
box in the IBM DR site, replicates between.)
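
The pipe math is easy to check.  In this sketch the 500GB nightly
stream and the 45 Mbit/s (T3-class) link are assumed numbers for
illustration; only the ~10% figure comes from the discussion above.

```python
def hours(gb: float, mbps: float) -> float:
    return gb * 8 * 1000 / (mbps * 3600)   # GB -> megabits -> hours

nightly_gb, link_mbps, dedup_ratio = 500, 45, 0.10

print(f"raw stream:   {hours(nightly_gb, link_mbps):.1f} h/night")
print(f"after de-dup: {hours(nightly_gb * dedup_ratio, link_mbps):.1f} h/night")
```

That prints roughly 24.7 vs 2.5 hours: the raw stream doesn't fit in a
nightly window, the de-duped one easily does.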

(Insert disclaimer here:  I'm not necessarily a fan of replicating
backup data, because the problem my customers always have is doing the
DB recovery. I think the first choice should be replicating the real DB
using something like MIMIX, so that it's always ready to go on the
recovery end.
I merely report the bit about replicating backup data because I have
customers considering it.)

Regarding the lost sales opportunities, I think you gotta go back and
consider the features that TSM has that other people don't, dedup or
not.  There was a discussion on the list last month about comparing TSM
to Legato & others, and there was remarkably little emphasis on
management classes and the ability of TSM to treat different data
differently according to business needs.  I still haven't seen any
other product that has what TSM provides.  (Here not afraid to expose
MY ignorance - would like to know if there is anything else out there -)

Wanda

> I'd like to steer this around a bit.  Our sales folks are saying they 
> are losing TSM opportunities to de-dup vendors.  What specific 
> business problem are customers trying to solve with de-dup?
>
> I'm thinking the following:
>
> 1. Reduce the amount of disk/tape required to store backups.
> Especially important for an all-disk backup solution.
> 2. Reduce backup times (for source de-dup I would think.  No benefit 
> in target de-dup for this).
> 3. Replication of backup data across a wide area network.  Obviously 
> if you have less stored you have less to replicate.
>
> Others?  Relative importance of these?
>
> Does TSM in and of itself provide similar benefits in its natural
> state?
> From this discussion adding de-dup at the backend does not necessarily
> provide much, though it does for the other traditional backup products.
> Since we don't dup, we don't need to de-dup.
>
> Help me get it, because aside from the typical "I gotta have it
> because the trade rags tell me I gotta have it", I don't get it!
>
> Thanks, (Once again not afraid to expose my vast pool of ignorance...)
>
>
> Kelly J. Lipp
> VP Manufacturing & CTO
> STORServer, Inc.
> 485-B Elkton Drive
> Colorado Springs, CO 80907
> 719-266-8777
> lipp AT storserver DOT com
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf 
> Of Curtis Preston
> Sent: Wednesday, August 29, 2007 1:08 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] Data Deduplication
>
>>As de-dup, from what I have read, compares across all files on a 
>>"system" (server, disk storage or whatever), it seems to me that this 
>>will be an enormous resource hog
>
> Exactly.  To make sure everyone understands, the "system" is the
> intelligent disk target, not a host you're backing up.  A de-dupe
> IDT/VTL is able to de-dupe anything against anything else that's been
> sent to it.  This can include, for example, a file in a filesystem and
> the same file inside an Exchange Sent Items folder.
>
>>The de-dup technology only compares / looks at the files within its
>>specific repository.  Example: We have 8 ProtecTIER nodes in one data
>>center, which equates to 8 Virtual Tape Libraries and 8 repositories.
> The
>
> There are VTL/IDT vendors that offer a multi-head approach to
> de-duplication.  As you need more throughput, you buy more heads, and
> all heads are part of one large appliance that uses a single global
> de-dupe database.  That way you don't have to worry about which
> backups go to which heads.  Diligent's VTL Open is a multi-headed VTL,
> but ProtecTIER is not -- yet.  I would ask them their plans for that.
>
> While this feature is not required for many shops, I think it's a
> very important feature for large shops.
>
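
To illustrate the single-global-index point in the quoted exchange
above: a concept sketch only, implying nothing about any vendor's
actual design.  The same chunk is stored once no matter which head, or
which source, sent it.

```python
import hashlib

class GlobalDedupTarget:
    """Multi-head target sharing ONE fingerprint index, as opposed to
    one index per head/repository."""

    def __init__(self):
        self.index = {}                     # fingerprint -> chunk, global

    def ingest(self, head: str, chunk: bytes) -> bool:
        """Return True if the chunk had to be stored, False if de-duped."""
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in self.index:
            return False                    # already held, whichever head
        self.index[fp] = chunk
        return True

target = GlobalDedupTarget()
same_file = b"quarterly-report.doc contents" * 100

# Identical bytes arrive via different heads from different sources
# (a filesystem backup and an Exchange dump, say):
print(target.ingest("head-1", same_file))   # True  -> stored once
print(target.ingest("head-7", same_file))   # False -> de-duped globally
# With 8 separate repositories, the second copy would be stored again.
```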
