Re: [ADSM-L] Data Deduplication

2007-08-26 19:07:49
Subject: Re: [ADSM-L] Data Deduplication
From: Curtis Preston <cpreston AT GLASSHOUSE DOT COM>
Date: Sun, 26 Aug 2007 19:05:50 -0400
>Is TSM planning on adding data deduplication similar to avamar? 

As mentioned by Richard, the closest thing TSM has to this now is
subfile backup.  It is related to de-duplication: once TSM has a
backup of a given file, it backs up only the changed bytes of that file.
This is also referred to as delta incrementals.

True de-duplication takes this much further, as it would recognize a
file or email that's duplicated on two or three different systems, such
as an attachment/email that's sent to users on several different
Exchange servers.  The "compression" ratios it can achieve are therefore
much higher than those of delta incrementals.
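
To make the cross-system part concrete, here is a rough Python sketch of
the idea.  It is only an illustration, not how Avamar, PureDisk, or TSM
actually work: the DedupeStore class, the host names, and the path are
all made up, and it de-dupes whole files by SHA-256 hash where real
products work on smaller chunks.

```python
import hashlib


class DedupeStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}    # SHA-256 hex digest -> stored bytes
        self.catalog = {}  # (host, path) -> digest of that file's content

    def backup(self, host, path, data):
        """Record a file; return True only if its content was new."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.catalog[(host, path)] = digest
        return is_new

    def restore(self, host, path):
        """Look the file up in the catalog and return its content."""
        return self.blobs[self.catalog[(host, path)]]

    def stored_bytes(self):
        """Bytes actually held, after de-duplication."""
        return sum(len(blob) for blob in self.blobs.values())


# Hypothetical example: the same attachment lands on three Exchange servers.
store = DedupeStore()
attachment = b"quarterly report" * 1000
for host in ("exch1", "exch2", "exch3"):
    store.backup(host, "/mail/report.doc", attachment)
```

Three catalog entries end up pointing at one stored copy.  A delta
incremental cannot do this, because it only compares a file against its
own earlier versions on the same client.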

>I understand how TSM does not duplicate data now but minor edits in
>or simple file name changes would result in additional copies of the
>entire file using TSM today.

Instead of switching from TSM to something like Avamar (EMC) or Puredisk
(Symantec), a TSM user can benefit from de-dupe today by using a
de-duplication backup target, such as a de-dupe VTL or NAS device.  Just
make sure you realize that you won't get the same de-dupe ratios as
non-TSM users.  (TSM customers who switch to a de-dupe target are seeing
approximately 10:1 de-dupe ratios, where non-TSM customers are seeing
20:1.)
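
For a sense of what those ratios mean on disk, here is a quick
back-of-the-envelope calculation in Python.  The 100 TB figure is purely
an assumed example, not a number from any customer, and the function
names are mine.

```python
def stored_tb(logical_tb, ratio):
    """Physical capacity needed for logical_tb of backups at ratio:1."""
    return logical_tb / ratio


def reduction_pct(ratio):
    """Percentage of the logical data that never has to be stored."""
    return 100 * (ratio - 1) / ratio


logical_tb = 100  # assumed amount of backup data sent to the target
for ratio in (10, 20):
    print(f"{ratio}:1 -> {stored_tb(logical_tb, ratio):.0f} TB stored, "
          f"{reduction_pct(ratio):.0f}% reduction")
```

Going from 10:1 to 20:1 halves the disk you need, but in percentage
terms it only moves the reduction from 90% to 95%.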

Most TSM users don't do repeated full backups of their filesystems, and
a lot of the duplicated data comes from those full backups.  But TSM
users still have duplicated data: multiple versions of the same file and
database backups.   You already mentioned edited versions of the same
file.  It is also common that a file will be present in multiple places.
In addition, TSM users do perform periodic full backups of their
database data.

>We recently had a pitch from EMC on avamar. I can think of some reasons
>to pass on it (Having two separate backup/restore solutions is a big
>one, cost etc) but some persuasive arguments were made supporting their

If you like the idea of using de-dupe to back up your remote offices
(which is what Avamar and Puredisk are designed for), but want to stay
with TSM, again de-dupe targets can help.  Buy a small de-dupe target to
place at your remote site, perform TSM backups to it, then replicate the
new/unique blocks to a central location as your offsite mechanism.
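
Replicating only the new/unique blocks can be sketched roughly like this
in Python.  This is a simplified illustration, not any vendor's
replication protocol: the fixed 4 KB block size, the dict standing in
for the central site, and the function names are all assumptions made
for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for this sketch


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a backup stream into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def replicate(data, central):
    """Send only the blocks the central site lacks; return bytes sent."""
    sent = 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).digest()
        if digest not in central:
            central[digest] = block  # this block crosses the WAN
            sent += len(block)
    return sent


central = {}  # stands in for the central site's block store

# Two nightly backups of the same data; only the third block changed.
monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096 + b"D" * 4096
tuesday = b"A" * 4096 + b"B" * 4096 + b"X" * 4096 + b"D" * 4096

sent_monday = replicate(monday, central)
sent_tuesday = replicate(tuesday, central)
```

The first night sends everything (16 KB here); the second night sends
only the one changed 4 KB block, which is what makes replication over a
remote-office link practical.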

>If TSM is going to be adding similar functionality soon it may
>be another reason to focus on other efforts.

Writing a de-dupe backup product isn't easy.  EMC bought Avamar and
Symantec bought Data Center Technologies to get their respective
products. I don't know of any other de-dupe companies for IBM to
acquire, so they'll have to write their own.  That may take them a bit

