Re: [ADSM-L] Data Deduplication
>Is TSM planning on adding data deduplication similar to avamar?
As mentioned by Richard, the closest thing TSM has to this now is
subfile backup. It is related to de-duplication: once TSM has a
backup of a given file, it backs up only the changed bytes of that file.
This is also referred to as delta incrementals.
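To illustrate the general idea only (this is not how TSM implements
subfile backup internally), here is a minimal Python sketch of a
fixed-block delta. It hashes each block of the base version, keeps only
the blocks whose hash changed, and rebuilds the new version from the
base plus those blocks. BLOCK_SIZE and the function names are
hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny for demonstration; real tools use far larger blocks


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def delta_blocks(base: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Return {block_index: block_bytes} for blocks of `new` that differ from `base`."""
    base_digests = block_digests(base, block_size)
    changed = {}
    for idx, digest in enumerate(block_digests(new, block_size)):
        if idx >= len(base_digests) or base_digests[idx] != digest:
            start = idx * block_size
            changed[idx] = new[start:start + block_size]
    return changed


def apply_delta(base: bytes, delta: dict, new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new version from the base plus the changed blocks."""
    n_blocks = (new_length + block_size - 1) // block_size
    blocks = []
    for idx in range(n_blocks):
        if idx in delta:
            blocks.append(delta[idx])      # changed block comes from the delta
        else:
            start = idx * block_size       # unchanged block comes from the base
            blocks.append(base[start:start + block_size])
    return b"".join(blocks)[:new_length]
```

For base b"AAAABBBBCCCCDDDD" and new b"AAAABXBBCCCCDDDD", only block 1
(b"BXBB") is sent. Note the weakness of fixed blocks: inserting a single
byte near the start shifts every later block, so the whole tail looks
changed. That is one reason de-dupe products generally use variable-size,
content-defined chunks instead.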
True de-duplication takes this much farther, because it recognizes a
file or email that's duplicated on two or three different systems, such
as an attachment/email that's sent to users on several different
Exchange servers. The "compression" ratios it can achieve are therefore
much higher than those of delta incrementals.
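A toy sketch of the cross-system case, assuming one content-addressed
store that every server backs up to: each chunk is keyed by its SHA-256
hash and stored once, so the same attachment arriving from three servers
costs one copy. ChunkStore and the 8-byte chunk size are hypothetical;
real products use much larger, usually variable-size chunks and far more
metadata.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by SHA-256."""

    def __init__(self, chunk_size: int = 8):  # hypothetical chunk size
        self.chunk_size = chunk_size
        self.chunks = {}          # digest -> chunk bytes
        self.logical_bytes = 0    # bytes the clients asked us to back up

    def backup(self, data: bytes) -> list:
        """Store data; return the 'recipe' of chunk digests needed to restore it."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # a duplicate chunk is not stored again
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def restore(self, recipe: list) -> bytes:
        return b"".join(self.chunks[d] for d in recipe)

    @property
    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def ratio(self) -> float:
        return self.logical_bytes / self.physical_bytes if self.physical_bytes else 0.0
```

Backing up the same 64-byte attachment (bytes(range(64))) from three
servers gives 192 logical bytes against 64 physical bytes, a 3:1 ratio,
and each server can still restore its own copy from its recipe.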
>I understand how TSM does not duplicate data now but minor edits in
>or simple file name changes would result in additional copies of the
>entire file using TSM today.
Instead of switching from TSM to something like Avamar (EMC) or Puredisk
(Symantec), a TSM user can benefit from de-dupe today by using a
de-duplication backup target, such as a de-dupe VTL or NAS device. Just
make sure you realize that you won't get the same de-dupe ratios as
non-TSM users. (TSM customers who switch to a de-dupe target are seeing
approximately 10:1 de-dupe ratios, whereas non-TSM customers are seeing
20:1.)
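A rough, hypothetical model of why repeated full backups push the ratio
up: a filesystem of fs_blocks blocks, changed_per_day of which get new
contents each day, backed up to a de-dupe target that stores each unique
(block, version) once. It deliberately ignores the other sources of
duplication discussed below (file versions, database backups, copies in
several places), so it shows the direction of the effect, not the 10:1
and 20:1 figures quoted here. dedupe_counts and its parameters are made
up for illustration.

```python
def dedupe_counts(days, fs_blocks, changed_per_day, full_interval=None):
    """Return (logical_blocks_sent, unique_blocks_stored) for a hypothetical schedule.

    full_interval: take a full backup every N days (1 = daily fulls);
    None means incremental-only after the initial full.
    """
    versions = [0] * fs_blocks                    # current version of each block
    stored = {(i, 0) for i in range(fs_blocks)}   # day 0: initial full backup
    sent = fs_blocks
    for day in range(1, days + 1):
        # Rotate through the filesystem so a different range of blocks changes each day.
        changed = [(day * changed_per_day + k) % fs_blocks for k in range(changed_per_day)]
        for i in changed:
            versions[i] += 1
        is_full = full_interval is not None and day % full_interval == 0
        to_send = range(fs_blocks) if is_full else changed
        for i in to_send:
            stored.add((i, versions[i]))          # the target keeps each unique block once
            sent += 1
    return sent, len(stored)


for label, interval in [("daily fulls", 1), ("weekly fulls", 7), ("incremental only", None)]:
    sent, unique = dedupe_counts(30, 1000, 20, interval)
    print(f"{label}: {sent} sent, {unique} stored, ratio {sent / unique:.1f}:1")
```

With 30 days, 1000 blocks and 20 changed blocks per day, daily fulls
send 31000 blocks for 1600 unique ones (about 19:1), weekly fulls 5520
for 1600 (about 3.5:1), and incremental-only 1600 for 1600 (1:1). The
unique data is the same in every case; only the amount re-sent differs.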
Most TSM users don't do repeated full backups of their filesystems, and
in non-TSM environments a lot of the duplicated data comes from those
repeated full backups. But TSM users still have duplicated data:
multiple versions of the same file and database backups. You already
mentioned edited versions of the same file. It is also common for a
file to be present in multiple places.
In addition, TSM users do perform periodic full backups of their
>We recently had a pitch from EMC on avamar. I can think of some reasons
>to pass on it (Having two separate backup/restore solutions is a big
>one, cost etc) but some persuasive arguments were made supporting their
If you like the idea of using de-dupe to back up your remote offices
(which is what Avamar and Puredisk are designed for), but want to stay
with TSM, de-dupe targets can help here too. Buy a small de-dupe target
to place at your remote site, perform TSM backups to it, then replicate
only the new/unique blocks to a central location as your offsite
mechanism.
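A minimal sketch of that replication step, assuming both the remote and
central targets index chunks by SHA-256: the remote site sends only the
chunks the central site doesn't already hold. The function names and
chunk size are hypothetical, and a real replication would also have to
ship the metadata needed to reassemble files, which this omits.

```python
import hashlib


def chunk(data: bytes, size: int = 8) -> dict:
    """Split data into fixed-size chunks keyed by SHA-256 digest (duplicates collapse)."""
    return {
        hashlib.sha256(data[i:i + size]).hexdigest(): data[i:i + size]
        for i in range(0, len(data), size)
    }


def replicate(remote: dict, central: dict) -> int:
    """Copy to `central` only chunks it lacks; return the bytes sent over the WAN."""
    missing = remote.keys() - central.keys()
    for digest in missing:
        central[digest] = remote[digest]
    return sum(len(remote[d]) for d in missing)
```

If the central site already holds bytes(range(64)) (say, from another
office), replicating a remote backup of those 64 bytes plus 16 new ones
sends only the 16 new bytes; running it again sends nothing.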
>If TSM is going to be adding similar functionality soon it may
>be another reason to focus on other efforts.
Writing a de-dupe backup product isn't easy. EMC bought Avamar and
Symantec bought Data Center Technologies to get their respective
products. I don't know of any other de-dupe companies for IBM to
acquire, so they'll have to write their own. That may take them a bit