
Data Deduplication in your environment for over 6 months

Discussion in 'Tape / Media Library' started by Frunkster, Feb 1, 2008.

  1. Frunkster

    Frunkster ADSM.ORG Member

    Jul 28, 2007
    TSM Administrator
    Miami, FL
    I know the topic of data deduplication has been discussed on this forum before, but I could not find long-term results (at least 6 months after implementation) for the technology.

    At this time I am specifically looking at the Data Domain products. They provide what I am looking for: data deduplication, easy integration with TSM, off-site replication for DR and they are relatively affordable. However, I am not limiting my research to them. Experiences with other vendor solutions are welcome.

    The data I am looking to store on these devices is: Oracle and MS SQL Server databases, VMware vmdk files, Notes mail files and databases, MySQL, and possibly some DB2. All of it is structured data.

    The backup frequency for these applications is as follows: Oracle full backups once or twice a week (depending on size) with hourly Oracle archive logs, daily full MS SQL Server database backups with transaction logs every 15 minutes, weekly VMware vmdk images, daily full Notes backups, weekly full MySQL backups, and possibly weekly full DB2 backups.

    I am also planning to use the device for TSM DB backups, which is especially appealing given the "instant" replication capabilities of the Data Domain appliances.

    Current retention periods vary between 30 days and 1 year. I do not expect to hold that much data on the devices. I expect to migrate older data to tape after a period of 30, 60 or 90 days.

    I would prefer a NAS target mode of operation, but if the device works only as a VTL, I am OK with that.

    Does anyone have experience with these types of devices that they can or would like to share?

  3. carlisr

    carlisr ADSM.ORG Member

    Dec 16, 2004
    We've had Data Domain installed for almost a year now and we absolutely love it. We went from a VTL that was connected to a back-end tape library to an all-disk system with Data Domain. We were also able to take the VTL completely out of the picture and use NFS mounts from the TSM boxes directly to Data Domain. So we set up an NFS mount at the device class level, pointing right at Data Domain. That took away the hassle of managing logical volumes.

    We have 3 Data Domain boxes at our local data center and they replicate to 3 identical boxes at our off-site location. The entire setup was really simple and hasn't caused us any sort of issue. We get great compression and dedup from Data Domain as well. We hover around 10x compression from our TSM backups and 25x where we run Oracle backups directly to the Data Domain boxes.

    Just to give you an idea of our environment: we have roughly 150 clients and back up about 5 TB/night. We have a mix of SQL DBs, Exchange, R/3, Oracle, and Windows and Linux files. We keep a 30-day retention, and 2 of our boxes aren't even 40% full yet!
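
    For a rough sense of what those numbers imply, here is a back-of-the-envelope sketch in Python. It assumes the full 5 TB/night lands on the appliances, that every night is kept for the whole 30-day retention, and that the roughly 10x reduction applies uniformly. None of that is stated precisely above, and the function name is purely illustrative.

```python
def physical_footprint_tb(nightly_tb, retention_days, reduction_ratio):
    """Estimate post-reduction disk use, assuming a uniform reduction ratio.

    All three inputs are assumptions supplied by the caller; real ratios
    vary by data type and change over time.
    """
    logical_tb = nightly_tb * retention_days  # data as TSM sees it
    return logical_tb / reduction_ratio       # data as the appliance stores it


# Figures quoted in the post: ~5 TB/night, 30-day retention, ~10x reduction.
estimate_tb = physical_footprint_tb(5, 30, 10)
print(f"Estimated physical footprint: {estimate_tb:.1f} TB")
```

    Under those assumptions, 150 TB of logical backups would occupy about 15 TB of disk. Treat it as a sizing starting point only, not a prediction.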

    Here's a link that we used quite a bit while trying to make our decision: http://www.datadomain.com/pdf/GlassHouse-TSM-DataDomain-Whitepaper.pdf

    Let me know if you have any questions!
  4. kyahdhin

    kyahdhin ADSM.ORG Member

    Jun 23, 2005
    Have any of you gone through a DR test with deduplicated data, and have you suffered any delays when multiple clients are pulling and waiting on the same piece of deduped data? Also, has anyone experienced delays when backup stgpool processes are running against deduped data? I can't get any of the vendors to give me a straight answer, so for now dedupe is a no-go. The most important requirements for the systems I am architecting are operational recovery and disaster recovery. Any help is appreciated :) Thanks!
  5. Bartdo

    Bartdo ADSM.ORG Member

    Dec 9, 2002
    Deduplication & reclamation

    Thanks for the link to a useful best practices paper on integrating Data Domain
    in a TSM environment. It answers some questions about using a VTL with dedupe
    enabled. I found it particularly interesting that it seems to recommend the use
    of NFS mounts over the use of a VTL. (This would make a good topic to discuss:
    when does TSM not treat a FILE device class the same as an LTO device class?)
    I have 8 years of TSM experience at various sites. I have occasionally been
    confronted with pressure (stemming from marketing machines) to introduce a VTL
    in a TSM environment. I have always resisted this trend, as my customers have been
    satisfied with either:
    LAN-based backups to disk storage pools (needs a good Ethernet network)
    LAN-free backups to tape for high throughput requirements (needs a higher budget)
    In my last installation, I used a Quantum VTL to back up 600-1200MB of incremental
    nightly backups. We achieved a 17x dedupe ratio, and it worked without any issues
    (except that the manual is pretty slim on details). I am not dogmatic; I recognize
    that a VTL with dedupe can provide real benefits and can be cost effective.
    What I haven't seen is a discussion about DEDUPE and TSM TAPE RECLAMATION. In the
    installation with the Quantum VTL, reclamation was not an issue due to the low volumes.
    But I have other customers that are reclaiming data on tape 5-8 hours per day.
    Can we entertain replacing a tape library with 10 drives with a VTL doing dedupe?
    What type of dedupe (inline, postprocessing or intelligent postprocessing as provided
    in TSM 6.1) is more amenable to SPACE RECLAMATION?
    It seems to me that for the first two methods, tape reclamation is a process that
    requires the dedupe engine to un-dedupe the data it is reading, only to re-dedupe it
    when it writes. This could be a serious bottleneck, slowing down reclamation in
    the case of the inline method, or not leaving enough hours in the day for those
    using the post-processing method. In both cases, the results could be catastrophic.
    I assume that the intelligent postprocessing delivered with TSM 6.1 will be
    much more efficient, as it will read deduped data and write deduped data.

    When a real tape library runs out of throughput, the remedy is simple: add
    more tape drives. I fear that there is no such simple solution for VTLs. If you add a
    second VTL or more nodes, you run into the issue of global dedupe vs. local dedupe that
    Curtis Preston discussed in his Mr. Backup Blog: http://www.backupcentral.com/content/view/231/47/

    When a restore runs in parallel to a tape reclamation process in a physical tape
    library, there is basically no performance hit. When a VTL is introduced, we cannot
    ask the backup guys to stop the reclamation processing because a critical restore
    is launched.

    Can someone provide real-life information about reclamation on a Data Domain box?
    Rates achieved? Hours per day? What reclamation threshold do you use?
    Do you stop reclamation during a restore?

  6. vadim

    vadim ADSM.ORG Member

    Aug 13, 2007
    Sr Tech Support, Data Domain
    LTO vs FILE

    When TSM moves data off a FILE volume or expires it, the FILE volume is deleted. A VTL emulates physical media, and physical media cannot be deleted. :) The FILE device class helps drive utilization of the unit down. A VTL will keep the data on the "tape" until it is re-used (or deleted from the unit, outside TSM), while with a FILE device class expired volumes are removed instantly (for Data Domain Restorers this means the data becomes eligible for cleaning).

    I think you're correct there. When you do space reclamation on a de-duping unit, you cause a lot of CPU and disk activity, and in the end the benefit is LESS than with real tape. That is because of the nature of de-duping. The files that TSM thinks are expired can share data blocks with other files that are still active. In the worst case (with a great de-duping ratio), reclamation can free ZERO real space on the de-duping unit.
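
    To illustrate the shared-block point, here is a minimal toy model in Python of a reference-counted chunk store. It is only a sketch under stated assumptions: the `DedupStore` class, the one-byte "chunks", and the file names are all hypothetical, and it does not describe how Data Domain or any other product actually works internally.

```python
import hashlib
from collections import Counter


class DedupStore:
    """Toy chunk store: each unique chunk is kept once and reference-counted."""

    def __init__(self):
        self.refcount = Counter()  # chunk fingerprint -> number of references
        self.files = {}            # file name -> list of chunk fingerprints

    def write(self, name, chunks):
        fingerprints = [hashlib.sha256(c).hexdigest() for c in chunks]
        self.files[name] = fingerprints
        self.refcount.update(fingerprints)

    def delete(self, name):
        """Remove a file and return how many unique chunks were actually freed."""
        freed = 0
        for fp in self.files.pop(name):
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:
                del self.refcount[fp]
                freed += 1
        return freed

    def stored_chunks(self):
        return len(self.refcount)


store = DedupStore()
store.write("monday_full", [b"A", b"B", b"C"])
store.write("tuesday_full", [b"A", b"B", b"C"])      # identical to Monday
store.write("wednesday_full", [b"A", b"B", b"D"])    # one chunk changed

freed_monday = store.delete("monday_full")        # every chunk is still referenced
freed_wednesday = store.delete("wednesday_full")  # only chunk D was unique
print(freed_monday, freed_wednesday, store.stored_chunks())
```

    Expiring "monday_full" frees nothing in this model because Tuesday's backup still references every chunk, which is the worst case described above.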

    That's why I recommend that my customers set the reclamation threshold higher than they would with real media.
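
    As a small sketch of what a higher threshold does, the Python below filters volumes by their percentage of reclaimable space. The volume names and percentages are made up, and treating a volume as eligible when its reclaimable space is at or above the threshold is an assumption about the exact boundary; check the RECLAIM parameter in the TSM documentation for your release.

```python
def eligible_for_reclamation(volumes, threshold_pct):
    """Return volumes whose reclaimable space meets the threshold.

    `volumes` maps a volume name to its percentage of reclaimable space.
    """
    return sorted(
        name for name, reclaimable_pct in volumes.items()
        if reclaimable_pct >= threshold_pct
    )


# Hypothetical volumes and reclaimable-space percentages.
volumes = {"VOL001": 35, "VOL002": 62, "VOL003": 80, "VOL004": 95}

at_60 = eligible_for_reclamation(volumes, 60)
at_90 = eligible_for_reclamation(volumes, 90)
print("threshold 60:", at_60)
print("threshold 90:", at_90)
```

    With the higher threshold, fewer volumes are processed, and each one processed is mostly expired data, so less still-valid data has to be read and rewritten through the de-duping unit.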

    Theoretically, yes. But don't forget the performance hit (CPU and LOG/DB disk) on the TSM server needed to achieve the deduplication. Nothing comes free. :)

    The rates depend VERY much on the actual model and the customer's environment. In some cases customers needed to upgrade their backup servers to get the throughput they were expecting. NFS or VTL is faster than CIFS. 10GigE helps. I'd recommend getting a unit for evaluation and finding out for yourself.
    Last edited: Mar 27, 2009
