
TSM Deduplication Limitations?

Discussion in 'VTL - Virtual Tape Library' started by rowl, Dec 18, 2009.

  1. rowl

    rowl Member

    Joined:
    May 18, 2006
    Messages:
    215
    Likes Received:
    8
    I have heard some comments from local TSM folks that TSM 6 dedup is only usable on pools up to 5-6 TB in size. While I didn't get a lot of details, it sounded like they ended up CPU bound.

    I am curious if anyone here has had positive (or negative) experiences with TSM deduplication and large storage pools. We are looking at the possibility of replacing deduplicating VTL's with large disk pools. It would be far less expensive, and less complicated than the VTL route if TSM deduplication is usable on a large scale.

    To give you an idea of how large "large" is, we have some hosts with occupancy numbers in TSM in the 80-100 TB range, and nearly 4 PB of total occupancy in our TSM backup environment. On average we move 60-80 TB a day of backups to TSM.

    The server platform we are considering is the Sun x4540: 12 cores, 32 GB RAM, and 48 x 1.5 TB drives behind a ZFS file system. With this platform each server adds more CPU, RAM, and capacity to the environment, so the hope is that this will help scale up the CPU/memory/storage bandwidth needed for deduplication.
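
    As a back-of-the-envelope check on the building-block math (the usable fraction and dedup ratios below are just assumptions, not measurements):

        # Rough sizing sketch for x4540 building blocks.
        # Assumed: ~70% of raw disk usable after ZFS/RAID and spare overhead,
        # and a guessed range of dedup ratios -- neither figure is measured.
        drives_per_node = 48
        drive_tb = 1.5
        usable_fraction = 0.70
        total_occupancy_tb = 4000        # ~4 PB of TSM occupancy

        usable_per_node = drives_per_node * drive_tb * usable_fraction   # ~50 TB

        for dedup_ratio in (1.5, 2.0, 3.0):
            stored_tb = total_occupancy_tb / dedup_ratio
            nodes = stored_tb / usable_per_node
            print(f"dedup {dedup_ratio}:1 -> ~{stored_tb:.0f} TB on disk, ~{nodes:.0f} nodes")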

    Thanks,
    -Rowl
     
  3. Jeff_Jeske

    Jeff_Jeske Senior Member

    Joined:
    Jul 17, 2006
    Messages:
    485
    Likes Received:
    7
    Occupation:
    Storage Engineer - DR Coordinator
    Location:
    Stevens Point, WI
    I can't answer your question. But I'm surprised you would want to put all your eggs in one basket (one big TSM server).

    We are not as big as you, and we run Windows. Our approach to moving away from the VTL was to buy more, less-expensive servers and spread out the processing load, bandwidth, and disaster exposure.

    We have a couple of SATA-only TSM servers. They work great, but compared to the VTL there are additional exposures to consider.

    The OS has access to all TSM data, so a virus or some form of corruption could whack both storage pools. Adding many LUNs to a server makes management of those LUNs very sensitive; your server and TSM admins had best pay very close attention to detail. You should also think about the maximum number of LUNs your server will be happy with.

    The VTLs were designed for a specific mission and they do it very well. When moving to self-managed disk you'll need to design your own RAID layout, spindle count, LUN size, etc., and you might not see the gains you expect.
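
    To give a feel for how quickly those layout choices eat into the raw capacity, here is a rough sketch for a 48-drive node (the group size, spare count, and LUN size are assumptions for illustration, not a recommendation):

        # Illustrative layout math for a 48-drive node; all layout choices assumed.
        drives, drive_tb = 48, 1.5
        spares = 2                                  # assumed hot spares
        group_size, parity_per_group = 8, 2         # e.g. RAID-6 / raidz2 groups of 8

        groups = (drives - spares) // group_size             # 5 full groups
        data_drives = groups * (group_size - parity_per_group)
        usable_tb = data_drives * drive_tb                   # ~45 TB of 72 TB raw

        lun_tb = 2                                           # assumed LUN size
        print(f"~{usable_tb:.0f} TB usable -> ~{usable_tb / lun_tb:.0f} LUNs of {lun_tb} TB each")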
     
  4. javajockey

    javajockey Senior Member

    Joined:
    Dec 26, 2007
    Messages:
    265
    Likes Received:
    6
    Occupation:
    Server Operations
    Location:
    Yorktown
    That was one of the deciding factors for management to go with AIX in my environment :p



    Seriously, is anyone using TSM for deduping in large environments? I'm stuck with the same dilemma.
     
  5. Canuck

    Canuck New Member

    Joined:
    Mar 24, 2008
    Messages:
    32
    Likes Received:
    1
    That 6 TB figure is how much backup data a TSM server can effectively dedupe per day. TSM keeps track of what data has already been deduped in the storage pool, so it doesn't have to scan the entire pool every day. Just keep in mind that processing 6 TB of backup data can take up to 18 hours (depending on the number of CPUs, disk speed, etc.), so if your system isn't fast enough you will never dedupe all the data coming in and could fall behind / use up too much storage space.

    TSM dedupe is probably realistic if your TSM server backs up 2-4 TB per day. But you also don't get dedupe across storage pools or TSM servers, whereas you would with one of the various dedupe appliances out there.
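
    Put in throughput terms, that works out to something like this (just arithmetic on the figures above, nothing measured):

        # 6 TB of backup data processed in ~18 hours of dedup identification:
        tb_per_day = 6
        hours = 18
        mb_per_s = tb_per_day * 1024 * 1024 / (hours * 3600)
        print(f"~{mb_per_s:.0f} MB/s sustained")   # ~97 MB/s
        # ...and that window competes with backups, expiration, reclamation
        # and migration running on the same server.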
     
  6. javajockey

    javajockey Senior Member

    Joined:
    Dec 26, 2007
    Messages:
    265
    Likes Received:
    6
    Occupation:
    Server Operations
    Location:
    Yorktown
    Thanks for clearing that up, Canuck. So I guess you could probably postpone the dedup process until the weekend; my servers are usually just running reclamation during that off time.
     
  7. Canuck

    Canuck New Member

    Joined:
    Mar 24, 2008
    Messages:
    32
    Likes Received:
    1
    Leaving it to the weekend could cost you quite a lot of storage, though. With TSM dedupe you need that 'extra landing zone' of storage for the nightly backup; then it runs its dedupe process on the volumes (throwing out redundant chunks), and after dedupe and expiration TSM needs to run reclamation to recover the space. You don't actually lower your 'used' storage until it runs completely through the dedupe / reclamation processes.
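
    A quick way to see what deferring it costs (the ingest and dedup ratio here are assumed figures, just to show the shape of the problem):

        # Extra 'landing zone' space held while data sits un-deduped.
        daily_ingest_tb = 4          # assumed per-server nightly ingest
        dedup_ratio = 2.0            # assumed; use whatever your data actually yields

        for days_deferred in (1, 7):             # nightly vs weekend-only dedup
            undeduped = daily_ingest_tb * days_deferred
            after = undeduped / dedup_ratio
            extra = undeduped - after
            print(f"dedup every {days_deferred} day(s): ~{extra:.0f} TB only comes back "
                  f"after identify + expiration + reclamation finish")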
     
  8. bhaven41

    bhaven41 New Member

    Joined:
    Feb 27, 2007
    Messages:
    70
    Likes Received:
    1
    Occupation:
    Storage/Backup Admin
    Location:
    Singapore
    I have used dedup in my environment for 4-5 TB of data, and so far I have been fighting to get it to work properly.

    Dedup works fine, but it makes expiration hang. When that happens, all other processes and backups hang as well; it seems to be some kind of TSM server resource conflict.

    I opened a PMR and they are working on a diag server for us.

    So if you plan to use this in production, please test it in a test environment first.
     
  9. rowl

    rowl Member

    Joined:
    May 18, 2006
    Messages:
    215
    Likes Received:
    8
    Thanks for all the feedback on this topic. To clarify one point: I am not looking to create one big TSM server. This platform is far too small for that, even with deduplication. One of our ongoing struggles has been scaling up CPU, memory, capacity, and I/O. The x4540 would be a building block, and we would roll out as many as needed to meet our capacity and throughput needs.

    Using 4 TB/day as a starting point, this sounds promising; on a per-TSM-server basis that is about the average I am measuring. Tape still exists here, so it may well be used as an overflow pool for the data stored on these servers' internal disk.
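
    To close the loop on the building-block count, the per-server figure above implies roughly the following (simple arithmetic; the 4 TB/day is only a planning estimate):

        # Servers implied by site-wide daily ingest at ~4 TB/day of dedup per server.
        daily_ingest_tb = (60, 80)       # site-wide daily backup volume
        per_server_tb = 4                # rough per-server dedup budget from above
        low, high = (x / per_server_tb for x in daily_ingest_tb)
        print(f"~{low:.0f} to ~{high:.0f} TSM server instances just to keep up")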
     
