
Anyone use data "de-dup" technology?

Discussion in 'Capacity Planning' started by timgren, Jun 26, 2007.

  1. timgren

    timgren New Member

    Joined:
    Dec 20, 2002
    Messages:
    47
    Likes Received:
    0
    Occupation:
    Sr. Systems Engineer
    Location:
    St. Louis
    The company is looking into a data-de-dupper appliance, and a few vendors are touting a 300% decrease in backup volume.

    Personally - I'm sceptical. 10-30%... maybe. 300%... Not!

    Does anyone use this technology within a TSM environment? If so, what are your real-world statistics?

    Also -- does anyone know if/how de-duplication affects SOX or PCI compliance?
     
  3. PJ

    PJ Senior Member

    Joined:
    Nov 18, 2005
    Messages:
    1,066
    Likes Received:
    4
    Location:
    LU Germany
    I would not know about compliance, but from my school days I seem to remember that a 300% decrease in volume would actually buy you double the capacity you have now ;-)
    Seriously - we tested de-dup with TSM in order to decrease disk capacity, and it did compress the data down to 10-30% of its uncompressed size. That made it about twice as efficient as LTO3 compression on the same data (which is still good old LZ1). Overall it didn't pay off, because the savings didn't compensate for the cost, performance, and complexity impact. Our mail and fileserver people are now looking into it, and they sound a little less disappointed. I shall keep you posted on their results.

    PJ
     
  4. LanT5

    LanT5 New Member

    Joined:
    Mar 13, 2007
    Messages:
    3
    Likes Received:
    0
    Occupation:
    DR/Storage Admin
    I am looking into a de-dup solution as well, so any feedback would be appreciated. The de-dup vendors sure offer up a big promise. How much of a performance hit did you see? Was it on backup or restore? What kinds of complexity did it introduce into your recovery solution? Does anyone have experience with in-band or out-of-band solutions? De-dup ratios with different data types?
     
  5. tsmuser10

    tsmuser10 New Member

    Joined:
    Oct 11, 2004
    Messages:
    201
    Likes Received:
    0
    Same here... looking for some feedback from anyone using dedup with 6.1: likes/dislikes, setup.
    THX
     
  6. Eldoraan

    Eldoraan Senior Member

    Joined:
    Feb 19, 2003
    Messages:
    288
    Likes Received:
    10
    Occupation:
    Data Protection
    Location:
    Charlotte, NC
    The issue with vendor dedup promises is that they are talking in terms of the typical weekly-full, daily-incremental model of traditional backup tools. If you have 60-day retention, that's roughly 8 fulls, with 95% of the data identical in each full, so they can calculate a huge dedup ratio.

    TSM doesn't follow that traditional model, so you don't get 8 fulls over a 2-month period. You get one full, and the rest are all incrementals. This throws the vendors' calculations off enormously.
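    This point can be sketched with some back-of-the-envelope arithmetic. The server size, change rate, and retention below are made-up illustrations, not figures from the thread, and the model counts only cross-backup redundancy, which is exactly the redundancy the weekly-full model manufactures and incremental-forever never sends.

    ```python
    # Hypothetical numbers: 10 TB server, 1% daily change, 60-day retention.
    PRIMARY_TB = 10.0
    DAILY_CHANGE = 0.01
    DAYS = 60
    FULLS = DAYS // 7  # ~8 weekly fulls in the retention window

    def dedup_ratio(nominal_tb, stored_tb):
        """Dedup ratio = nominal data written / unique data actually stored."""
        return nominal_tb / stored_tb

    # Unique data the appliance must keep either way: the baseline plus
    # every day's changed blocks (assume changed blocks are genuinely new).
    unique_tb = PRIMARY_TB + DAYS * PRIMARY_TB * DAILY_CHANGE

    # Weekly full + daily incrementals: each full rewrites all 10 TB, and
    # ~99% of it is identical to the previous full, so it dedups away.
    written_weekly_full = FULLS * PRIMARY_TB + DAYS * PRIMARY_TB * DAILY_CHANGE

    # TSM incremental forever: one full, then only changed data is written.
    written_incr_forever = PRIMARY_TB + DAYS * PRIMARY_TB * DAILY_CHANGE

    print(f"weekly-full model:   {dedup_ratio(written_weekly_full, unique_tb):.1f}:1")
    print(f"incremental forever: {dedup_ratio(written_incr_forever, unique_tb):.1f}:1")
    ```

    In this toy model incremental-forever comes out at 1:1 only because it ignores redundancy inside the data itself, which real appliances do find; the point is simply that most of the headline ratio comes from re-sent fulls that TSM never sends.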
     
  7. Jeff_Jeske

    Jeff_Jeske Senior Member

    Joined:
    Jul 17, 2006
    Messages:
    485
    Likes Received:
    7
    Occupation:
    Storage Engineer - DR Coordinator
    Location:
    Stevens Point, WI
    Our storage vendor has deployed Data Domain devices in multiple TSM shops. They told us not to get too excited about dedupe, because we use client-side compression for TSM and LiteSpeed compression on our databases. It will gain something, but it won't be anywhere near the gains we saw by turning compression on.
     
  8. dms19

    dms19 New Member

    Joined:
    Jan 4, 2006
    Messages:
    470
    Likes Received:
    2
    Occupation:
    TSM Admin
    Location:
    In your head...
    x2 on the client side. Data Domain likes uncompressed data sent to it. We're getting around 8 to 14 times compression depending on the data (OS vs. database).
     
  9. paschumacher

    paschumacher New Member

    Joined:
    Feb 7, 2008
    Messages:
    52
    Likes Received:
    0
    Occupation:
    Storage Systems Administrator
    Location:
    Bismarck, ND
    We are using de-dupe with an IBM appliance that emulates an LTO library to TSM. When it first got on the floor we had very high hopes, but it would not stay stable for more than 2 days at a time: paths offline, fiber ports throwing errors, back-end disk (XIV) not liking what it saw, etc.

    Bottom line: installation was a nightmare. It was 2+ months before the device was fully working, and we still had stability issues. After a switch firmware upgrade, an XIV firmware upgrade, and a VTL upgrade, we finally reached a point where we could keep it running properly, about 4 months after it hit the floor. But then we started having TSM issues with 6.1.2 and were told to upgrade to 6.1.3... good god, that is what started the nightmare. After 3 weeks of wrestling with 6.1.3 they finally released 6.1.3.1, and now we have been running stable for about 2 weeks straight <knocks on wood>

    So, for the de-dup reality: we were promised a 20:1 ratio. We are currently de-duping at 3.33:1, but are still working on this with IBM.

    My advice: our VTL is great when it runs stable. Be sure the vendor doesn't sell you on an unattainable de-dup ratio. Be prepared for long hours on the phone with support, firmware upgrades, etc. If you don't have the time, energy, or staff to commit to the appliance, WAIT!!
     
  10. paschumacher

    paschumacher New Member

    Joined:
    Feb 7, 2008
    Messages:
    52
    Likes Received:
    0
    Occupation:
    Storage Systems Administrator
    Location:
    Bismarck, ND
    Hit the nail on the head.
     
  11. Masonit

    Masonit New Member

    Joined:
    Sep 17, 2007
    Messages:
    131
    Likes Received:
    1
    We run dedup. I have read of possible dedup ratios of 500:1 and so on... possible on the planet Pandora, but not here... Right now we have deduped 16%, and we have, in theory, "good" data for dedup. I was expecting at least 40%, so I am very disappointed.

    \Masonit
     
  12. rwhtmv

    rwhtmv New Member

    Joined:
    Apr 9, 2003
    Messages:
    228
    Likes Received:
    0
    TSM and Dedupe

    Anyone using TSM for more than a week should know that it is incremental forever. Dedupe numbers around 7x are what to expect for TSM in most shops. If a vendor tells you any more than that, they are lying.

    Also, if you want to keep a tape library in place at the end of this disk > deduped disk > tape topology, be aware that TSM does NOT dedupe onto tape, since tape is sequential. It will UN-dedupe (rehydrate) the data to lay it on tape, so you won't save any tapes there, and it will add a little overhead on TSM.
     
  13. dlamascus

    dlamascus New Member

    Joined:
    Apr 1, 2010
    Messages:
    2
    Likes Received:
    0
    In talking to DataDomain and utilizing their sizing tools, the expected dedup is about 3:1 utilizing the TSM progressive incremental backup policies. This has also been confirmed by our engineers in the lab.
     
  14. eperez507

    eperez507 New Member

    Joined:
    Jan 9, 2007
    Messages:
    146
    Likes Received:
    0
    Deduplication results always depend on the data. You have to take into account databases, the number of versions, and what your company does. If you are an imaging company, you are better off archiving to low-cost storage such as Blu-ray or low-cost disk.
     
  15. rowl

    rowl Member

    Joined:
    May 18, 2006
    Messages:
    216
    Likes Received:
    8
    I manage a very large VTL environment: we have 650 TB of back-end storage for 12 VTL appliances. We were conservative on our estimates and went with 5:1 as our assumption. With multiple VTLs I had the luxury of sending like data to the same VTL, so all MS Exchange and SQL go to one set of VTLs, Oracle and DB2 to another, and file system backups and DB log sweeps to others. SQL and Exchange give by far the best results, around 6:1. In one environment where we had only DB2 database backups we were up to 11:1, but then we started backing up the DB log files there as well and that cut it down to 5:1.

    The general file system VTL is at 3.4:1. This is about the same for the DB2/Oracle mix.
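    For what it's worth, per-pool ratios like these can be rolled up into one blended figure by weighting with stored (post-dedup) size: total nominal data over total stored data. The stored sizes below are invented for illustration; only the ratios echo the figures in this post.

    ```python
    # Hypothetical pool sizes (stored TB after dedup) paired with the
    # per-pool dedup ratios quoted above; the sizes themselves are made up.
    pools = {
        "exchange_sql": (100.0, 6.0),   # (stored_tb, dedup_ratio)
        "db2_oracle":   (150.0, 3.4),
        "filesystems":  (200.0, 3.4),
    }

    total_stored = sum(stored for stored, _ in pools.values())
    total_nominal = sum(stored * ratio for stored, ratio in pools.values())
    blended_ratio = total_nominal / total_stored

    print(f"blended dedup ratio: {blended_ratio:.2f}:1")
    ```

    Note how the blended figure lands well below the best pool's 6:1, because the big file-system pool dominates the stored capacity.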

    I have high hopes that TSM can get dedup right eventually. I would like to eliminate all of the VTL appliances, they are just too much of a headache to keep running. Sadly I have not seen anyone post something positive about TSM 6.x and dedup.

    We see issues with paths going offline, "unable to read barcode" errors from virtual tapes, and tapes stuck in drives. Sometimes I think they took the virtualization too far when they emulated all these modes of failure.
     
  16. Nicke

    Nicke New Member

    Joined:
    Mar 17, 2005
    Messages:
    63
    Likes Received:
    0
    Occupation:
    Storage consultant
    Location:
    Sweden
    There seems to be some confusion about how to express de-duplication figures.

    What I normally use is a de-dup ratio of nominal size / stored size, like 7:1.

    If you want to express this value (7:1) in percentage form, it is 700%.
    (In comparison, 15% would actually be a negative de-dup value, since the lowest meaningful value is just above 100%. LTO tape compression gives at best 2:1, i.e. 200%.)
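    These conversions are easy to get wrong in a vendor meeting, so here is the arithmetic spelled out (plain Python, no thread-specific data):

    ```python
    def as_percentage(ratio):
        """A 7:1 dedup ratio expressed as a percentage: 700%."""
        return ratio * 100.0

    def space_saved_pct(ratio):
        """Share of capacity you no longer need: 7:1 saves ~85.7%."""
        return (1.0 - 1.0 / ratio) * 100.0

    # A literal "300% decrease" is impossible; read it as a 3:1 ratio,
    # i.e. you store one third of the nominal data.
    for ratio in (2.0, 3.0, 7.0):
        print(f"{ratio:.0f}:1 = {as_percentage(ratio):.0f}%, "
              f"saves {space_saved_pct(ratio):.1f}% of capacity")
    ```

    Note the diminishing returns: going from 2:1 to 7:1 only moves the space saved from 50% to about 86%, which is why modest ratios can still be worthwhile and huge headline ratios matter less than they sound.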


    ...

    I agree with the previous posts regarding real-life de-dup ratios with TSM. You won't see more than 7:1 with TSM incremental-forever, and only if you DON'T use client compression. With Legato NetWorker you could get 20:1 de-dup, but as with TSM, it all depends on how often you run incremental backups.

    A much higher de-dup ratio is possible when you back up databases like Oracle, but backing up only changed files isn't ideal for a de-dup engine.


    Regards,
    Nicke
     
  17. bkupmstr

    bkupmstr New Member

    Joined:
    Sep 11, 2002
    Messages:
    8
    Likes Received:
    0
    Occupation:
    Consultant
    Location:
    NJ
    manage a large dedupe shop

    Rowl,

    Can you comment on what type of dedupe software/hardware you have in your large shop?

    Thanks
    Jim
     
  18. rowl

    rowl Member

    Joined:
    May 18, 2006
    Messages:
    216
    Likes Received:
    8
  19. Nicke

    Nicke New Member

    Joined:
    Mar 17, 2005
    Messages:
    63
    Likes Received:
    0
    Occupation:
    Storage consultant
    Location:
    Sweden
    In response to rowl's statement:

    "We see issues with paths going offline, 'unable to read barcode' errors from virtual tapes, and tapes stuck in drives. Sometimes I think they took the virtualization too far when they emulated all these modes of failure."


    Q:

    1) What PT code version are you using? Is it ProtecTIER 2.3.x.x or earlier?

    2) What back-end storage is it? There are known problems with LSI/IBM DS4K and DS5K... It's better to use active-active controller disk subsystems. Also you should limit the number of FC paths to each controller (related to what P.T and RedHat level you have installed).

    3) Is it TS7650-DD1 or -DD3 nodes and what is the setup (single engine or 2 node cluster(s))?


    ... So the de-dup ratio can be hard to fix, but problems like "unable to read barcode" errors from virtual tapes and tapes stuck in drives are almost always related to SAN disk problems, and can be fixed with detailed planning.


    Kind regards,
    nicke
     
  20. rowl

    rowl Member

    Joined:
    May 18, 2006
    Messages:
    216
    Likes Received:
    8
    Since my original post we have made some changes that greatly enhanced the stability of our PT environment.

    1) Upgraded all systems to 2.3.x
    2) Updated all zoning to single-target, single-initiator pairs.

    With tape drives we have always zoned 1-4 tape drives to an HBA (depending on the tape drive speed). With the PT environment I was told this is not supported (I wish I had been told that a long time ago). So now it's one HBA to one PT port in each zone. Since we made this change I have not seen any "weird" behavior.

    -Rowl
     
  21. whitepup

    whitepup New Member

    Joined:
    Jan 25, 2007
    Messages:
    36
    Likes Received:
    1
    I have about 200 TB total on DataDomain and getting 6.9x overall.
     
