  1. #1
    Member
    Join Date
    Dec 2002
    Location
    St. Louis
    Posts
    47
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Anyone use data "de-dup" technology?

    The company is looking into a data de-dup appliance, and a few vendors are touting a 300% decrease in backup volume.

    Personally - I'm sceptical. 10-30%... maybe. 300%... Not!

    Does anyone use this technology within a TSM environment? If so, what are your real-world statistics?

    Also -- does anyone know if/how de-duplication affects SOX or PCI compliance?

  2. #2
    Senior Member
    Join Date
    Nov 2005
    Location
    LU Germany
    Posts
    1,066
    Thanks
    0
    Thanked 1 Time in 1 Post


    I would not know about compliance, but from my school days I seem to remember that a 300% decrease in volume would actually buy you double the capacity you have now.
    Seriously - we tested de-dup with TSM in order to reduce disk usage, and it did compress the data down to 10-30% of its uncompressed size. That made it about twice as efficient as LTO3 compression on the same data (which is still good old LZ1). Overall it didn't pay off, because the savings wouldn't compensate for the cost, performance, and complexity impact. Our mail and fileserver people are now looking into it, and they sound a little less disappointed. I shall keep you posted on their results.
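
    In ratio terms, that works out as follows; a minimal sketch in Python, treating the 10-30% figures as illustrative round numbers rather than measurements:

        # Converting "stored at X% of original size" into a reduction ratio.
        # The 10%/30% inputs are rough figures from our test, nothing more.

        def reduction_ratio(stored_fraction):
            """Data stored at 25% of its original size is a 4:1 reduction."""
            return 1.0 / stored_fraction

        for frac in (0.10, 0.30):
            print("stored at %.0f%% of original -> %.1f:1"
                  % (frac * 100, reduction_ratio(frac)))

        # LTO3 hardware compression is roughly 2:1 on typical data (stored
        # at ~50%), which is why 3.3:1 to 10:1 is about twice as efficient.
        print("LTO3 baseline -> %.1f:1" % reduction_ratio(0.50))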

    PJ

  3. #3
    Member
    Join Date
    Mar 2007
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts


    I am looking into a de-dup solution as well, so any feedback would be appreciated. The de-dup vendors sure offer up a big promise. How much of a performance hit was seen? Was this on backup or restore? What type of complexities did it introduce into your recovery solution? Does anyone have any experience to report with in-band or out-of-band solutions? De-dup ratios with different data types?

  4. #4
    Member
    Join Date
    Oct 2004
    Posts
    198
    Thanks
    0
    Thanked 0 Times in 0 Posts


    Same here... looking for some feedback from anyone using dedup with TSM 6.1: likes/dislikes, setup.
    THX

  5. #5
    Senior Member
    Join Date
    Feb 2003
    Location
    Charlotte, NC
    Posts
    288
    Thanks
    0
    Thanked 1 Time in 1 Post


    The issue with vendor dedup promises is that they are talking in terms of the typical weekly-full, daily-incremental model of traditional backup tools. If you have 60-day retention, that's 8 fulls, with 95% of the data identical in each full, so they can calculate a huge dedup ratio.

    TSM doesn't follow that traditional model, so you don't get 8 fulls over a 2-month period. You get one full and then the rest are all incrementals. This throws the vendors' calculations off enormously.
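
    Back-of-the-envelope, the difference looks something like this; a rough Python sketch where every number is illustrative, not a measurement:

        # Why weekly-full shops see big dedup ratios and incremental-forever
        # shops don't. All values below are made up for illustration.

        full = 100.0     # size of one full backup, arbitrary units
        overlap = 0.95   # fraction of each full identical to the previous one
        fulls = 8        # ~8 weekly fulls in a 60-day retention window

        # Traditional model: 800 units ingested, only ~135 of them unique.
        ingested = fulls * full
        unique = full + (fulls - 1) * full * (1 - overlap)
        print("weekly-full model: ~%.1f:1" % (ingested / unique))  # ~5.9:1

        # TSM progressive incremental: one full, then only changed files, so
        # far less redundant data ever reaches the appliance to begin with.
        daily_change = 0.05
        tsm_ingested = full + 60 * full * daily_change
        print("TSM ingests ~%.0f units in 60 days, mostly unique" % tsm_ingested)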
    Michael
    TSM Admin since ADSM v2... wow, I'm not even that old yet either!

  6. #6
    Senior Member Jeff_Jeske
    Join Date
    Jul 2006
    Location
    Stevens Point, WI
    Posts
    485
    Thanks
    2
    Thanked 0 Times in 0 Posts


    Our storage vendor has deployed Data Domain devices in multiple TSM shops. They told us not to get too excited about dedupe, because we use client-side compression for TSM and LiteSpeed compression on our databases. It will gain something, but it won't be anywhere near the gains we saw by turning compression on.

  7. #7
    Member
    Join Date
    Jan 2006
    Location
    In your head...
    Posts
    470
    Thanks
    0
    Thanked 0 Times in 0 Posts


    Quote Originally Posted by Jeff_Jeske View Post
    Our storage vendor has deployed Data Domain devices in multiple TSM shops. They told us not to get too excited about dedupe, because we use client-side compression for TSM and LiteSpeed compression on our databases. It will gain something, but it won't be anywhere near the gains we saw by turning compression on.
    x2 on the client side. Data Domain likes uncompressed data sent to it. We're getting around 8 to 14 times the compression, depending on the data (OS vs. database).
    "Whats this button do?"

  8. #8
    Member
    Join Date
    Feb 2008
    Location
    Bismarck, ND
    Posts
    52
    Thanks
    0
    Thanked 0 Times in 0 Posts


    We are using de-dupe with an IBM appliance that emulates an LTO library to TSM. When it first got on the floor we had very high hopes, but it would not remain stable for more than 2 days at a time: paths offline, fiber ports throwing errors, the back-end disk (XIV) didn't like what it saw, etc.

    Bottom line: installation was a nightmare. It was 2+ months before the device was fully working, and we still had stability issues. A switch firmware upgrade, an XIV firmware upgrade, and a VTL upgrade later, we finally reached a point where we could keep it running properly, after about 4 months of it being on the floor. But then we started having TSM issues with 6.1.2 and were told to upgrade to 6.1.3... good god, that is what started the nightmare. After 3 weeks of wrestling with 6.1.3 they finally released 6.1.3.1, and we have now been running stable for about 2 weeks straight <knocks on wood>.

    So, for the de-dup reality: we were promised a 20:1 ratio. We are currently de-duping at 3.33:1, but are still working on this with IBM.
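
    To put that gap in capacity terms, a quick sketch (Python; the 100 TB of ingest is an illustrative figure, not ours):

        # What the difference between a promised 20:1 and an actual 3.33:1
        # means for back-end disk, assuming 100 TB of ingested backup data.
        ingested_tb = 100.0
        for ratio in (20.0, 3.33):
            print("%5.2f:1 -> %.1f TB of back-end disk"
                  % (ratio, ingested_tb / ratio))
        # 20:1 promises ~5 TB; 3.33:1 actually needs ~30 TB -- six times more.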

    My advice: our VTL is great when it runs stable. Be sure that the vendor doesn't sell you on an unattainable de-dup ratio. Be prepared for long hours on the phone with support, firmware upgrades, etc. If you don't have the time, energy, or staff to commit to the appliance, WAIT!!

  9. #9
    Member
    Join Date
    Feb 2008
    Location
    Bismarck, ND
    Posts
    52
    Thanks
    0
    Thanked 0 Times in 0 Posts


    Quote Originally Posted by Eldoraan View Post
    The issue with vendor dedup promises is that they are talking in terms of the typical weekly-full, daily-incremental model of traditional backup tools. If you have 60-day retention, that's 8 fulls, with 95% of the data identical in each full, so they can calculate a huge dedup ratio.

    TSM doesn't follow that traditional model, so you don't get 8 fulls over a 2-month period. You get one full and then the rest are all incrementals. This throws the vendors' calculations off enormously.
    Hit the nail on the head.

  10. #10
    Member
    Join Date
    Sep 2007
    Posts
    131
    Thanks
    1
    Thanked 1 Time in 1 Post


    We run dedup. I have read of possible dedup ratios of 500:1 and so on... possible on the planet Pandora, but not here. Right now we have deduped 16%, and we have in theory "good" data for dedup. I was expecting at least 40%, so I am very disappointed.

    \Masonit

  11. #11
    Member rwhtmv
    Join Date
    Apr 2003
    Posts
    228
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Blog Entries
    1

    TSM and Dedupe

    Anyone using TSM for more than a week should know that it is incremental forever. Expect dedupe numbers of around 7:1 for TSM in most shops. If a vendor tells you any more than that, they are lying.

    Also, if you want to keep a tape library in place at the end of this disk > deduped disk > tape topology, be aware that TSM does NOT dedupe onto tape, since tape is sequential. It will UN-dedupe the data to lay it on tape, so you won't save any tapes there, and it will add a little overhead on TSM.

  12. #12
    Newcomer
    Join Date
    Apr 2010
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts


    In talking to Data Domain and using their sizing tools, the expected dedup is about 3:1 with TSM progressive-incremental backup policies. This has also been confirmed by our engineers in the lab.

  13. #13
    Member
    Join Date
    Jan 2007
    Posts
    146
    Thanks
    0
    Thanked 0 Times in 0 Posts


    Deduplication results always depend on the data. You have to take into account databases, the number of versions, and what your company does. If you are an imaging company, you are better off archiving to low-cost storage such as Blu-ray or low-cost disk.

  14. #14
    Member
    Join Date
    May 2006
    Posts
    197
    Thanks
    3
    Thanked 6 Times in 5 Posts


    I manage a very large VTL environment: we have 650 TB of back-end storage for 12 VTL appliances. We were conservative in our estimates and went with 5:1 as our assumption. With multiple VTLs I had the luxury of sending like data to the same VTL, so all MS Exchange and SQL backups go to one set of VTLs, Oracle and DB2 to another, and file system backups and DB log sweeps to others. SQL and Exchange give me by far the best results, around 6:1. In one environment where we had only DB2 database backups we were up to 11:1, but then we started backing up the DB log files there as well and that cut it down to 5:1.

    The general file-system VTL is at 3.4:1. This is about the same as for the DB2/Oracle mix.

    I have high hopes that TSM can get dedup right eventually. I would like to eliminate all of the VTL appliances; they are just too much of a headache to keep running. Sadly, I have not seen anyone post anything positive about TSM 6.x and dedup.

    We see issues with paths going offline, "unable to read barcode" errors from virtual tapes, and tapes stuck in drives. Sometimes I think they took the virtualization too far when they emulated all these modes of failure.

  15. #15
    Member
    Join Date
    Mar 2005
    Location
    Sweden
    Posts
    63
    Thanks
    0
    Thanked 0 Times in 0 Posts


    There seems to be some confusion regarding how to express de-duplication figures.

    What I normally use is a de-dup ratio of nominal size / stored size, like 7:1.

    If you want to express this value (7:1) in percentage form, it is 700%.
    (In comparison, 15% is actually a negative de-dup value, since the lowest meaningful dedup value is just over 100%. LTO tape compression gives at best 2:1, i.e. 200%.)
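
    A tiny Python helper, just to pin down that convention:

        # Convention above: ratio = nominal / stored; percentage = ratio * 100.
        # 7:1 is 700%; anything at or below 100% means nothing was saved.

        def dedup_percent(nominal, stored):
            return 100.0 * nominal / stored

        print(dedup_percent(7.0, 1.0))  # 700.0 -- a 7:1 ratio
        print(dedup_percent(2.0, 1.0))  # 200.0 -- LTO compression at its best
        print(dedup_percent(1.0, 1.0))  # 100.0 -- break-even, nothing saved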


    ...

    I agree with the previous posts regarding real-life de-dup ratios with TSM. You won't see more than 7:1 with TSM incremental-forever, and only if you don't use client compression. With Legato NetWorker you could get 20:1 de-dup, but as with TSM, it all depends on how often you run incremental backups.

    A much higher de-dup ratio is possible when you back up databases like Oracle, but backing up only changed files isn't ideal for a de-dup engine.


    Regards,
    Nicke

  16. #16
    Member
    Join Date
    Sep 2002
    Location
    NJ
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Manage a large dedupe shop

    Rowl,

    Can you comment on what type of dedupe software/hardware you have in your large shop?

    Thanks
    Jim

  17. #17
    Member
    Join Date
    May 2006
    Posts
    197
    Thanks
    3
    Thanked 6 Times in 5 Posts


    We are using the IBM ProtecTIER gateway product.

    http://www-03.ibm.com/systems/storag...ier/index.html

  18. #18
    Member
    Join Date
    Mar 2005
    Location
    Sweden
    Posts
    63
    Thanks
    0
    Thanked 0 Times in 0 Posts


    In response to rowl's statement:

    "We see issues with paths going offline, "unable to read barcode" errors form virtual tapes, tapes stuck in drives, sometimes I think they took the virtualization went too far when they emulate all these modes of failure."


    Q:

    1) What ProtecTIER code version are you using? Is it 2.3.x.x or earlier?

    2) What back-end storage is it? There are known problems with LSI/IBM DS4K and DS5K... It's better to use active-active controller disk subsystems. Also, you should limit the number of FC paths to each controller (the exact limit depends on which ProtecTIER and RedHat levels you have installed).

    3) Is it a TS7650-DD1 or -DD3 node, and what is the setup (single engine or 2-node cluster(s))?


    ... So the de-dup ratio can be hard to fix, but problems like "unable to read barcode" errors from virtual tapes and tapes stuck in drives are almost always related to SAN disk problems and can be fixed with detailed planning.


    Kind regards,
    nicke

  19. #19
    Member
    Join Date
    May 2006
    Posts
    197
    Thanks
    3
    Thanked 6 Times in 5 Posts


    Since my original post we have made some changes that greatly enhanced the stability of our PT environment.

    1) Upgraded all systems to 2.3.x
    2) Updated all zoning to single-target, single-initiator pairs.

    With tape drives we have always zoned 1-4 tape drives to an HBA (depending on the tape drive speed). With the PT environment I was told that this is not supported (I wish I had been told this a long time ago). So now it's one HBA to one PT port in each zone. Since we made this change I have not seen any "weird" behavior.

    -Rowl

  20. #20
    Member
    Join Date
    Jan 2007
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts


    I have about 200 TB total on Data Domain and am getting 6.9x overall.

  21. #21
    Member
    Join Date
    Mar 2005
    Location
    Sweden
    Posts
    63
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Almost half the stgpool size with de-dup in 6.1.3.2

    I think the native TSM 6.x de-dup works pretty well.

    One example is this VTL stgpool, where de-dup has saved 4.4 TB (49%):

    Storage Pool Name: ACTIVE_VTL_DD
    Storage Pool Type: Primary
    Device Class Name: VTLDD
    Estimated Capacity: 7,682 G
    Space Trigger Util: 94.0
    Pct Util: 62.0
    Pct Migr: 62.0
    Pct Logical: 79.0
    High Mig Pct: 90
    Low Mig Pct: 70
    Migration Delay: 0
    Migration Continue: Yes
    Migration Processes: 1
    Reclamation Processes: 6
    Next Storage Pool: ACTIVELTO
    Reclaim Storage Pool:
    Maximum Size Threshold: No Limit
    Access: Read/Write
    Description: Dedup-pool
    Overflow Location:
    Cache Migrated Files?:
    Collocate?: Group
    Reclamation Threshold: 60
    Offsite Reclamation Limit:
    Maximum Scratch Volumes Allowed: 760
    Number of Scratch Volumes Used: 505
    Delay Period for Volume Reuse: 0 Day(s)
    Migration in Progress?: No
    Amount Migrated (MB): 1,483,681.47
    Elapsed Migration Time (seconds): 63,407
    Reclamation in Progress?: No
    Last Update by (administrator): XXXXX
    Last Update Date/Time: 05/21/2010 12:05:12
    Storage Pool Data Format: Native
    Copy Storage Pool(s):
    Active Data Pool(s):
    Continue Copy on Error?: Yes
    CRC Data: No
    Reclamation Type: Threshold
    Overwrite Data when Deleted:
    Deduplicate Data?: Yes
    Processes For Identifying Duplicates: 2
    Duplicate Data Not Stored: 4 400 G (49%)


    ...

    We run only 2 de-dup identify processes, and we monitor the reclaim processes that free up volumes.
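
    For reference, the knobs behind those numbers look roughly like this; a sketch of the TSM 6.x admin commands from memory, so verify the exact syntax with HELP UPDATE STGPOOL on your server level:

        /* Enable dedup on the pool and run 2 identify processes, matching */
        /* "Deduplicate Data?: Yes" and "Processes For Identifying         */
        /* Duplicates: 2" in the output above. Syntax from memory --       */
        /* check it against your server's help before use.                 */
        update stgpool ACTIVE_VTL_DD deduplicate=yes identifyprocess=2

        /* Reclamation is what actually frees the deduplicated volumes:    */
        update stgpool ACTIVE_VTL_DD reclaim=60 reclaimprocess=6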


    Process Number  Process Description  Status
    --------------  -------------------  ----------------------------------------
    206             Identify Duplicates  Storage pool: ACTIVE_VTL_DD. Volume:
                                         NONE. State: idle. State Date/Time:
                                         2010-06-24 08:26:52. Current Physical
                                         File (bytes): 0. Total Files Processed:
                                         3,388,025. Total Duplicate Extents
                                         Found: 27,705,973. Total Duplicate
                                         Bytes Found: 4,785,235,436,185.
    207             Identify Duplicates  Storage pool: ACTIVE_VTL_DD. Volume:
                                         NONE. State: idle. State Date/Time:
                                         2010-06-24 08:20:17. Current Physical
                                         File (bytes): 0. Total Files Processed:
                                         598,506. Total Duplicate Extents
                                         Found: 7,594,202. Total Duplicate
                                         Bytes Found: 1,043,675,163,147.

    ...

    So we are happy with the TSM 6 de-dup.

    ...

    With IBM ProtecTIER or EMC Data Domain it may be possible to get a 3-6x (300-600%) de-dup ratio, but it will require a lengthy and costly installation project. Also, TSM has always had a very good native VTL function, so you don't need an external VTL product.

    ...




    Kind Regards,
    Nicke

  22. #22
    Member
    Join Date
    Aug 2007
    Location
    SFBA
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts


    Quote Originally Posted by Nicke View Post

    With IBM ProtecTIER or EMC Data Domain it may be possible to get a 3-6x (300-600%) de-dup ratio, but it will require a lengthy and costly installation project. Also, TSM has always had a very good native VTL function, so you don't need an external VTL product.
    Why a lengthy project? I just want to say that when I was a DD customer it took only about a day of work. I guess it depends on the size of the environment and how good you are with it.

    I agree that you don't want a VTL, though. A VTL trades physical for virtual, but all the tape management stays the same. Why would anybody want that? NFS will do just fine, or actually better. If it's speed you need, get 10 Gb/s Ethernet adapters. A VTL is just another layer that WAS needed for integration back when TSM didn't have a well-developed FILE device type. That's NOT the case anymore. Actually, the DD folks have been trying to talk customers out of using VTLs for a while now, but some just don't want to listen.
    Thanks,
    Vadim

  23. #23
    Member
    Join Date
    Jan 2007
    Posts
    146
    Thanks
    0
    Thanked 0 Times in 0 Posts


    You have to realize that if you go the NFS route, Data Domain has a limit on streams, so you had better limit the mounts correctly with the device class; with a VTL, the number of drives is your limiting factor. We currently run 2 fully loaded DD880s and replicate them offsite to another pair. It works very well, and replication and restores are awesome. We see between 5-7x deduplication and no slowness with restores. Inline backup and replication is the way to go.
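
    On the TSM side, that stream cap usually maps to the FILE device class mount limit; a hedged sketch where every name, value, and path is invented for the example (verify the parameters with HELP DEFINE DEVCLASS):

        /* Illustrative only: cap concurrent streams to the DD NFS export */
        /* by limiting mount points on the FILE device class. DDFILE,     */
        /* DDPOOL, the limits, and the directory are all made-up names.   */
        define devclass DDFILE devtype=file mountlimit=32 -
          maxcapacity=50g directory=/ddnfs/tsm1
        define stgpool DDPOOL DDFILE maxscratch=500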

  24. #24
    Member
    Join Date
    Aug 2007
    Location
    SFBA
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts


    Quote Originally Posted by Masonit View Post
    We run dedup. I have read of possible dedup ratios of 500:1 and so on... possible on the planet Pandora, but not here. Right now we have deduped 16%, and we have in theory "good" data for dedup. I was expecting at least 40%, so I am very disappointed.

    \Masonit
    A DDR can't dedupe if there aren't many dupes, and with incremental-forever that's usually the case.
    Thanks,
    Vadim
