Anyone use data "de-dup" technology?

timgren

ADSM.ORG Member
Joined
Dec 20, 2002
Messages
47
Reaction score
0
Points
0
Location
St. Louis
Website
www.mastercard.com
The company is looking into a data-de-dupper appliance, and a few vendors are touting a 300% decrease in backup volume.

Personally - I'm sceptical. 10-30%... maybe. 300%... Not!

Does anyone use this technology within a TSM environment? If so, what are your real-world statistics??

Also -- does anyone know if/how de-duplication affects SOX or PCI compliance?
 
I would not know about compliance, but from my school days I seem to remember that a 300% decrease in volume would actually buy you double the capacity you have now ;-)
Seriously - we tested de-dup with TSM in order to reduce disk capacity, and it did compress data down to 10-30% of its uncompressed size. That made it roughly twice as efficient as LTO3 compression on the same data (which is still good old LZ1). Overall it didn't pay off, because the savings couldn't compensate for its cost, performance, and complexity impact. Our mail and fileserver people are now looking into it and they sound a little less disappointed. I shall keep you posted on their results.

PJ
 
I am looking into a de-dup solution as well, so any feedback would be appreciated. The de-dup vendors sure offer up a big promise. How much of a performance hit was seen? Was this on backup or restore? What kind of complexity did it introduce into your recovery solution? Does anyone have experience to report with in-band or out-of-band solutions? De-dup ratios with different data types?
 
Same here... looking for feedback from anyone using dedup with TSM 6.1: likes/dislikes, setup.
THX
 
The issue with vendor dedup promises is that they are talking in terms of the typical weekly-full, daily-incremental model of traditional backup tools. If you have 60-day retention, that's about 8 fulls, with 95% of the data identical in each full, so they can calculate a huge dedup ratio.

TSM doesn't follow that traditional model, so you don't get 8 fulls over a 2-month period. You get one full, and the rest are all incrementals. This throws the vendors' calculations off enormously.
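
To put some entirely made-up numbers on that (this is just a sketch of the arithmetic, not output from any vendor sizing tool):

# Back-of-the-envelope comparison of the two backup models described above.
# Every figure here is hypothetical, for illustration only.
primary_tb   = 10.0   # size of the data being protected
weekly_delta = 0.05   # ~5% of the data changes from week to week
weeks        = 8      # roughly 60 days of retention

# Traditional model: a full every week. Each full re-sends all 10 TB,
# but ~95% of it is identical to last week's, so the appliance only has
# to store the changed part again.
nominal_traditional = weeks * primary_tb
unique_traditional  = primary_tb + (weeks - 1) * primary_tb * weekly_delta

# TSM progressive incremental: one full, then only changed files are ever
# sent again, so nearly everything arriving at the appliance is new data.
nominal_tsm = primary_tb + (weeks - 1) * primary_tb * weekly_delta
unique_tsm  = nominal_tsm

print("weekly-full model : %4.1f:1" % (nominal_traditional / unique_traditional))
print("TSM incr-forever  : %4.1f:1" % (nominal_tsm / unique_tsm))

The weekly-full model comes out near 6:1 here simply because the same 10 TB gets counted eight times; the naive 1:1 for TSM is pessimistic, since appliances still find duplicates inside files and across clients, which is roughly where the 2:1 to 7:1 figures quoted elsewhere in this thread come from.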
 
Our storage vendor has deployed Data Domain devices in multiple TSM shops. They told us not to get too excited about dedupe, because we use client-side compression for TSM and LiteSpeed compression on our databases. It will gain something, but it won't be anywhere near the gains we saw from turning compression on.
 
Our storage vendor has deployed Data Domain devices in multiple TSM shops. They told us not to get too excited about dedupe, because we use client-side compression for TSM and LiteSpeed compression on our databases. It will gain something, but it won't be anywhere near the gains we saw from turning compression on.

x2 on the client side. Data Domain likes uncompressed data sent to it. We're getting around 8 to 14 times compression, depending on the data (OS vs. database).
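
For what it's worth, client-side compression is a client option-file setting, so if you do decide to feed an appliance uncompressed data it is a one-line change. Illustrative fragment only; check it against your own dsm.opt/dsm.sys:

* dsm.opt (Windows) or the server stanza of dsm.sys (UNIX)
* send data uncompressed so the dedup appliance can find the duplicates itself
COMPRESSION NO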
 
We are using de-dupe with an IBM appliance that emulates an LTO library to TSM. When it first got on the floor we had very high hopes, but it would not remain stable for more than 2 days at a time: paths offline, fiber ports throwing errors, the back-end disk (XIV) didn't like what it saw, etc.

Bottom line: the installation was a nightmare. It was 2+ months before the device was fully working, and we still had stability issues. A switch firmware upgrade, an XIV firmware upgrade, and a VTL upgrade later, we finally reached a point where we could keep it running properly, about 4 months after it hit the floor. But then we started having TSM issues with 6.1.2 and were told to upgrade to 6.1.3... good god, that is what started the nightmare. After 3 weeks of wrestling with 6.1.3 they finally released 6.1.3.1, and we have now been running stable for about 2 weeks straight <knocks on wood>.

So for the de-dup reality: we were promised a 20:1 ratio. We are currently de-duping at 3.33:1, but are still working on this with IBM.

My advice: our VTL is great when it runs stable. Be sure the vendor doesn't sell you on an unattainable de-dup ratio, and be prepared for long hours on the phone with support, firmware upgrades, and so on. If you don't have the time, energy, or staff to commit to the appliance, WAIT!!
 
The issue with vendor dedup promises is that they are talking in terms of the typical weekly-full, daily-incremental model of traditional backup tools. If you have 60-day retention, that's about 8 fulls, with 95% of the data identical in each full, so they can calculate a huge dedup ratio.

TSM doesn't follow that traditional model, so you don't get 8 fulls over a 2-month period. You get one full, and the rest are all incrementals. This throws the vendors' calculations off enormously.

Hit the nail on the head.
 
We run dedup. I have read about possible dedup of 500:1 and so on... possible on the planet Pandora, but not here. Right now we have deduped 16%, and we have, in theory, "good" data for dedup. I was expecting at least 40%, so I am very disappointed.

\Masonit
 
TSM and Dedupe

Anyone using TSM for more than a week should know that it is incremental forever. Dedupe numbers of around 7x are expected for TSM in most shops. If a vendor tells you anything higher, they are lying.

Also, if you want to keep a tape library in place at the end of this disk > deduped disk > tape topology, be aware that TSM does NOT dedupe onto tape, since tape is sequential. It will un-dedupe (rehydrate) the data to lay it on tape, so you won't save any tapes there, and it adds a little overhead on TSM.
 
In talking to Data Domain and using their sizing tools, the expected dedup is about 3:1 with TSM's progressive incremental backup policies. This has also been confirmed by our engineers in the lab.
 
Deduplication results always depend on the data. You have to take into account databases, the number of versions, and what your company does. If you are an imaging company, you will get better results from archiving to low-cost storage such as Blu-ray or low-cost disk.
 
I manage a very large VTL environment; we have 650 TB of back-end storage for 12 VTL appliances. We were conservative on our estimates and went with 5:1 as our assumption. With multiple VTLs I had the luxury of sending like data to the same VTL, so all MS Exchange and SQL go to one set of VTLs, Oracle and DB2 to another, and file system backups and DB log sweeps to others. SQL and Exchange give me by far the best results, around 6:1. In one environment where we had only DB2 database backups we were up to 11:1, but then we started backing up the DB log files there as well and that cut it down to 5:1.
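
One way to reproduce that layout in TSM is ordinary policy: give each data type its own domain whose backup copy group points at a storage pool on the matching virtual library. A rough sketch with made-up names (your device class, library, and pool names will obviously differ):

define devclass vtl_exch_class devtype=lto library=vtl_exch
define stgpool exch_pool vtl_exch_class maxscratch=200
define domain exch_dom
define policyset exch_dom standard
define mgmtclass exch_dom standard standard
define copygroup exch_dom standard standard type=backup destination=exch_pool
assign defmgmtclass exch_dom standard standard
activate policyset exch_dom standard

Register the Exchange/SQL nodes into that domain and their backups land on whichever VTL you picked for them.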

The general file system VTL is at 3.4:1. This is about the same as the DB2/Oracle mix.

I have high hopes that TSM will get dedup right eventually. I would like to eliminate all of the VTL appliances; they are just too much of a headache to keep running. Sadly, I have not seen anyone post anything positive about TSM 6.x and dedup.

We see issues with paths going offline, "unable to read barcode" errors from virtual tapes, and tapes stuck in drives. Sometimes I think they took the virtualization too far when they emulated all these modes of failure.
 
Seems to be some confusion regarding how to express De-duplication figures.

What I normally use is a de-dup ratio of nominal size / stored size, e.g. 7:1.

If you want to express this value (7:1) in percentage form, it is 700%.
(In comparison, 15% is actually a negative de-dup value, since the lowest de-dup value is 101%... LTO tape compression gives at best 2:1, i.e. 200%.)
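
To make the two conventions concrete, here is a small sketch; the figures are just the ones quoted in this thread:

# The two percentage conventions being mixed up in this thread.
def ratio_as_percent_of_nominal(ratio):
    # the convention above: 7:1 expressed as 700%
    return ratio * 100

def ratio_as_percent_saved(ratio):
    # the other common convention: how much space you no longer store
    return (1 - 1 / ratio) * 100

def percent_saved_to_ratio(saved_pct):
    return 1 / (1 - saved_pct / 100)

print(ratio_as_percent_of_nominal(7))        # 700   -> "700 %"
print("%.1f" % ratio_as_percent_saved(7))    # 85.7  -> "86 % saved"
print("%.2f" % percent_saved_to_ratio(16))   # 1.19  -> "deduped 16 %" is ~1.2:1

So a post reporting "deduped 16%" in the space-saved sense is only about 1.2:1, which is part of why the numbers in this thread sound so far apart.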


...

I agree with the previous posts regarding real-life de-dup ratios with TSM. You won't see more than 7:1 with TSM incremental-forever, and only if you don't use client compression. With Legato NetWorker you could get 20:1 de-dup, but as with TSM it all depends on how often you run incremental backups.

A much higher de-dup ratio is possible when you back up databases like Oracle, but backing up only changed files isn't ideal for a de-dup engine.


Regards,
Nicke
 
manage a large dedupe shop

Rowl,

Can you comment on what type of dedupe software/hardware you have in your large shop?

Thanks
Jim
 
In response to rowl's statement:

"We see issues with paths going offline, "unable to read barcode" errors form virtual tapes, tapes stuck in drives, sometimes I think they took the virtualization went too far when they emulate all these modes of failure."


Q:

1) What P.T code version are you using? Is it ProtecTIER 2.3.x.x or earlier?

2) What back-end storage is it? There are known problems with LSI/IBM DS4K and DS5K... It's better to use active-active controller disk subsystems. Also, you should limit the number of FC paths to each controller (this depends on which PT and Red Hat levels you have installed).

3) Is it TS7650-DD1 or -DD3 nodes and what is the setup (single engine or 2 node cluster(s))?


... So the de-dup ratio can be hard to fix, but problems like "unable to read barcode" errors from virtual tapes and tapes stuck in drives are almost always related to SAN disk problems and can be fixed with detailed planning.


Kind regards,
nicke
 
Since my original post we have made some changes that greatly enhanced the stability of our PT environment.

1) Upgraded all systems to 2.3.x
2) Updated all zoning to single-target, single-initiator pairs.

With tape drives we have always zoned 1-4 tape drives to an HBA (depending on the tape drive speed). With the PT environment I was told that this is not supported (I wish I had been told this a long time ago). So now it's one HBA to one PT port in each zone. Since we made this change I have not seen any "weird" behavior.
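
For reference, a single-initiator/single-target pair on a Brocade-style switch looks something like this; the aliases and WWPNs are made up, and you repeat one zone per HBA/PT-port pair:

alicreate "tsmsrv1_hba0", "10:00:00:00:c9:aa:bb:01"
alicreate "pt_node1_fe0", "21:00:00:e0:8b:cc:dd:02"
zonecreate "z_tsmsrv1_hba0__pt_node1_fe0", "tsmsrv1_hba0; pt_node1_fe0"
cfgadd "prod_cfg", "z_tsmsrv1_hba0__pt_node1_fe0"
cfgenable "prod_cfg"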

-Rowl
 