Here's an article on the odds of a collision occurring in a hash-only
environment, for anyone who is interested in more information.
http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/145-de-dupe-hash-collisions.html
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Daniel Sparrman
Sent: Thursday, October 06, 2011 7:19 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for
pirmary pool
I have customers who, during an audit, see the object count go from 0 to +2
billion and then start counting backwards with a "-" (that was during TSM 5.5),
several times over. So no, it's not bollocks. I did, however, mean "millions"
(thousands of millions, that is, several billion), so that was a mistake on my
side.
Some of those customers also hit the technical limit for the database size in
5.5 (524GB) on several of their TSM instances, which is why they run even more
TSM instances today.
Sorry for the mistake of saying "billion" and not "million". As for how many
objects they actually have in each TSM instance, it's fairly hard to tell, since
it isn't possible to run a select on contents, for example, to count the
number of objects. Those kinds of SQL statements just hang. And like I said,
during the last audit we did on one of the TSM instances, the count went up to
2100000000 objects and then started counting backwards several times, so we
actually have no clue about the exact number of objects in that database.
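A count that climbs to roughly 2.1 billion and then shows a "-" is what a signed
32-bit counter does when it passes 2147483647. Whether the TSM 5.5 audit counter
is actually stored that way is an assumption, not something confirmed in this
thread. A minimal Python sketch of that wraparound (the function name is made up
for illustration):

```python
INT32_MAX = 2**31 - 1  # 2147483647, largest value a signed 32-bit integer holds


def as_int32(true_count: int) -> int:
    """Return what a signed 32-bit counter would display for true_count.

    Assumption: the counter is a two's-complement 32-bit integer that
    wraps silently on overflow. This is not taken from TSM documentation.
    """
    return (true_count + 2**31) % 2**32 - 2**31


# Walk a hypothetical true object count past the 32-bit limit.
for true_count in (2_100_000_000, INT32_MAX, INT32_MAX + 1, 3_000_000_000):
    print(f"true={true_count:>13,d}  displayed={as_int32(true_count):>14,d}")
```

If that assumption holds, the displayed value alone cannot tell you the real
count once it has wrapped, because many true counts map to the same display.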
Regards
Daniel
Daniel Sparrman
Exist i Stockholm AB
Switchboard: 08-754 98 00
Fax: 08-754 97 30
daniel.sparrman AT exist DOT se
http://www.existgruppen.se
Posthusgatan 1 761 30 NORRTÄLJE
-----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
To: ADSM-L AT VM.MARIST DOT EDU
From: Ben Bullock
Sent by: "ADSM: Dist Stor Manager"
Date: 10/06/2011 16:11
Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool
Ok, I have been following this thread with some interest, since I have a dedupe
appliance. From the conversation, I've come to the conclusion that Daniel is a
very cautious administrator who would like to eliminate any risk of data loss.
Don't we all; it's a noble and worthwhile endeavor. All the discussed options
are worthwhile if you are concerned about hash collisions (copypools, async
replication, reuse delay, etc.).
At some point in that pursuit you hit diminishing returns, and it is not worth
the money to eliminate the next .01% probability of failure. Everyone will have
a different stopping point.
We get it. I think we have beat this horse within an inch of its life.
But I gotta ask...
Daniel, you said "but I've got several TSM customers who have several
thousands of billions of objects". Are you telling us that someone has a TSM
server with multiple ~TRILLIONS~ of objects backed up? Is that hyperbole or
truth?
Ben
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Daniel Sparrman
Sent: Wednesday, October 05, 2011 3:26 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool
Hi Remco
Not sure if you're talking about hardware de-dup or TSM de-dup (which is using
a larger block size due to the load) but:
Relatively small? I've only seen it happen once, but then I live in a
relatively small market, since I live in Sweden. So you're telling me (based on
facts) that this hasn't happened elsewhere? I seriously have to disagree. In
my opinion, it's more likely that others who have had this issue have decided
to keep it in the dark. Sweden is a relatively small market, and the odds that
it would have happened here, but nowhere else, are quite small.
Not sure about the size or anything in your TSM comparison, but I've got
several TSM customers who have several thousands of billions of objects ... And
like I said, if it's a chance of 1 in 1000.000.000.000, it can just as well hit
you at 1000.000. It's not a quota that needs to be filled before it hits you.
It's a random chance.
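To put the "random chance, not a quota" point in numbers: if each object
independently carries some tiny chance p of being hit, the chance of at least
one hit among n objects is 1 - (1 - p)^n, which is above zero from the very
first object. A rough Python sketch follows; the independent per-object model
and the value of p are simplifying assumptions for illustration, not measured
figures for any dedup product.

```python
import math


def chance_of_at_least_one(p_per_object: float, n_objects: int) -> float:
    """P(at least one hit in n independent trials) = 1 - (1 - p)^n."""
    # log1p/expm1 keep precision for tiny p, where (1 - p)**n rounds to 1.0.
    return -math.expm1(n_objects * math.log1p(-p_per_object))


p = 1e-12  # hypothetical per-object chance, i.e. 1 in 1000.000.000.000
for n in (1_000_000, 1_000_000_000, 1_000_000_000_000):
    print(f"{n:>17,d} objects: {chance_of_at_least_one(p, n):.3g}")
```

Under these assumptions the chance at a million objects is small but not zero,
and it grows steadily with the object count rather than switching on at some
threshold.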
And, as with the customer I had who got it, if it's a very common block getting
that hash conflict, then yes, it will hit you badly, since every file that
contains that block will be invalid.
I do agree with your comment about TSM v6, though. I'd consider it very stable;
I'd actually (today, with the amount of checking being done) consider it more
stable than staying at version 5.5.
Regards
Daniel
-----"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote: -----
To: ADSM-L AT VM.MARIST DOT EDU
From: Remco Post
Sent by: "ADSM: Dist Stor Manager"
Date: 10/05/2011 21:11
Subject: Re: [ADSM-L] vtl versus file systems for pirmary pool
Hi,
I saw last week that about half of the people visiting the TSM Symposium were
running V6, and it's been stable for me so far.
The likelihood of an accidental SHA1 hash collision is small, even compared to
the total number of objects that a TSM server could possibly ever store during
its entire lifetime: insignificant. That being said, if you think that your
data is too valuable to even risk that, don't dedup.
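For a feel of the magnitudes, the usual birthday-bound approximation gives the
chance of any accidental collision among n hashed items as roughly
1 - exp(-n(n-1) / (2 * 2^bits)). The Python sketch below assumes an ideal,
uniformly distributed 160-bit hash (SHA1's output size) and treats n as the
number of hashed chunks; the chunk counts are illustrative and not taken from
any real TSM server.

```python
import math


def collision_probability(n_items: int, hash_bits: int = 160) -> float:
    """Birthday-bound estimate of at least one collision among n_items.

    Assumes hash values are independent and uniform over 2**hash_bits,
    which is an idealisation of any real hash function.
    """
    space = 2.0 ** hash_bits
    exponent = -n_items * (n_items - 1) / (2.0 * space)
    # expm1 preserves precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


# Illustrative chunk counts: two billion and one trillion.
for n in (2_000_000_000, 1_000_000_000_000):
    print(f"{n:>17,d} chunks: {collision_probability(n):.3g}")
```

This only covers accidental collisions under the stated assumptions. It says
nothing about implementation bugs or about a product using a shorter hash.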
--
Gr., Remco
On 5 Oct 2011, at 19:24, Shawn Drew <shawn.drew AT AMERICAS.BNPPARIBAS DOT
COM> wrote:
> Along this line, we are still using TSM 5.5. Some of the features
> mentioned previously require TSM 6. TSM 6 still feels risky to me,
> maybe more risky than a hash collision.
> Just looking for a consensus: do people think it's mature enough now
> that it is as stable/reliable as TSM 5?
>
> PS. Test restores are the only way to be sure your backups are good.
> You shouldn't just "trust" TSM.
>
> Regards,
> Shawn
> ________________________________________________
> Shawn Drew
>
> Internet
> rrhodes AT FIRSTENERGYCORP DOT COM
>
> Sent by: ADSM-L AT VM.MARIST DOT EDU
> 10/05/2011 11:03 AM
> Please respond to
> ADSM-L AT VM.MARIST DOT EDU
>
>
> To
> ADSM-L
> cc
>
> Subject
> Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang:
> Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl
> versus file systems for pirmary pool
>
>> When TSM is duplicating your data (aka backing up storage pools),
>> there is no logical connection between your primary storage pool and
>> your copypool.
>
> Well... yes... no...
>
> All our eggs are in one basket no matter what. The logical connection
> between pri and copy pools is TSM itself. A logical corruption in TSM
> can take out both. Your data could be sitting there on tape and
> completely useless. Yes, that's why we have TSM db backups, but are
> they good? What if there is a TSM bug that renders all your backups
> bad - we don't find out until we need it!
>
> At some point you have to trust something. We all trust TSM. Yes, we
> do the db backup, create pri and copy pools, use reuse delay...
> everything to allow for problems... but we are still trusting that
> TSM works as advertised. A really, really paranoid admin would run two
> completely separate/different backup systems, but who can afford that,
> or would want to?
> But then, we do do that for our biggest SAP/Oracle systems. We use
> Oracle/RMAN-to-flasharea/RMAN-to-TDPO/TSM, but we also run EMC/clone
> backups off our DR site's R2's... but also to TSM.
>
>
> Rick