Re: [Veritas-bu] Tapeless backup environments

2007-10-19 12:26:10
Subject: Re: [Veritas-bu] Tapeless backup environments
From: "Eagle, Kent" <KEagle AT wilmingtontrust DOT com>
To: "Curtis Preston" <cpreston AT glasshouse DOT com>, <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Fri, 19 Oct 2007 12:08:20 -0400
O.k., at the risk of seeming like "I wrote more than you, therefore I
must be right"...

2nd (and last) post on this:

My first point was that you quoted a "Wikipedia" article as a source.
For me, it really had nothing to do with the subject matter. They have a
disclaimer as to the validity of anything on there, and for good reason:
anyone can post anything on there, about anything, containing anything.
It might be right, it might be wrong. I would be far more inclined to
trust, or quote, an industry consortium, or even a vendor's test results
page, than "Wikipedia".

As long as we're throwing credentials around, I might as well mention: as
a former scientist and statistician, and current engineer, I fully
understand what empirical research is. It INCLUDES math: it is the
actual testing and the statistics of that testing. FWIW, I was trained
in this and in FMEA (Failure Modes and Effects Analysis) by the gentleman who
ran the Reliability and Maintainability program for Boeing's Saturn and
Apollo space programs, as well as their VERTOL and fixed-wing programs.

I can see where my second point could easily have been misinterpreted.
Apologies to anyone led astray. What I meant was that the posts made by
"Bob944" seemed to me to be supported by cited facts and described
personal experiences. He's not pointing to something he previously
authored as proof that information is fact. I've only seen him reference
previous posts for the purpose of level-setting. To be fair, I haven't read
any of your blog postings, only your posts in this forum. More on that
below. And yes, an "Industry Pundit, Author, SME", or whoever, quoting
"Wikipedia" as a source does tend to dilute credibility, in my mind.
It's not a personal attack, just my personal position on the issue.


The part below has me confused, where you say "No, because I never said
those words or anything like them in my article." I never mentioned
anything about any articles; all my comments are in regard to your posts
on this forum, in which you did say that:

>Wouldn't THAT be saying that up until that point, YOU
>WERE SAYING "that no matter what the entire world is saying -- no matter
>what the numbers are, you're not going to accept..."

This was your text, no?


Obviously there's nothing wrong with admitting you're wrong. What I was
pointing out was that it appears duplicitous to make the comment above
and then state you're probably going to post a retraction in your blog
based on one user's experience. I'm referring to the 10 GbE thread where one
user reported stellar throughput, which contradicted a contrived
theoretical maximum, alongside several reports of ho-hum throughput:

"7500 MB/s!  That's the most impressive numbers I've ever seen by FAR.
I may have to take back my "10 GbE is a Lie!" blog post, and I'd be
happy to do so."

This was your text, no?
So one could easily conclude that a position was taken (and published)
on this topic without sufficient testing or research (the related
SunSolve and other articles were already out there before these posts
were made).


You said: "Remember also that these posts are often done on my own time
late at night, etc.  I never claimed to be perfect."
True, but you do cite that you are an author of books on the subject,
author of a blog on the subject, and work for one of the largest
industry resources; indeed, you are the "VP Data Protection". Can you see how
a newbie might take a post as gospel, given the barrage of
credentials? Would they not be disappointed to learn they need to check
the timestamp of a post before lending any credence to its contents?
;-)


You said: "I don't think you'll find that to be a problem.  I'm an
in-the-trenches guy, who has sat in front of many a tape drive, tape
library, and backup GUI in my 14 years in this space.  I actually cut my
teeth right down the road from you as the backup guy at MBNA.  (I lived
in Newark, DE, and you were my bank.)"

I'm not sure what you meant to imply by all this. If tenure with backup
is an issue, then I would suggest you really don't have all that much
time "in this space", relative to my experience anyway. I had been
working with various forms of backup for that long before MBNA even had
a data center in DE. Why would it be necessary to point out that you
were in the same geographic locale, or used the services of my employer?
I've never made mention of my employer, or even implied that any of my
statements represented any opinion or position of theirs. I find this
statement, well, bizarre...


Maybe I will attend the class after all. I'm beginning to think I'll be
entertained.

End transmission.

Regards,
Kent Eagle
MTS Infrastructure Engineer II, MCP, MCSE
Tech Services / SMSS


-----Original Message-----
From: Curtis Preston [mailto:cpreston AT glasshouse DOT com] 
Sent: Thursday, October 18, 2007 4:41 PM
To: Eagle, Kent; veritas-bu AT mailman.eng.auburn DOT edu
Cc: bob944 AT attglobal DOT net
Subject: RE: Tapeless backup environments

Glad to have another person in the party.  What's your birthday? ;)

>Are you seriously suggesting that a quote from "Wikipedia" constitutes
>empirical scientific research? 

NO.  He said that I was misusing the Birthday Paradox, and I merely
pointed to the Wikipedia article that uses it the same way.  If you
search on Birthday Paradox on Google, you'll also find a number of other
articles that use the BP in the same way I'm using it, specifically with
regard to hash collisions, as the concept is not new to deduplication.
It has applied to cryptographic uses of hashing for years.

I then went further to explain WHY the BP applies, and I gave a reverse
analogy that I believe completed my argument that the BP applies in this
situation. So...
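
To make the Birthday Paradox argument concrete, here is a minimal sketch assuming hash values behave like uniformly random draws (an idealization, not something established in this thread for any particular product). The standard approximation p ~= 1 - exp(-n(n-1)/2N) gives the chance of at least one match among n items drawn from N possible values. The function names, the 10**15 chunk count, and the 160-bit digest size are illustrative assumptions only.

```python
import math


def collision_probability(n_items: int, space_size: float) -> float:
    """Birthday-paradox approximation: chance that at least two of n_items
    values, drawn uniformly at random from space_size possibilities, match.
    p ~= 1 - exp(-n(n-1) / (2 * space_size))"""
    exponent = -n_items * (n_items - 1) / (2.0 * space_size)
    # -expm1(x) equals 1 - exp(x) but stays accurate when p is tiny
    return -math.expm1(exponent)


def exact_birthday(n_people: int, days: int = 365) -> float:
    """Exact chance that at least two of n_people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n_people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


# Classic case: 23 people, 365 possible birthdays
print(f"23 people, exact:       {exact_birthday(23):.4f}")
print(f"23 people, approximate: {collision_probability(23, 365):.4f}")

# Same formula applied to fingerprints (hypothetical sizes)
print(f"1e15 chunks, 160-bit:   {collision_probability(10**15, 2.0**160):.3e}")
```

The birthday case is included as a sanity check: the approximation lands within about 0.01 of the exact answer, which is why the same formula is commonly used for hash collisions where exact enumeration is impossible.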

As to whether or not what I'm doing is empirical scientific research:
it's not.  Empirical research requires testing, observation, and
repeatability.  For the record, I have done repeated testing of many
hash-based dedupe systems using hundreds of backups and restores without
a single occurrence of hash-related data corruption, but that doesn't address
the question.  IMHO, it's the equivalent of saying a meteor has never
hit my house, so meteors must never hit houses.  The discussion is about
the statistical probability of a meteor hitting your house, and you
have to work that out with math, not empirical scientific research.
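
The meteor analogy matches a standard statistical result: observing zero failures in n independent trials only bounds the per-trial failure rate at roughly 3/n (the "rule of three" at 95% confidence). A minimal sketch, assuming independent trials; the figure of 500 trials is an illustrative stand-in for "hundreds of backups and restores", not a count from the thread.

```python
def zero_failure_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on a per-trial failure probability p after
    observing zero failures in `trials` independent trials.
    Solves (1 - p) ** trials == 1 - confidence for p."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / trials)


trials = 500  # illustrative: "hundreds of backups and restores"
bound = zero_failure_upper_bound(trials)
print(f"0 failures in {trials} trials: p could still be up to {bound:.4f}")
print(f"rule-of-three shortcut 3/n:       {3 / trials:.4f}")
```

The bound shrinks only in proportion to the number of trials, so clean test runs cannot distinguish a 1-in-1,000 risk from a vanishingly small one; that distinction has to come from the probability math.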

>I would be the first to admit that "bob944" has made more than a few
>posts that have "pushed my chair back a couple inches", but at least
>they made me THINK!

And you're saying that my half-a-dozen or so blog postings on the
subject, and none of my responses in this thread don't make you think?
I was fine until I quoted Wikipedia, is that it? ;)

>Is pretty gutsy since you have another post within the past few days
>stating you're ready to RETRACT what you already blogged on this, or
>blogged on that. 

I am admitting that I am not a math or statistics specialist and that I
misunderstood the odds before.  What's wrong with that?  That I was
wrong before, or that I'm stating publicly that I was wrong before?
I was wrong. I was told I was wrong because I didn't apply the Birthday
Paradox.  So I applied the Birthday Paradox in the same way I see
everyone else applying it, and the way that makes sense according to the
problem, and the numbers still come out OK.

>Wouldn't THAT be saying that up until that point, YOU
>WERE SAYING "that no matter what the entire world is saying -- no matter
>what the numbers are, you're not going to accept..."

No, because I never said those words or anything like them in my
article.   I said, "some people say this, but I say that."  Then I even
solicited feedback from the audience.  The point of that portion of the
article was that some are talking about hash collisions as if they're
going to happen to everybody and happen a lot, and I wanted to add some
actual math to the discussion, rather than just talk about fear,
uncertainty, and doubt (FUD).  I felt there was a little Henny-Penny
business going on.

>If I am asked to restore something for the CEO, and can't, it won't
>matter a hill of beans what all the theory was and what the odds were.
>I either can, or I can't. I'll be accountable for that result, and why I
>got it. As someone so accurately posted recently: We're in the recovery
>business, not the restore business.

You won't get any argument from me.  I think you'll find almost that
exact sentence in the first few paragraphs of any of my books.  Having
said that, we all use technologies as part of our backup system that
have a failure rate percentage (like tape).  And to the best of my
understanding, the odds of a single hash collision in 95 Exabytes of
data is significantly lower than the odds of having corrupted data on an
LTO tape and not even knowing it, based on the odds they publish.  Even
if you make two copies, the copy could be corrupted, and you could have
a failed restore. Yet we're all ok with that, but we're freaking out
about hash collisions, which statistically speaking have a MUCH lower
probability of happening.
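
The 95-exabyte comparison can be reproduced in a few lines. A minimal sketch under loudly stated assumptions: the 8 KiB average chunk size and 160-bit digest are hypothetical parameters, not figures from this thread. Rather than inventing an LTO error rate, the code computes the break-even per-bit undetected error rate, that is, the rate at which reading the same 95 EB from tape would carry the same chance of at least one undetected error as the hash-collision odds. Compare that figure with the rate your drive vendor actually publishes.

```python
import math


def birthday_collision_probability(n_items: float, hash_bits: int) -> float:
    """p ~= 1 - exp(-n^2 / (2 * 2**hash_bits)) for n random hash values."""
    return -math.expm1(-(n_items ** 2) / (2.0 * 2.0 ** hash_bits))


def breakeven_bit_error_rate(p_target: float, bits_read: float) -> float:
    """Per-bit error rate r at which reading `bits_read` bits has the same
    chance of at least one error as p_target:
    solves 1 - (1 - r) ** bits_read == p_target for r."""
    # log1p/expm1 keep precision when p_target and r are both tiny
    return -math.expm1(math.log1p(-p_target) / bits_read)


# Illustrative assumptions (NOT figures from the thread):
DATA_BYTES = 95e18        # 95 exabytes (decimal), the volume cited above
CHUNK_BYTES = 8 * 1024    # hypothetical average dedupe chunk size
HASH_BITS = 160           # hypothetical digest size (SHA-1 length)

chunks = DATA_BYTES / CHUNK_BYTES
p_hash = birthday_collision_probability(chunks, HASH_BITS)
r_even = breakeven_bit_error_rate(p_hash, DATA_BYTES * 8)

print(f"chunks stored:                  {chunks:.3e}")
print(f"P(any hash collision):          {p_hash:.3e}")
print(f"break-even tape bit error rate: {r_even:.3e}")
```

If the vendor's published undetected bit error rate is larger than the break-even value, tape is the bigger risk under these assumptions; changing the chunk size or digest length moves the result, so treat the output as a sketch rather than a verdict.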

>I would think that almost everyone on this forum does some kind of pilot
>before rolling something out into production.

I sure as heck hope so, but I don't think it addresses this issue.  So
you test it and you don't get any hash collisions. What does that prove?
It proves that a meteor has never hit your house.

What I recommend (especially if you're using a hash-only de-dupe system)
is a constant verification of the system.  Use a product like NBU that
can do CRC checks against the bytes it's copying or reading, and either
copy all de-duped data to tape or run an NBU verify on every backup.  If
you have a hash collision, your copy or verify will fail, and you'll at
least know when it happens.
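
The recommendation above rests on one idea: a check computed independently of the dedupe fingerprint will catch the case where a fingerprint collision maps new data onto old data. A minimal sketch, not how NetBackup or any dedupe product is implemented: `ToyDedupeStore` is hypothetical, it uses a deliberately truncated 1-byte SHA-256 fingerprint so a collision can be forced quickly, and `zlib.crc32` stands in for whatever end-to-end check the backup software performs.

```python
from __future__ import annotations

import hashlib
import zlib


class ToyDedupeStore:
    """Hypothetical hash-only dedupe store: a chunk whose fingerprint is
    already present is not stored again, so a fingerprint collision
    silently maps new data onto old data."""

    def __init__(self, fingerprint_bytes: int = 1) -> None:
        # Deliberately tiny fingerprint so a collision is easy to force;
        # real systems use far longer digests.
        self.fingerprint_bytes = fingerprint_bytes
        self.chunks: dict[bytes, bytes] = {}

    def fingerprint(self, chunk: bytes) -> bytes:
        return hashlib.sha256(chunk).digest()[: self.fingerprint_bytes]

    def write(self, chunk: bytes) -> tuple[bytes, int]:
        fp = self.fingerprint(chunk)
        # Keep only the first chunk seen for each fingerprint (dedupe)
        self.chunks.setdefault(fp, chunk)
        # Independent check value computed from the data as the client sent it
        return fp, zlib.crc32(chunk)

    def read_and_verify(self, fp: bytes, expected_crc: int) -> bytes:
        data = self.chunks[fp]
        if zlib.crc32(data) != expected_crc:
            raise ValueError("verify failed: restored data does not match original CRC")
        return data


store = ToyDedupeStore()
original = b"block-0"
fp0, crc0 = store.write(original)

# Find a different chunk that shares the same 1-byte fingerprint
i = 1
while store.fingerprint(f"block-{i}".encode()) != fp0:
    i += 1
colliding = f"block-{i}".encode()
fp1, crc1 = store.write(colliding)

print(store.read_and_verify(fp0, crc0))  # original data verifies cleanly
try:
    store.read_and_verify(fp1, crc1)     # store hands back `original` instead
except ValueError as err:
    print("collision detected:", err)
```

CRC32 is not collision-resistant either; its value here is that it is computed independently of the fingerprint, so a single collision is unlikely to fool both checks at once.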

>I hope I'm wrong. 

About what? That I'm an idiot? ;)  I think judging me solely on this
long, protracted, difficult to follow discussion (with over 70 posts) is
probably unfair.  Remember also that these posts are often done on my
own time late at night, etc.  I never claimed to be perfect.

>I love to learn. I'm actually signed up for one of
>your classes next week. But, if quoting everyone else's
>posts/blogs/Wikipedia entries, etc. without backing them up
>with empirical evidence or firsthand testing is your program agenda, I
>will skip the engagement...

I don't think you'll find that to be a problem.  I'm an in-the-trenches
guy, who has sat in front of many a tape drive, tape library, and backup
GUI in my 14 years in this space.  I actually cut my teeth right down
the road from you as the backup guy at MBNA.  (I lived in Newark, DE,
and you were my bank.)  Don't skip out on the school just because I
quoted Wikipedia once.

>BTW - You "Tilt at Windmills" (Don Quixote), you don't chase them.  ;-)

You are right.  I stand corrected again.  Even Wikipedia backs you up:
http://en.wikipedia.org/wiki/Don_Quixote

(Sorry, just couldn't resist.) ;)


_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu