Amanda-Users

Re: amverify - reality check?

2007-05-08 12:31:55
From: Chris Hoogendyk <hoogendyk AT bio.umass DOT edu>
To: amanda-users AT amanda DOT org
Date: Tue, 08 May 2007 12:26:41 -0400

Jon LaBadie wrote:
> On Tue, May 08, 2007 at 10:10:15AM -0400, Chris Hoogendyk wrote:
>   
>> Jon LaBadie wrote:
>>     
>>> The second part can only be done by actually doing restores.
>>> Perhaps you could schedule periodic recoveries of files
>>> or directory trees.  Do some sort of varying selection of
>>> clients, tapes, and data to recover.  Maybe even a regular
>>> "the chips are down" disaster exercise.
>>>       
>> I would absolutely agree with Jon.
>>
>> You simply cannot be "sure" or "guarantee," but you can attain a level
>> of confidence -- statistical sampling and testing if you want to get
>> formal about it. After installing a new backup system, the first thing
>> after backup should be to test recovery. Then, periodically pull a tape
>> at random and test recovery. Experience and confidence are common terms,
>> but you can also estimate probabilities of future success or failure
>> based on the data if you really want to dig into it.
>>
>>     
>
> One thing I dislike about "random" sampling is the possibility of never
> testing certain combinations.  I think the statistical approach would
> give me more confidence, particularly if all combinations were regularly
> tested in a reasonable time frame.  Of course your reasonable time frame
> might seem excessive to me ;)
>   

Statistical sampling can be done in many ways, designed to cover
different situations and starting with the definition of the population
to be sampled (including subpopulations). Simple Random Sampling (SRS)
is what the lay person thinks of as "random" sampling. However, if you
crack a book on statistical sampling, you will find many, many chapters
on different models and approaches to sampling. That's why I suggested
he visit the stat lab. It used to be that students, faculty, and staff
could walk in there and get help. I haven't been there since 1981, so it
may have changed.
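To make Jon's worry concrete -- that pure random sampling may never hit
certain combinations -- here is a small sketch contrasting SRS with a
stratified draw that guarantees every client is covered each round. The
client and tape names are made up for illustration; the structure is the
kind of design choice a sampling text would walk you through.

```python
# Sketch (hypothetical names): SRS vs. stratified sampling of restores.
import random

clients = ["web01", "db01", "mail01"]  # assumed client names
tapes = {c: [f"{c}-tape{i}" for i in range(4)] for c in clients}

# Simple Random Sampling: draw 3 restores from the whole pool.
# Nothing prevents one client from being skipped entirely.
pool = [t for ts in tapes.values() for t in ts]
srs = random.sample(pool, 3)

# Stratified sampling: one random tape per client (stratum), so every
# client gets at least one test restore each round.
stratified = [random.choice(tapes[c]) for c in clients]
print(stratified)
```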

Sampling Design works backwards from either partial data or assumptions
about the population and allows you to determine what sample size or
frequency you need to attain a certain level of confidence or precision
of estimation.
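As a rough illustration of working backwards from an assumption to a
sample size, the usual normal-approximation formula for a proportion is
n = z^2 * p * (1 - p) / e^2. This is only a sketch with assumed numbers,
not a recommendation for any particular installation:

```python
# Sketch: required number of test restores, given an assumed failure
# rate and a desired margin of error, via the normal approximation
# for a proportion (z = 1.96 for ~95% confidence).
import math

def required_sample_size(assumed_rate, margin, z=1.96):
    """Restores to test so the estimated failure rate falls within
    `margin` of the true rate at the confidence implied by z."""
    n = (z ** 2) * assumed_rate * (1 - assumed_rate) / margin ** 2
    return math.ceil(n)

# e.g. assume ~5% of restores fail; estimate the rate to within +/-2%
print(required_sample_size(0.05, 0.02))  # -> 457
```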

What you may end up doing is simply more precisely estimating your
failure rate. But then you could use that information to augment your
backup procedures, if you thought your failure rate was higher than you
were willing to accept. If you take this idea and put it on a time
sequence going forward, then it becomes a sort of early warning system.
When the current estimate of failure rate reaches some critical
threshold, it's time to ... replace tapes, replace some hardware, figure
out what the problem is, ... or whatever.
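The early-warning idea above could be as simple as a running tally of
test restores with a threshold check. A minimal sketch, with the class
name and threshold invented for illustration:

```python
# Sketch: running failure-rate estimate with a critical threshold.
class RestoreLog:
    def __init__(self, threshold=0.10):
        self.threshold = threshold  # e.g. act if >=10% of tests fail
        self.attempts = 0
        self.failures = 0

    def record(self, succeeded):
        """Log one test restore."""
        self.attempts += 1
        if not succeeded:
            self.failures += 1

    def failure_rate(self):
        return self.failures / self.attempts if self.attempts else 0.0

    def needs_attention(self):
        # Time to replace tapes, check hardware, or dig for the cause.
        return self.attempts > 0 and self.failure_rate() >= self.threshold

log = RestoreLog(threshold=0.10)
for ok in [True, True, False, True, True, True, True, True, False, True]:
    log.record(ok)
print(log.failure_rate(), log.needs_attention())  # -> 0.2 True
```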

I suppose anyone who had a large enough installation and felt the need
to take it to that much depth could also afford to hire a statistical
consultant. Others of us have to fly by intuition.


>> The other side of this is your own personal experience and confidence.
>> When "the chips are down", you can say, "Ah, I've done that a bunch of
>> times. I'm confident I can do it now."
>>
>> You need both of those in the common sense.
>>     
>
> The confidence and experience aspect is a great point.  And if your
> backup system is worth 12K as was stated, then there are probably
> multiple people who need to gain that experience and confidence.
> Not just the one with primary backup responsibility who invariably
> happens to be on vacation just when "the chips are down".
>   

Hey, if he doesn't want to share the joy, he can always carry a beeper
on vacation. ;-)


---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst 

<hoogendyk AT bio.umass DOT edu>

--------------- 

Erdös 4


