Veritas-bu

Re: [Veritas-bu] Real time failure notification

2009-02-06 12:17:20
Subject: Re: [Veritas-bu] Real time failure notification
From: "Donaldson, Mark" <Mark.Donaldson AT Staples DOT com>
To: "Travis Kelley" <rhatguy AT gmail DOT com>, "Jeff Lightner" <jlightner AT water DOT com>, <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Fri, 6 Feb 2009 09:59:41 -0700
That's hard to do, too.

Say you have a retry interval of once per 2 hours but you have a backup
schedule with an 8 hour window.

You could get four different tries in that time.  If your window is just
4 hours, then, of course, it's half that.  
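
To make the arithmetic above concrete, here is a minimal Python sketch. The
function name and parameters are hypothetical and not anything NetBackup
provides. It only computes an upper bound, since a queued try won't
necessarily go active.

```python
import math


def max_queued_attempts(window_hours: float, retry_interval_hours: float) -> int:
    """Upper bound on tries that can be queued inside one backup window.

    Hypothetical helper: assumes one try per full retry interval that
    fits in the window. Queued tries may never go active, so the real
    number of attempts can be lower.
    """
    if window_hours <= 0 or retry_interval_hours <= 0:
        raise ValueError("window and retry interval must be positive")
    return math.floor(window_hours / retry_interval_hours)


print(max_queued_attempts(8, 2))  # 4
print(max_queued_attempts(4, 2))  # 2
```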

I'm not sure you get new job numbers or a "try" increment in this case -
I've never really investigated.  The retry only queues the job, too.
It'll queue if the window is open and the retry period has expired, but
it won't necessarily run.  If it doesn't go active, your expected number
of attempts might not be what you're looking for.

The problem you're describing has been a long-time problem with
monitoring Netbackup.  There's a bunch of additional stuff, too.  If a
fairly non-critical backup fails, say an incremental on a dev box, it's
not a big deal.  I don't want to be notified for that; it'll try again
and probably succeed the next evening.  Now, if it fails more than one
day in a row, I am interested in that - there's increasing danger with
increasing failures.

This is what led us to an "endangered" filesystems report - the first
failure isn't usually a big deal - it's repeated failures we want to
worry about.  Otherwise we'd be buried in the volume of one-off
failures, most of which we wouldn't restart anyway and would just leave
for the next night's cycle.
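
A minimal sketch of that kind of report, in Python. It assumes the
bpdbjobs output has already been parsed elsewhere into (day, policy,
client, fileset, status) tuples. The record layout, the function name and
the sample data are all hypothetical, and the sketch treats only status 0
as a success. Days with no record for a fileset are skipped rather than
counted as failures.

```python
from collections import defaultdict
from datetime import date


def endangered_filesets(records, threshold):
    """Return {(policy, client, fileset): streak} for every fileset whose
    most recent run of failed days is longer than `threshold`.

    `records` is an iterable of (day, policy, client, fileset, status)
    tuples. Assumption for this sketch: status 0 means success and
    anything else is a failure.
    """
    by_fileset = defaultdict(dict)
    for day, policy, client, fileset, status in records:
        key = (policy, client, fileset)
        # A day counts as good if any job for the fileset succeeded that day.
        by_fileset[key][day] = by_fileset[key].get(day, False) or status == 0

    report = {}
    for key, days in by_fileset.items():
        streak = 0
        # Walk back from the newest day until the first success.
        for day in sorted(days, reverse=True):
            if days[day]:
                break
            streak += 1
        if streak > threshold:
            report[key] = streak
    return report


# Made-up sample data; the non-zero status values are just placeholders.
records = [
    (date(2009, 2, 3), "dev_incr", "devbox1", "/home", 0),
    (date(2009, 2, 4), "dev_incr", "devbox1", "/home", 96),
    (date(2009, 2, 5), "dev_incr", "devbox1", "/home", 96),
    (date(2009, 2, 6), "dev_incr", "devbox1", "/home", 96),
    (date(2009, 2, 5), "dev_incr", "devbox2", "/var", 96),
    (date(2009, 2, 6), "dev_incr", "devbox2", "/var", 0),
]

print(endangered_filesets(records, threshold=2))
# {('dev_incr', 'devbox1', '/home'): 3}
```

With a threshold of 2, the one-off failure on devbox2 is ignored because
it succeeded the next night, and only the fileset with three failed days
in a row is reported.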



-----Original Message-----
From: Travis Kelley [mailto:rhatguy AT gmail DOT com] 
Sent: Thursday, February 05, 2009 7:47 AM
To: Jeff Lightner; Donaldson, Mark; veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: Real time failure notification

I guess the question would be better stated as...I want real time
notification of when a configured number of retries have failed.  It's
always in the details :)
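
That rule could be sketched roughly as follows in Python. Everything here
is an assumption for illustration: the (client, jobid, attempt, succeeded)
record layout, the function name, and the idea that attempt records are
available from some job log. It runs over a batch of records, so it would
have to be run periodically rather than being truly real time.

```python
def clients_to_alert(attempts, max_attempts):
    """Return a sorted list of clients that need exactly one alert.

    `attempts` is an iterable of (client, jobid, attempt, succeeded)
    tuples (hypothetical layout). A client is alerted only if at least
    one of its jobs used up all `max_attempts` tries and the last try
    failed. Several failed streams on one client still give one alert.
    """
    last = {}
    for client, jobid, attempt, succeeded in attempts:
        key = (client, jobid)
        # Keep only the highest-numbered attempt seen for each job.
        if key not in last or attempt > last[key][0]:
            last[key] = (attempt, succeeded)

    alerts = set()
    for (client, _jobid), (attempt, succeeded) in last.items():
        if not succeeded and attempt >= max_attempts:
            alerts.add(client)
    return sorted(alerts)


# Made-up sample data.
events = [
    # box1: both streams (jobs 101 and 102) fail all three attempts.
    ("box1", 101, 1, False), ("box1", 101, 2, False), ("box1", 101, 3, False),
    ("box1", 102, 1, False), ("box1", 102, 2, False), ("box1", 102, 3, False),
    # box2: fails twice, succeeds on the third attempt.
    ("box2", 201, 1, False), ("box2", 201, 2, False), ("box2", 201, 3, True),
    # box3: first attempt failed, retries still pending.
    ("box3", 301, 1, False),
]

print(clients_to_alert(events, max_attempts=3))  # ['box1']
```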

Thanks to everyone who has responded.  I'm looking into the options
that have been posted here.  It's great to have such a knowledgeable
group of people to bounce questions off of so quickly!

On 2/5/09, Jeff Lightner <jlightner AT water DOT com> wrote:
> Actually it depends on how long your timeout is set.  If you've set
> things to timeout in 2 hours then 6 attempts would take 12 hours.  You
> might want to know long before that.  On the other hand, setting the
> timeout lower risks having the backup abort if it takes a long time
> normally (e.g. a database backup).
>
> The observation wasn't saying what you want won't work but rather that
> it is not truly "real time" which was what you'd put in your original
> post.  You pays your nickel and you takes your choice.
>
> -----Original Message-----
> From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
> [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Travis
> Kelley
> Sent: Thursday, February 05, 2009 8:37 AM
> To: Donaldson, Mark; veritas-bu AT mailman.eng.auburn DOT edu
> Subject: Re: [Veritas-bu] Real time failure notification
>
> Not really.  We have netbackup configured to retry a backup if it
> fails so there may be multiple "attempts" under the same jobid.  I
> don't want to get an alert if Netbackup is already running the backup
> again under another attempt.  I only want to get an alert after
> Netbackup has tried the configured number of "attempts" and is failing
> the jobid.  The way I see it is most of the time if something is
> really broken it won't take long to run through the 5 attempts, fail
> the job and alert, but if the box just got too busy and timed out or if
> the backup process was killed unintentionally I'd rather Netbackup
> handle retrying that on its own and not alert me.
>
> On 2/4/09, Donaldson, Mark <Mark.Donaldson AT staples DOT com> wrote:
>> Your "don't alert if retry was successful" automatically excludes the
>> idea of a real-time monitor.
>>
>> It's a bit like saying "Don't alert if you're going to succeed in the
>> future".
>>
>> We "solved" this by creating an after-the-fact monitor for our
>> backups - it searches the bpdbjobs output daily and parses that down
>> to return code, policy, client, & fileset.  If a fileset fails more
>> than X days in a row (without a success in there somewhere) then it's
>> reported on as an "endangered fileset".
>>
>> It's been decently effective.
>>
>> -M
>>
>> -----Original Message-----
>> From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
>> [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of
>> Travis Kelley
>> Sent: Wednesday, February 04, 2009 8:36 AM
>> To: veritas-bu AT mailman.eng.auburn DOT edu
>> Subject: [Veritas-bu] Real time failure notification
>>
>> Hi all.  I'm trying to find a solution to a monitoring problem we
>> have.  I would like to create a mechanism to alert when a backup
>> fails but to only send one alert if multiple streams from a backup
>> fail.  For instance if c: and d: both fail for a particular box, I
>> only want 1 alert.  Also if a job fails twice but is successful on the
>> third attempt I don't want an alert at all.  I only want to be alerted
>> once when netbackup "gives up" on retrying a backup and fails the job.
>> I've looked at backup_exit_notify but haven't been able to find a
>> good way to implement this here.  Any ideas?
>>
>> --
>> Sent from my mobile device
>> _______________________________________________
>> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>>
>>
>
