Veritas-bu

Re: [Veritas-bu] Number of retries query

2009-04-07 09:50:05
Subject: Re: [Veritas-bu] Number of retries query
From: Dave Markham <dave.markham AT fjserv DOT net>
To: ken_zufall AT goodyear DOT com, "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Tue, 07 Apr 2009 14:46:36 +0100
...and if it's of use to anyone else, here is what I shall implement :-

hours=12    # look-back window; also used in the log message below
Prev_Job=`bperror -backstat -client <oracle mgmt client> -hoursago $hours |
awk '$14 == "<policy>" { print "Client ["$12"], STATUS ["$19"]" }'`
if [ -n "$Prev_Job" ]; then
    echo "ERROR: A previous job has run in the past [$hours] hours" >> "$log"
    echo "$Prev_Job" >> "$log"
    exit 1
fi


ken_zufall AT goodyear DOT com wrote:
>
> Wow, glad I don't have your job...that's pretty convoluted :P
>
> But may have an answer, building off the lock file idea...but much 
> simpler.  Just put the logic in the bpstart to check and see if the 
> policy you're executing has run in the past X hours and failed...if it 
> has, exit gracefully, if it hasn't, continue the backup.  
>
> Quick and dirty logic:
>
> bperror -backstat -hoursago [hours] -l | awk '{print $19,$14}' |
> grep -v "^0" | grep [policy_name]
>
> In the above, $19 = backup status code, $14 = policy name.  Strip out 
> any successful backups, grep for the policy name...if it's not null, 
> you've had a failure in the past X hours.
>
> Of course, there are different ways to parse the bperror output, but 
> the above would work.  In fact, you shouldn't even have to grep out 
> successes because the process shouldn't be trying to submit the policy 
> if it's run successfully.
>
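
The grep chain above matches the policy name as a substring, so a policy called "ora" would also match "ora_test". A single awk test on the two fields is a little stricter. What follows is only a minimal sketch: it assumes the field layout described in this thread ($12 = client, $14 = policy, $19 = status), and the function name failed_runs is made up for illustration. It reads the report on stdin so it can be tried against saved bperror output first.

```shell
#!/bin/sh
# failed_runs POLICY   (hypothetical helper, not part of NetBackup)
# Reads bperror -backstat -l style lines on stdin and prints
# "client status" for every line whose policy field ($14) equals POLICY
# exactly and whose status field ($19) is non-zero.
# Field positions are taken from this thread; check them on your release.
failed_runs() {
    awk -v policy="$1" '$14 == policy && $19 != 0 { print $12, $19 }'
}

# Possible use in a bpstart script (policy name is a placeholder):
#   bperror -backstat -l -hoursago 12 | failed_runs my_policy
```

If the function prints anything, the policy has had a failure in the window and the script can exit without starting another run.
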
> Ken Zufall
> Technical Analyst
> D660C
> The Goodyear Tire & Rubber Company
> GTN 446.0592 or 330.796.0592
>
>
>
> *Dave Markham <dave.markham AT fjserv DOT net>*
> 04/07/2009 06:16 AM
> Please respond to dave.markham AT fjserv DOT net
>
> To: ken_zufall AT goodyear DOT com
> cc: "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn DOT edu>
> Subject: Re: [Veritas-bu] Number of retries query
>
> Thanks guys there are some useful options there.
>
> To give more info we run the RMAN job as follows :-
>
> -We have an oracle admin station which holds various oracle dba scripts.
> -We have a policy which controls the scheduling and kicks off a client
> backup of this oracle management station and backs up a single file to a
> disk storage unit on the master (a simple directory). We back up one file
> to avoid any status 71.
> -The reason the policy and schedule is there is to run a bpstart script
> on the management station.
> -This bpstart script checks no oracle tape dba script is already running
> (if it is it exits non zero and obviously gives status 73 in netbackup)
> -Once the checks are passed it launches an oracle dba script (not
> maintained by me).
> -This oracle script talks to 3 oracle RAC servers and works out which
> one is running the particular db instance.
> -These oracle RAC servers are all Netbackup media servers and they then
> initiate the oracle backup through a Netbackup oracle agent on the
> relevant media server. This backs up using the application schedules on
> the master server for the associated policy with each media server.
> (sorry that sounds confusing).
> -If the oracle script fails and exits with non zero then in turn our
> bpstart script fails with status 73 and we can alert the dbas
>
> We want to launch via netbackup this way so we can trap the exit status
> and report to the dbas there has been a problem, plus for it to appear
> on a daily report.
>
> The case we have experienced is if a backup fails which could be due to
> no tapes or various oracle failures, the dba's don't want an automatic
> one running again as it starts doing things with flash recovery areas
> and starts running into the normal working day.
>
> Indeed perhaps some logic in the bpstart script to create a lockfile is
> useful, but the lock file would need to be removed upon completion or
> failure and this would then not give us any benefit when try 2 happens.
>
> If a lock file was used we could do some date matching and perhaps only
> run a job if the lockfile was older than x hours ( a lot of date parsing
> though which could be difficult ) to touch it again and run the backup.
> I'll have to explore this method.
>
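
The age check on the lock file need not involve any date parsing, because find can compare a file's modification time against a threshold. A rough sketch follows. It assumes a find that supports -mmin (GNU and BSD find do; it is not in POSIX), and the function name lock_is_fresh and the lock file path are made up for illustration.

```shell
#!/bin/sh
# lock_is_fresh FILE HOURS   (hypothetical helper)
# Succeeds (exit 0) if FILE exists and was modified within the last HOURS
# hours, i.e. a run was attempted recently and this one should be skipped.
# Assumes find supports -mmin (GNU/BSD extension, not POSIX).
lock_is_fresh() {
    [ -f "$1" ] || return 1
    # -mmin -N matches files modified less than N minutes ago
    [ -n "$(find "$1" -mmin "-$(( $2 * 60 ))")" ]
}

# Possible use near the top of a bpstart script (path is a placeholder):
#   lock=/var/tmp/rman_backup.lock
#   if lock_is_fresh "$lock" 12; then
#       echo "previous attempt less than 12 hours ago, not retrying" >&2
#       exit 1
#   fi
#   touch "$lock"
```

With this approach the lock file is never removed, only touched at the start of each attempt, so it still protects against try 2.
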
> Cheers
>
>
>
>
>
> ken_zufall AT goodyear DOT com wrote:
> >
> > Dave,
> >
> > This isn't an ideal fix, but it will work--schedule the backups from
> > the client.  Basically, just put entries in cron (root or oracle will
> > work) with the commands (or script wrapper around the command) to
> > launch the backup instead of using the NBU scheduler (will have to
> > remove current full/incremental schedules and replace with a user
> > directed that has the appropriate windows).  Reason this will work is
> > because the automatic retries only affects backups launched from the
> > master...if it's submitted by the client, it will not retry on failure.
> >
> > Only real issues off the top of my head are:
> >
> > 1) If client is down or doesn't have network connectivity, you won't
> > see failure to run backup in NBU because the backup will never be
> > submitted.
> >
> > 2) You lose visibility to backup schedules within NBU.
> >
> > Ken Zufall
> > Technical Analyst
> > D660C
> > The Goodyear Tire & Rubber Company
> > GTN 446.0592 or 330.796.0592
> >
> >
> >
> > *Len Boyle <Len.Boyle AT sas DOT com>*
> > Sent by: veritas-bu-bounces AT mailman.eng.auburn DOT edu
> > 04/06/2009 09:30 AM
> >
> > To: "dave.markham AT fjserv DOT net" <dave.markham AT fjserv DOT net>, "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn DOT edu>
> > Subject: Re: [Veritas-bu] Number of retries query
> >
> > Good Morning Dave,
> >
> > I know of no way to change the number of job retries on a policy or
> > client or schedule object.
> > I can see where this would be a nice feature to have.
> >
> > There are many different reasons that a rman backup job can fail.
> >
> > From the NetBackup end of things one could have a status 96 error (no
> > scratch tapes), a media fault, a network issue, etc.
> > Or it could be an Oracle issue.
> >
> > For something like a media issue that is cleared up on the netbackup
> > end of things I would think that the dba's would want the backup to be
> > retried. For an oracle issue I do not know enough.
> >
> > But either way I believe that you could add the control you require
> > into the script that netbackup runs on the client to run the rman
> > commands. Might not be easy.
> >
> > I am sure others who know Oracle can give you a better answer than
> > this, and I look forward to learning.
> > As a simple case of go or nogo without any variance based on the prior
> > failure you could try.
> > In the beginning of the script you could set a state value of
> > "STARTED" into a file on the client. At the end of the script the value
> > could be changed to "COMPLETE".
> > At the start of the script if the value is not "COMPLETE" the script
> > could give an error return and exit. Someone would have to change the
> > state value to "COMPLETE" to enable the script to run. This could be
> > done after clearing the problem. This can also be used to bypass the
> > running of the backup at the script level when the Oracle DBAs are
> > doing maintenance work on the Oracle database. If you use and check
> > for some state value of "BYPASS" then the script could exit with a
> > normal return code and NetBackup would not have a backup but would
> > think that everything is OK and not retry.
> > You could also use touch files instead of one state file.
> >
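
The state-file idea above can be sketched in a few lines of shell. This is only an illustration: the function name check_state, the file path and the run_backup command are made up, and treating a missing state file as "OK to run" is an assumption added here so that the very first run is not blocked.

```shell
#!/bin/sh
# check_state FILE   (hypothetical helper)
# Decides whether a run may proceed, based on the single word stored in FILE.
# Prints one of: run, skip, blocked
#   COMPLETE (or no file yet) -> run     : last run finished cleanly
#   BYPASS                    -> skip    : exit 0 without backing up
#   anything else             -> blocked : last run never completed
check_state() {
    state=COMPLETE              # assumption: no state file means first run
    if [ -f "$1" ]; then
        state=$(cat "$1")
    fi
    case "$state" in
        COMPLETE) echo run ;;
        BYPASS)   echo skip ;;
        *)        echo blocked ;;
    esac
}

# Possible wrapper around the backup command (names are placeholders):
#   state_file=/var/tmp/rman_backup.state
#   case "$(check_state "$state_file")" in
#       skip)    exit 0 ;;
#       blocked) echo "state is not COMPLETE, refusing to run" >&2; exit 1 ;;
#   esac
#   echo STARTED > "$state_file"
#   run_backup && echo COMPLETE > "$state_file"
```

After a failure the file still says STARTED, so a retry is refused until someone resets it by hand.
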
> > Let us know what you end up doing to solve this issue.
> >
> > len
> >
> > -----Original Message-----
> > From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
> > [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Dave
> > Markham
> > Sent: Monday, April 06, 2009 8:17 AM
> > To: veritas-bu AT mailman.eng.auburn DOT edu
> > Subject: [Veritas-bu] Number of retries query
> >
> > Guys does anyone know if you can change the number of job retries in xx
> > time period on a per client basis?
> >
> > I currently have the global set at 2 tries per 12 hours which is fine
> > for our needs and good in the fact it will try a failed backup.
> >
> > However the DBA for an RMAN and oracle policy doesn't want this to
> > happen and re-run a backup if there is a failure so i need to try and
> > find a way of setting it to 1 try for just one client.
> >
> > Any ideas?
> >
> > Cheers
> > _______________________________________________
> > Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> > http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
> >  
>
>
