Re: [Veritas-bu] Number of retries query

2009-04-07 09:51:42
Subject: Re: [Veritas-bu] Number of retries query
From: Dave Markham <dave.markham AT fjserv DOT net>
To: dave.markham AT fjserv DOT net
Date: Tue, 07 Apr 2009 14:48:17 +0100
Dave Markham wrote:

Sorry correction :-

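hours=12
# List any run of <policy> for this client in the last $hours hours.
# Fields used from the bperror output: $12 = client, $14 = policy, $19 = status.
Prev_Job=`bperror -backstat -client <oracle mgmt client> -hoursago $hours |
awk '$14 == "<policy>" { print "Client ["$12"], STATUS ["$19"]" }'`
# Refuse to start if the policy has already run inside that window.
if [ "$Prev_Job" ]; then
    echo "ERROR: A previous job has run in the past [$hours] hours" >> $log
    echo "$Prev_Job" >> $log
    exit 1
fi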



> ....and if it's of use to anyone else, here is what I shall implement :-
>
> Prev_Job=`bperror -backstat -client <oracle mgmt client> -hoursago 12 | 
> awk '$14 == "<policy>" { print "Client ["$12"], STATUS ["$19"]" }'`
> if [ "$Prev_Job" ];then
>     echo "ERROR: A previous job has run in the past [$hours] hours" >> $log
>     echo "$Prev_Job" >> $log
>     exit 1
> fi
>
>
> ken_zufall AT goodyear DOT com wrote:
>   
>> Wow, glad I don't have your job...that's pretty convoluted :P
>>
>> But may have an answer, building off the lock file idea...but much 
>> simpler.  Just put the logic in the bpstart to check and see if the 
>> policy you're executing has run in the past X hours and failed...if it 
>> has, exit gracefully, if it hasn't, continue the backup.  
>>
>> Quick and dirty logic:
>>
>> bperror -backstat -hoursago [hours] -l | awk '{print $19,$14}' | grep -v "^0" | grep [policy_name]
>>
>> In the above, $19 = backup status code, $14 = policy name.  Strip out 
>> any successful backups, grep for the policy name...if it's not null, 
>> you've had a failure in the past X hours.
>>
>> Of course, there are different ways to parse the bperror output, but 
>> the above would work.  In fact, you shouldn't even have to grep out 
>> successes because the process shouldn't be trying to submit the policy 
>> if it's run successfully.
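
The same filter can be done in a single awk pass, which also avoids grep
matching the policy name as a substring of another policy. This is only a
minimal sketch: it assumes the field positions quoted above ($14 = policy,
$19 = status) hold for your bperror -l output, and failed_runs is a
hypothetical helper name. It reads the bperror output on stdin so it can be
tried without a master server.

```shell
#!/bin/sh
# failed_runs POLICY
# Reads `bperror -backstat -l` style lines on stdin and prints
# "status policy" for every run of POLICY that ended with a non-zero status.
# Field positions ($14 policy, $19 status) come from the thread and are
# an assumption, not verified against every NetBackup release.
failed_runs() {
    awk -v policy="$1" '$14 == policy && $19 != 0 { print $19, $14 }'
}
```

Intended use would be something like
bperror -backstat -hoursago 12 -l | failed_runs <policy>
where any output at all means the policy has failed inside the window.
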
>>
>> Ken Zufall
>> Technical Analyst
>> D660C
>> The Goodyear Tire & Rubber Company
>> GTN 446.0592 or 330.796.0592
>>
>>
>>
>> From: Dave Markham <dave.markham AT fjserv DOT net>
>> Date: 04/07/2009 06:16 AM
>> Reply-To: dave.markham AT fjserv DOT net
>> To: ken_zufall AT goodyear DOT com
>> Cc: "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn DOT edu>
>> Subject: Re: [Veritas-bu] Number of retries query
>>
>> Thanks guys there are some useful options there.
>>
>> To give more info we run the RMAN job as follows :-
>>
>> -We have an oracle admin station which holds various oracle dba scripts.
>> -We have a policy which controls the scheduling and kicks off a client
>> backup of this oracle management station and backs up a single file to a
>> disk storage unit on the master. (simple directory). We back up one file
>> to avoid any status 71
>> -The reason the policy and schedule is there is to run a bpstart script
>> on the management station.
>> -This bpstart script checks that no oracle tape dba script is already running
>> (if it is it exits non zero and obviously gives status 73 in netbackup)
>> -Once the checks are passed it launches an oracle dba script (not
>> maintained by me).
>> -This oracle script talks to 3 oracle RAC servers and works out which
>> one is running the particular db instance.
>> -These oracle RAC servers are all Netbackup media servers and they then
>> initiate the oracle backup through a Netbackup oracle agent on the
>> relevant media server. This backs up using the application schedules on
>> the master server for the associated policy with each media server.
>> (sorry that sounds confusing).
>> -If the oracle script fails and exits with non zero then in turn our
>> bpstart script fails with status 73 and we can alert the dbas
>>
>> We want to launch via netbackup this way so we can trap the exit status
>> and report to the dbas there has been a problem, plus for it to appear
>> on a daily report.
>>
>> The case we have experienced is if a backup fails which could be due to
>> no tapes or various oracle failures, the dba's don't want an automatic
>> one running again as it starts doing things with flash recovery areas
>> and starts running into the normal working day.
>>
>> Indeed perhaps some logic in the bpstart script to create a lockfile is
>> useful, but the lock file would need to be removed upon completion or
>> failure and this would then not give us any benefit when try 2 happens.
>>
>> If a lock file was used we could do some date matching and perhaps only
>> run a job if the lockfile was older than x hours ( a lot of date parsing
>> though which could be difficult ) to touch it again and run the backup.
>> I'll have to explore this method.
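
The age check on the lock file need not involve any date parsing if find
can do the comparison. A rough sketch, assuming a find that supports -mmin
(GNU and BSD find do, POSIX does not require it, so check on your platform);
lock_is_fresh and the lock path in the comment are hypothetical names. The
lock is never removed, only touched, so it is still there when try 2 happens.

```shell
#!/bin/sh
# lock_is_fresh LOCKFILE HOURS
# Returns 0 (true) if LOCKFILE exists and was modified less than HOURS ago.
# Relies on `find -mmin`, which is an extension and not part of POSIX.
lock_is_fresh() {
    [ -f "$1" ] || return 1
    # -mmin +N prints the file only if it is older than N minutes,
    # so empty output means the lock is still fresh.
    [ -z "$(find "$1" -mmin +"$(( $2 * 60 ))")" ]
}

# Hypothetical use near the top of a bpstart script:
#   lock=/var/tmp/rman_backup.lock
#   if lock_is_fresh "$lock" 12; then exit 1; fi
#   touch "$lock"
```
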
>>
>> Cheers
>>
>>
>>
>>
>>
>> ken_zufall AT goodyear DOT com wrote:
>>     
>>> Dave,
>>>
>>> This isn't an ideal fix, but it will work--schedule the backups from
>>> the client.  Basically, just put entries in cron (root or oracle will
>>> work) with the commands (or script wrapper around the command) to
>>> launch the backup instead of using the NBU scheduler (will have to
>>> remove current full/incremental schedules and replace with a user
>>> directed that has the appropriate windows).  Reason this will work is
>>> because automatic retries only affect backups launched from the
>>> master...if it's submitted by the client, it will not retry on failure.
>>>
>>> Only real issues off the top of my head are:
>>>
>>> 1) If client is down or doesn't have network connectivity, you won't
>>> see failure to run backup in NBU because the backup will never be
>>> submitted.
>>>
>>> 2) You lose visibility to backup schedules within NBU.
>>>
>>> Ken Zufall
>>> Technical Analyst
>>> D660C
>>> The Goodyear Tire & Rubber Company
>>> GTN 446.0592 or 330.796.0592
>>>
>>>
>>>
>>> From: Len Boyle <Len.Boyle AT sas DOT com>
>>> Sent by: veritas-bu-bounces AT mailman.eng.auburn DOT edu
>>> Date: 04/06/2009 09:30 AM
>>> To: "dave.markham AT fjserv DOT net" <dave.markham AT fjserv DOT net>,
>>>     "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn DOT edu>
>>> Subject: Re: [Veritas-bu] Number of retries query
>>>
>>> Good Morning Dave,
>>>
>>> I know of no way to change the number of job retries on a policy or
>>> client or schedule object.
>>> I can see where this would be a nice feature to have.
>>>
>>> There are many different reasons that a rman backup job can fail.
>>>
>>> From the netbackup end of things one could have a 96 error (no scratch
>>> tapes), a media fault, a network issue, etc.
>>> Or it could be an oracle issue.
>>>
>>> For something like a media issue that is cleared up on the netbackup
>>> end of things I would think that the dba's would want the backup to be
>>> retried. For an oracle issue I do not know enough.
>>>
>>> But either way I believe that you could add the control you require
>>> into the script that netbackup runs on the client to run the rman
>>> commands. Might not be easy.
>>>
>>> I am sure others who know oracle can give you a better answer than
>>> this, and I look forward to learning.
>>> As a simple case of go or nogo without any variance based on the prior
>>> failure you could try.
>>> In the beginning of the script you could set a state value of
>>> "STARTED" into a file on the client. At the end of the script the value
>>> could be changed to "COMPLETE".
>>> At the start of the script if the value is not "COMPLETE" the script
>>> could give an error return and exit. Someone would have to change the
>>> state value back to "COMPLETE" to enable the script to run. This could be
>>> done after clearing the problem. This can also be used to bypass the
>>> running of the backup at the script level when the oracle dba's are
>>> doing maintenance work on the oracle database. If you use and check
>>> for some state value of "BYPASS" then the script could exit with a
>>> normal return code and netbackup would not have a backup but would
>>> think that everything is ok and not retry.
>>> You could also use touch files instead of one state file.
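
For what it's worth, the state file check could look roughly like this. It
is a sketch only: check_state, its return codes and the choice to treat a
missing file as a first run are assumptions, not something from the thread.
After a failed run the file still says STARTED, so the script keeps refusing
until someone resets it to COMPLETE.

```shell
#!/bin/sh
# check_state STATEFILE
# Return codes (illustrative values, not from the thread):
#   0  last run completed, OK to start
#   1  last run did not complete, refuse to start
#   2  BYPASS requested, skip the backup but report success
check_state() {
    state=$(cat "$1" 2>/dev/null)
    case "$state" in
        COMPLETE|"") return 0 ;;   # assumption: missing or empty file = first run
        BYPASS)      return 2 ;;
        *)           return 1 ;;   # STARTED or anything unexpected
    esac
}

# Hypothetical use in the script netbackup runs on the client:
#   rc=0; check_state "$statefile" || rc=$?
#   [ "$rc" -eq 2 ] && exit 0      # netbackup sees success and does not retry
#   [ "$rc" -eq 1 ] && exit 1
#   echo STARTED > "$statefile"
#   ... run the rman commands, then on success:
#   echo COMPLETE > "$statefile"
```
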
>>>
>>> Let us know what you end up doing to solve this issue.
>>>
>>> len
>>>
>>> -----Original Message-----
>>> From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
>>> [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Dave
>>> Markham
>>> Sent: Monday, April 06, 2009 8:17 AM
>>> To: veritas-bu AT mailman.eng.auburn DOT edu
>>> Subject: [Veritas-bu] Number of retries query
>>>
>>> Guys does anyone know if you can change the number of job retries in xx
>>> time period on a per client basis?
>>>
>>> I currently have the global set at 2 tries per 12 hours which is fine
>>> for our needs and good in the fact it will try a failed backup.
>>>
>>> However the DBA for an RMAN and oracle policy doesn't want a backup
>>> to be re-run if there is a failure, so I need to try and
>>> find a way of setting it to 1 try for just one client.
>>>
>>> Any ideas?
>>>
>>> Cheers
>>> _______________________________________________
>>> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

