Veritas-bu

Re: [Veritas-bu] Number of retries query

2009-04-07 06:20:37
Subject: Re: [Veritas-bu] Number of retries query
From: Dave Markham <dave.markham AT fjserv DOT net>
To: ken_zufall AT goodyear DOT com
Date: Tue, 07 Apr 2009 11:16:25 +0100
Thanks guys, there are some useful options there.

To give more info, we run the RMAN job as follows:

-We have an Oracle admin station which holds various Oracle DBA scripts.
-We have a policy which controls the scheduling and kicks off a client
backup of this Oracle management station, backing up a single file to a
disk storage unit on the master (a simple directory). We back up one
file to avoid a status 71.
-The policy and schedule exist only to run a bpstart script on the
management station.
-This bpstart script checks that no Oracle tape DBA script is already
running (if one is, it exits non-zero, which shows up as status 73 in
NetBackup).
-Once the checks pass, it launches an Oracle DBA script (not maintained
by me).
-This Oracle script talks to 3 Oracle RAC servers and works out which
one is running the particular DB instance.
-These Oracle RAC servers are all NetBackup media servers, and the
relevant one then initiates the Oracle backup through its NetBackup
Oracle agent. The backup uses the application schedule of the policy on
the master server that is associated with that media server. (Sorry,
that sounds confusing.)
-If the Oracle script fails and exits non-zero, our bpstart script in
turn fails with status 73 and we can alert the DBAs.
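
The flow of the bpstart script described above can be sketched roughly
as follows. This is only an illustration of the logic, written in
Python for readability; the real bpstart script is not shown in this
thread, and the function name and both command arguments are
hypothetical placeholders.

```python
import subprocess

EXIT_OK = 0
EXIT_FAIL = 1  # any non-zero exit from bpstart shows up as status 73


def run_guarded(check_cmd, backup_cmd):
    """Return the exit status the bpstart script should finish with.

    check_cmd  -- hypothetical command that exits 0 when another Oracle
                  tape backup script is already running
    backup_cmd -- hypothetical command that launches the Oracle DBA script
    """
    # Refuse to start if a backup script is already running.
    if subprocess.run(check_cmd).returncode == 0:
        return EXIT_FAIL
    # Launch the DBA script and pass any failure on as a non-zero exit.
    result = subprocess.run(backup_cmd)
    return EXIT_OK if result.returncode == 0 else EXIT_FAIL
```

A caller would pass the two real commands as argument lists and hand
the return value to sys.exit(), so that NetBackup sees the failure.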

We want to launch via NetBackup this way so we can trap the exit status,
report to the DBAs that there has been a problem, and have it appear on
a daily report.

The case we have hit is that if a backup fails, which could be due to
no tapes or various Oracle failures, the DBAs don't want an automatic
retry, as it starts doing things with flash recovery areas and runs
into the normal working day.

Indeed, some logic in the bpstart script to create a lock file might be
useful, but the lock file would need to be removed on completion or
failure, and it would then give us no benefit when try 2 happens.

If a lock file was used, we could do some date matching and only run a
job (touching the file again first) if the lock file was older than x
hours. That is a lot of date parsing, though, which could be difficult.
I'll have to explore this method.
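
One way the age check could work without parsing any dates is to
compare the file's modification time with the current time in seconds.
The sketch below assumes a marker file that is never removed and is
touched at the start of every attempt; the function names, the marker
path and the "now" parameter (there only to make the check easy to
test) are all hypothetical.

```python
import os
import time


def may_run(marker_path, min_age_hours, now=None):
    """Return True if no attempt was recorded in the last min_age_hours."""
    now = time.time() if now is None else now
    try:
        last_attempt = os.path.getmtime(marker_path)  # seconds since epoch
    except FileNotFoundError:
        return True  # no marker yet, so this is the first attempt
    return (now - last_attempt) >= min_age_hours * 3600


def record_attempt(marker_path):
    """Create the marker if needed and set its mtime to now (like touch)."""
    with open(marker_path, "a"):
        pass
    os.utime(marker_path, None)
```

The script would call may_run() first and record_attempt() just before
launching the backup, so a second try inside the window is refused.
Whether a refused try should exit zero or non-zero is a separate
choice; a non-zero exit would still be reported as status 73.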

Cheers

ken_zufall AT goodyear DOT com wrote:
>
> Dave,
>
> This isn't an ideal fix, but it will work: schedule the backups from
> the client. Basically, just put entries in cron (root or oracle will
> work) with the commands (or a script wrapper around the command) to
> launch the backup instead of using the NBU scheduler (you will have to
> remove the current full/incremental schedules and replace them with a
> user-directed schedule that has the appropriate windows). This works
> because automatic retries only affect backups launched from the
> master; if a backup is submitted by the client, it will not be retried
> on failure.
>
> Only real issues off the top of my head are:
>
> 1) If the client is down or doesn't have network connectivity, you
> won't see a failure to run the backup in NBU, because the backup will
> never be submitted.
>
> 2) You lose visibility of backup schedules within NBU.
>
> Ken Zufall
> Technical Analyst
> D660C
> The Goodyear Tire & Rubber Company
> GTN 446.0592 or 330.796.0592
>
>
> Len Boyle <Len.Boyle AT sas DOT com>
> Sent by: veritas-bu-bounces AT mailman.eng.auburn DOT edu
> 04/06/2009 09:30 AM
>
> To: "dave.markham AT fjserv DOT net" <dave.markham AT fjserv DOT net>,
> "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn
> DOT edu>
> cc:
> Subject: Re: [Veritas-bu] Number of retries query
>
> Good Morning Dave,
>
> I know of no way to change the number of job retries on a policy,
> client or schedule object.
> I can see where this would be a nice feature to have.
>
> There are many different reasons that an RMAN backup job can fail.
>
> From the NetBackup end of things, one could have a 96 error (no
> scratch tapes), a media fault, a network issue, etc.
> Or it could be an Oracle issue.
>
> For something like a media issue that is cleared up on the NetBackup
> end of things, I would think that the DBAs would want the backup to be
> retried. For an Oracle issue I do not know enough.
>
> But either way I believe that you could add the control you require
> into the script that NetBackup runs on the client to run the RMAN
> commands. Might not be easy.
>
> I am sure others who know Oracle can give you a better answer than
> this, and I look forward to learning.
> As a simple case of go or no-go, without any variance based on the
> prior failure, you could try the following.
> At the beginning of the script you could set a state value of
> "STARTED" in a file on the client. At the end of the script the value
> could be changed to "COMPLETE".
> At the start of the script, if the value is not "COMPLETE", the script
> could give an error return and exit. Someone would have to change the
> state value back to "COMPLETE" to enable the script to run. This could
> be done after clearing the problem. This can also be used to bypass
> the running of the backup at the script level when the Oracle DBAs are
> doing maintenance work on the Oracle database. If you use and check
> for a state value of "BYPASS", the script could exit with a normal
> return code; NetBackup would not have a backup but would think that
> everything is OK and not retry.
> You could also use touch files instead of one state file.
>
> Let us know what you end up doing to solve this issue.
>
> len
>
> -----Original Message-----
> From: veritas-bu-bounces AT mailman.eng.auburn DOT edu 
> [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Dave 
> Markham
> Sent: Monday, April 06, 2009 8:17 AM
> To: veritas-bu AT mailman.eng.auburn DOT edu
> Subject: [Veritas-bu] Number of retries query
>
> Guys, does anyone know if you can change the number of job retries in
> xx time period on a per-client basis?
>
> I currently have the global set at 2 tries per 12 hours, which is fine
> for our needs and good in that it will retry a failed backup.
>
> However, the DBA for an RMAN and Oracle policy doesn't want a backup
> re-run if there is a failure, so I need to find a way of setting it to
> 1 try for just one client.
>
> Any ideas?
>
> Cheers
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu