Hello,
First, I apologize in advance for the simplicity of the question. I don't
administer NetBackup daily. That said, we currently vault daily at 6:00 AM. The
vault processes between 1.5 and 4 terabytes per day.
If a backup fails, such as a non-restartable RMAN (Oracle 8.1.7) job, our
current process is for the on-call to expire any images associated with the
failed job so they are not included in the next day's vault run. This applies
regardless of the time of day or night - it has to occur before 6:00 AM.
Otherwise, the results are sometimes disastrous - vault does not finish in
time, consumes additional tape resources, etc.
Would it be reasonable to expect this process could and/or should be automated?
In other words, I see there are backup_start and backup_notify scripts that
could track the class, schedule, start time, and failure of a job. Would it be
possible to use these, along with bpimagelist -idonly -s start_date -e end_date,
to grab the image ID and expire it?
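A minimal sketch of that lookup in Python, under stated assumptions: the date format passed to -s and -e is taken to be mm/dd/yyyy hh:mm:ss (NetBackup's accepted format can vary by locale), and each line of the -idonly output is assumed to contain "ID: <backup_id>". The helper names build_list_command and parse_image_ids are hypothetical. The sketch only builds the command and parses text; it does not run anything.

```python
import re
from datetime import datetime

# Assumption: bpimagelist takes dates as "mm/dd/yyyy hh:mm:ss" in this locale.
NB_DATE_FORMAT = "%m/%d/%Y %H:%M:%S"


def build_list_command(start, end):
    """Return the bpimagelist argument list for images written between start and end."""
    # The date and the time are passed as separate arguments,
    # the same way they would be typed on a shell command line.
    return [
        "bpimagelist", "-idonly",
        "-s", *start.strftime(NB_DATE_FORMAT).split(),
        "-e", *end.strftime(NB_DATE_FORMAT).split(),
    ]


def parse_image_ids(output):
    """Extract backup IDs, assuming each output line contains 'ID: <backup_id>'."""
    return re.findall(r"ID:\s*(\S+)", output)
```

The resulting list could be handed to subprocess.run on the master server, and each parsed ID passed on to the expire step. The output format should be checked against a real bpimagelist run before relying on the regular expression.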
I suppose the process I'm thinking of would be:
Step 1
Backup starts - backup_start runs and writes the class, schedule, and start
date/time to a file whose name contains the class and schedule
( > class.schedule.start).
Step 2
Backup ends - failure noted in the backup_notify script based on the status
code.
Step 3
The backup_notify script looks for the class.schedule.start file created in
step 1, passes the date/time stamp inside the file to bpimagelist, and expires
the images it returns.
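The three steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tested NetBackup integration: the function names, the state directory, and the class and schedule names are hypothetical; any nonzero status is treated as a failure (a site may want to treat status 1, partial success, differently); and the expire command assumes bpexpdate accepts -backupid, -d 0, and -force on the installed release.

```python
import os
import tempfile
import time


def marker_path(state_dir, klass, schedule):
    """Path of the class.schedule.start marker file."""
    return os.path.join(state_dir, f"{klass}.{schedule}.start")


def record_start(state_dir, klass, schedule, start_epoch=None):
    """Step 1: write the start time (epoch seconds) into the marker file."""
    start_epoch = int(time.time()) if start_epoch is None else int(start_epoch)
    with open(marker_path(state_dir, klass, schedule), "w") as fh:
        fh.write(f"{start_epoch}\n")
    return start_epoch


def handle_end(state_dir, klass, schedule, status, end_epoch=None):
    """Steps 2 and 3: on failure, return the (start, end) window to expire.

    Returns None when the job succeeded or no marker exists.
    The marker is removed whenever it is found.
    """
    path = marker_path(state_dir, klass, schedule)
    try:
        with open(path) as fh:
            start_epoch = int(fh.read().strip())
    except FileNotFoundError:
        return None
    os.remove(path)
    if status == 0:  # assumption: only status 0 counts as success
        return None
    end_epoch = int(time.time()) if end_epoch is None else int(end_epoch)
    return (start_epoch, end_epoch)


def build_expire_command(backup_id):
    """Build (but do not run) the command that expires one image immediately."""
    return ["bpexpdate", "-backupid", backup_id, "-d", "0", "-force"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        record_start(state_dir, "oracle_prod", "daily_full", 1011164400)
        print(handle_end(state_dir, "oracle_prod", "daily_full",
                         status=6, end_epoch=1011168000))
```

The window returned by handle_end would be fed to bpimagelist to find the image IDs, and each ID to build_expire_command. Keeping the expire step as a command list makes it easy to log what would be expired before letting the script actually run it.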
It is very unlikely that the same job would be started a second time, in
parallel, causing confusion about which date/time to use when expiring images.
But it could happen.
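One way to guard against that parallel-start case is to create the marker file atomically and refuse to overwrite an existing one. The Python sketch below assumes a local filesystem where exclusive creation is reliable; claim_marker and the file names are hypothetical.

```python
import os
import tempfile


def claim_marker(path, start_epoch):
    """Atomically create the marker file; return False if it already exists."""
    try:
        # O_EXCL makes creation fail if another run already wrote the marker.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(f"{int(start_epoch)}\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        marker = os.path.join(state_dir, "oracle_prod.daily_full.start")
        print(claim_marker(marker, 1011164400))  # first run claims the marker
        print(claim_marker(marker, 1011164999))  # overlapping run is refused
```

When the claim is refused, the script could log the overlap and leave the expiry to the on-call rather than guess which start time applies.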
Any comments, suggestions, or feedback would be greatly appreciated.
Thank you,
Todd Mermell
UNIX Systems Administrator
todd.mermell AT nordstrom DOT com
206-233-5416