Veritas-bu

[Veritas-bu] "Endangered Filesets" report

From: Mark.Donaldson AT cexp DOT com (Mark.Donaldson AT cexp DOT com)
Date: Tue, 15 Nov 2005 14:15:26 -0700
This script should do that.

It parses down to the filesystem level to take care of the multistreaming
problem.  It doesn't care when you succeeded; any success for that fileset &
client will do.  Of course, a job that fails once, retries, and succeeds
doesn't need to try again, so the success tends to be the last in the
attempt stream.

If you set the search depth to one day, I think it would do just about what
you describe below.
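The grouping logic can be sketched like this.  This is a minimal,
hypothetical example, not the script itself: it assumes the bpdbjobs output
has already been reduced to one attempt per line in the made-up form
"client fileset status endtime" (endtime in epoch seconds), and it follows
the report's rule that status 0 or 1 counts as success.  The sample data is
invented for illustration.

```shell
now=$(date +%s)
cutoff=$((now - 3 * 86400))   # three-day search depth, as in the script
report=$(awk -v cutoff="$cutoff" '
{
    key = $1 SUBSEP $2        # group by client & fileset
    tried[key] = 1            # it must try to be reported on
    # Any success (status 0 or 1) for this client & fileset will do;
    # remember the most recent one.
    if (($3 == 0 || $3 == 1) && $4 > last_ok[key])
        last_ok[key] = $4
}
END {
    # Report filesets that were attempted but have no recent success.
    for (key in tried)
        if (last_ok[key] < cutoff) {
            split(key, k, SUBSEP)
            print k[1], k[2]
        }
}' <<EOF
ntclient1 C: 0 $now
ntclient1 D: 58 $((now - 2 * 86400))
ntclient1 D: 58 $((now - 86400))
EOF
)
echo "$report"
```

With that sample input, only "ntclient1 D:" is reported: C: succeeded
recently, while D: was tried but never succeeded inside the window.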

-M

-----Original Message-----
From: Bob Stump [mailto:stumpb AT michigan DOT gov]
Sent: Tuesday, November 15, 2005 7:14 AM
To: Mark.Donaldson AT cexp DOT com
Subject: Re: [Veritas-bu] "Endangered Filesets" report


Mark,
Thank you for posting this script.
I think I can modify it to meet my needs for the following condition:

I am looking for a script to run on a UNIX master to report failed jobs,
HOWEVER***

I do not want to include in the report any job that eventually finished
successfully within its defined window. 
For instance:
window = 12 hours
retries = 2 times every 4 hours
job failed 2 times in first 4 hours
job failed 1 time in second 4 hours and then was successful on the second
attempt in the second 4 hours

That's three failed jobs, BUT the job was successful within the window, so I
do not want it included in the failed jobs report. This gets especially
tricky with multistreaming!


I posted the above problem on the Veritas Architect Network:
http://forums.veritas.com/discussions/thread.jspa?threadID=54575&tstart=0




>>> <Mark.Donaldson AT cexp DOT com> 11/14/2005 6:49 PM >>>
Earlier I posted that I intended to try to create a report for failed
backups based on the INCLUDE lists in the policies.  The intention was to
report on filesets, drives, etc. within a policy rather than just noting
that a particular client had successive failures.

The problem, for me, in simply reporting that client XYZ failed three times
in a row is that it may not be a complete failure of the client.  If the C:
drive fails one night, the D: drive the next, and the E: drive the third
night, then I'm not really concerned.  I'm more concerned if the C: drive
fails Monday & Tuesday & Wednesday because there's a serious exposure in the
event a restore is required.

So the new report is an "Endangered Filesets" report.

This should report on the list of filesets within policies that haven't had
a successful backup in over three days (configurable).  This is based on the
policy's INCLUDE sets, expanded keywords like ALL_LOCAL_DRIVES, or the
automatically generated sets within a multi-stream backup using file-regex
patterns.

So, for example, one of our problem NT clients has frequent failures of its
D: drive backup because its network connection is slow & buggy & the D:
drive is large.  If a backup of D:, full or incremental, fails for more than
three days in a row, it should report.  Success or failure of the C: drive
on this server is reported on separately.

Error codes 0 & 1 are successes in this report; all others are failures.
This is only for Standard & NT policies right now.

Filesets that aren't scheduled won't report - a fileset must try and then
fail to be reported on.  In my experience, not scheduling isn't usually a
problem, though.

The report should only arrive when there's something to report - no news is
good news for this one.  I'm running mine once a day via cron.

Variables at the top for mail address & the number of days to report upon
should be changed to your local values & whims.  The script depends on
"gawk" for the parsing of the large bpdbjobs output and a utility called
"seconds_since_epoch" that reports the current time in epoch format.

You can make your own or steal this:

> cat seconds_since_epoch
#!/usr/bin/perl
# Print the current time in seconds since the epoch.
$t = time();
print $t . "\n";
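As an alternative (not part of the original script): if your date command
supports the %s format, as GNU date does, the same helper can be a
one-liner:

```shell
date +%s
```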

HTH -M

