[Veritas-bu] "Endangered Filesets" report

2005-11-14 18:49:27
Subject: [Veritas-bu] "Endangered Filesets" report
From: Mark.Donaldson AT cexp DOT com (Mark.Donaldson AT cexp DOT com)
Date: Mon, 14 Nov 2005 16:49:27 -0700

Earlier I posted that I intended to try to create a report for failed
backups based on the INCLUDE lists in the policies.  The intention was to
report on filesets, drives, etc. within a policy rather than just noting
that a particular client had successive failures.

The problem, for me, in simply reporting that client XYZ failed three times
in a row is that it may not be a complete failure of the client.  If the C:
drive fails one night, the D: drive the next, and the E: drive the third
night, then I'm not really concerned.  I'm more concerned if the C: drive
fails Monday & Tuesday & Wednesday because there's a serious exposure in the
event a restore is required.

So the new report is an "Endangered Filesets" report.

This should report on the list of filesets within policies that haven't had
a successful backup in over three days (configurable).  This is based on the
policy's INCLUDE sets, expanded keywords like ALL_LOCAL_DRIVES, or the
automatically generated sets within a multi-stream backup using file-regex
patterns.

So, for example, one of our problem NT clients has frequent failures of its
D: drive backup because its network connection is slow & buggy & the D:
drive is large.  If a backup of D:, full or incremental, fails for more than
three days in a row, it should report.  Success or failure of the C: drive
on this server is reported on separately.

Status codes 0 & 1 count as successes in this report; all others are failures.
This is only for Standard & NT policies right now.
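
To make the per-fileset rule concrete, here is a minimal sketch on made-up
job records.  The clients, policies, and the status;policy;client;fileset
layout are illustrative assumptions, not real bpdbjobs output.  A fileset is
flagged only when none of its attempts in the window ended with status 0 or 1:

```shell
#!/bin/sh
# Hypothetical sample: one line per attempted job in the search window,
# laid out as status;policy;client;fileset. All names here are invented.
sample_jobs() {
cat <<'EOF'
0;NT_Daily;ntbox1;C:\
41;NT_Daily;ntbox1;D:\
13;NT_Daily;ntbox1;D:\
1;NT_Weekly;ntbox1;C:\
41;Std_Daily;unix1;/export
0;Std_Daily;unix1;/export
EOF
}

# Flag every client/fileset pair with no status 0 or 1 among its attempts.
endangered() {
  awk -F';' '
    { key = $3 ";" $4; seen[key] = 1 }   # remember every pair that was tried
    $1 <= 1 { ok[key] = 1 }              # status 0 or 1 counts as a success
    END {
      for (k in seen) if (!(k in ok)) {
        split(k, part, ";")
        print "No good backup for: " part[2] " on " part[1]
      }
    }'
}

sample_jobs | endangered
```

On this sample only D:\ on ntbox1 is flagged.  C:\ on the same client has
successes, and /export on unix1 failed once but then succeeded, so neither is
reported.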

Filesets that aren't scheduled won't report - it must try and then fail to
be reported on.  In my experience, not scheduling isn't usually a problem,
though.  

The report should only arrive when there's something to report - no news is
good news for this one.  I'm running mine once a day via cron.
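
The "no news is good news" behaviour comes down to checking whether the
report contains anything besides its "##" banner lines and blank lines.  A
minimal sketch of that check, assuming the banner convention used by the
attached script (the function name has_findings and the sample report text
are illustrative):

```shell
#!/bin/sh
# Succeeds (exit 0) only if the report on stdin has at least one line
# that is neither a "##" banner line nor blank.
has_findings() {
  grep -Ev '^##|^ *$' >/dev/null
}

# Invented sample reports: one with banners only, one with a real finding.
quiet_report='## Search Depth: 3 days.
## 12 jobs being scanned...

## Report Done'

noisy_report='## Search Depth: 3 days.
No good backup for:D:\ on ntbox1 using NT_Daily
## Report Done'

if printf '%s\n' "$noisy_report" | has_findings; then
  echo "would send mail"
fi
if ! printf '%s\n' "$quiet_report" | has_findings; then
  echo "nothing to report"
fi
```

Run daily from cron, a check like this keeps the mailbox empty on clean days.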

Variables at the top for mail address & the number of days to report upon
should be changed to your local values & whims.  The script depends on
"gawk" for the parsing of the large bpdbjobs output and a utility called
"seconds_since_epoch" that reports the current time in epoch format.

You can make your own or steal this:

> cat seconds_since_epoch
#!/usr/bin/perl
$t = time( );
print $t . "\n";
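
For reference, this is roughly how an epoch timestamp is turned into the
start of the search window.  It is a sketch under assumptions: the function
name cutoff is illustrative, and "date +%s" stands in for the helper.  That
format is a common extension (GNU and BSD date) rather than something every
system provides, so the Perl helper above remains the portable choice.

```shell
#!/bin/sh
# cutoff NOW DAYS: epoch time DAYS days before NOW (86400 seconds per day).
cutoff() {
  expr "$1" - "$2" \* 86400
}

now=`date +%s`            # assumption: this date supports %s
past=`cutoff "$now" 3`    # 3 mirrors the script's default search depth
echo "search window starts at epoch $past"
```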

HTH -M


Attachment: endangered_check

#!/usr/bin/ksh

#
# This utility searches the bpdbjobs output for filesets, parsing
# down to return code, client, policy, & fileset.  It then sends
# a warning in the event a fileset fails more than $DAYSBACK number
# of days in a row.
#
# Mark Donaldson - CExp - Nov 14, 2005
#

#Search Depth for backups
DAYSBACK=3

#Mail alias for report
ADDR=Netbackup.Errors

#Path to bpdbjobs & seconds_since_epoch
PATH=$PATH:/usr/openv/netbackup/bin/admincmd:/usr/openv/local/bin

TMP1=/tmp/`basename $0`.$$.1.tmp
TMP2=/tmp/`basename $0`.$$.2.tmp
LOG=/usr/openv/netbackup/logs/scripts/`basename $0`.log

trap "[ -f $TMP1 ] && rm -f $TMP1
      [ -f $TMP2 ] && rm -f $TMP2
      exit " 0 1 2 3 4 5 6 7 8 10 11 12 13 14 15

exec >$TMP2 2>&1

now=`seconds_since_epoch`
secs=`expr $DAYSBACK \* 86400`
past=`expr $now - $secs`

# Emit one line per matching job as: status;policy;client;fileset
bpdbjobs -all_columns | sed 's/connecting.*$//'| \
   gawk -F',' '{if($2==0 && $3==3 && $11>='$past' && \
                $15!="" && ( $22==0 || $22==13 )){
                count=0;string=$33;numfiles=$32
                if (numfiles>1){while (count<(numfiles-1)){count++
                 new=$(33+count)
                 if(new!="") {string=string "," $(33+count)}}}
           print $4 ";" $5 ";" $7 ";" string}}'|sed 's,:\\\\,:\\,g' >$TMP1

sort -t';' -k3,4 -k2 -o $TMP1 $TMP1

echo "## Search Depth: $DAYSBACK days."
echo "## `wc -l $TMP1 | awk '{print $1}'` jobs being scanned...\n"

awk -F';' '{if(NR==1){client=$3;fsys=$4;pnow=$2;plist=pnow
                     if($1 <= 1) {foundok=1} else {foundok=0} }
           else { #Client or Fileset changed-time to check success
               if (client != $3 || fsys != $4) {
                 if (foundok==0) { print "No good backup for:" fsys " on " client " using " plist}
                 client=$3;fsys=$4;pnow=$2;plist=pnow
                 if($1 <= 1){foundok=1} else {foundok=0} }
               else { #Client & filesystem the same
                 if ( $1 <= 1 ) {foundok=1}
                 if ( pnow != $2 ) {plist=plist "," $2 ; pnow=$2} } } }
           # NR>0 guard: with no input at all, foundok is unset and would
           # otherwise compare equal to 0, producing an empty warning line.
           END { if (NR>0 && foundok==0) { print "No good backup for:" fsys " on " client " using " plist} }' $TMP1

echo "\n## Report Done: `date`"

# LOG mail & management
if [ -f $TMP2 ]
then
  if [ `egrep -cv "^##|^ *$" $TMP2` -gt 0 ]
  then
    mailx -s "NB Wrn: Endangered Filesets Report" $ADDR <$TMP2
    [ -f ${LOG}.4 ] && mv ${LOG}.4 ${LOG}.5
    [ -f ${LOG}.3 ] && mv ${LOG}.3 ${LOG}.4
    [ -f ${LOG}.2 ] && mv ${LOG}.2 ${LOG}.3
    [ -f ${LOG}.1 ] && mv ${LOG}.1 ${LOG}.2
    [ -f ${LOG} ]   && mv ${LOG}   ${LOG}.1
    mv $TMP2 $LOG
  fi
fi
