Veritas-bu

[Veritas-bu] Checking to see if a backup ran when expected.

2004-09-01 18:44:09
Subject: [Veritas-bu] Checking to see if a backup ran when expected.
From: Mark.Donaldson AT cexp DOT com (Mark.Donaldson AT cexp DOT com)
Date: Wed, 1 Sep 2004 16:44:09 -0600

Thought somebody might find this useful.

It's been posted to this list a lot that it'd be desirable to have a tool
that detects not only whether a backup failed but also whether a backup
simply didn't run when expected.

I found myself in need of this kind of information and so put this
script together over the past week.  My DBAs want to know more than just
"did the backup fail or succeed?"; they also want the answer to the
question, "did the backup fail to start?".

At first, I tried to generate a list of "expected future backups" using the
-predict function of bpsched but that turned out to be unwieldy.  That
command doesn't report on a range of time; it just reports which backup
windows are open at a specific point in time.

This new script uses the windows & frequency settings in the policy and
looks back over time to see if a backup occurred within the expected
frequency from the close of the last backup window.
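
As a rough sketch of that look-back arithmetic (not the script itself): the
start of the range in which a backup should have run is the close of the
last backup window minus the schedule frequency, plus one second.  The
function name range_start and the sample numbers are made up for
illustration; all times are in seconds.

```shell
# Sketch only: range_start is a hypothetical helper, not a NetBackup command.
# Arguments (both in seconds):
#   $1 = epoch time at which the last backup window closed
#   $2 = schedule frequency
# Prints the earliest epoch time at which a qualifying backup could have run.
range_start() {
  echo $(( $1 - $2 + 1 ))
}

# Invented example: a window that closed at epoch 1094068800 and a
# daily (86400-second) frequency.
range_start 1094068800 86400
```

Any image for the policy, schedule, and client dated at or after the
printed time would count as a backup that ran when expected.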

It only works for frequency-based schedules.  Exclude dates on a freq
schedule are ignored and pure-calendar schedules should also be ignored - if
I wrote this right, anyway.  Sorry - but implementing calendar schedules was
too complex and I don't use them anyway.

It starts with a list of active policies, builds a denormalized "database"
from them, then scans back to find if there are any missing backups within
the expected frequency-based windows.  If the backup window is in the future
(like a 16:00 start and it's only 12:00), then the previous open window is
used.  If the window start is in the past but the duration says the window
is still open (like a 16:00 start with a 4 hour duration and it's 18:00),
then the previous window should be used again. However, in this second case,
a backup that succeeded in the current window will act to "pass" the check.
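
The window search described above can be sketched roughly as follows.  This
is a simplified stand-in for the loop in the attached script, not a copy of
it: find_closed_window is a hypothetical helper, the window table is passed
as arguments instead of being read from the temp file, and the epoch values
in the example are invented.

```shell
# Sketch only: find_closed_window is a hypothetical helper.
# Arguments:
#   $1  = current time (epoch seconds)
#   $2  = epoch seconds at midnight today
#   $3  = today's weekday number, 0 = Sunday (as printed by `date +%w`)
#   $4+ = 14 values: window start offset and duration in seconds for each
#         weekday, Sunday first (start0 dur0 start1 dur1 ... start6 dur6)
# Prints "<days back> <window close, epoch seconds>" for the most recent
# window that has already closed; returns 1 if none is found.
find_closed_window() {
  now=$1 midnight=$2 today=$3
  shift 3
  count=0
  while [ "$count" -le 7 ]; do
    # weekday being examined, counting back from today
    wd=$(( (14 + today - count) % 7 ))
    # positions of that weekday's start and duration in the argument list
    si=$(( wd * 2 + 1 ))
    di=$(( wd * 2 + 2 ))
    eval "ssec=\${$si} dsec=\${$di}"
    close=$(( midnight - count * 86400 + ssec + dsec ))
    # duration 0 means no window that day; a window that is still open,
    # or has not opened yet, closes at or after "now" and is skipped
    if [ "$dsec" -ne 0 ] && [ "$now" -gt "$close" ]; then
      echo "$count $close"
      return 0
    fi
    count=$(( count + 1 ))
  done
  return 1
}

# Invented example: it is 12:00 on a Wednesday (weekday 3), midnight is at
# epoch 1000000, and every day has a 16:00 window lasting 4 hours.
find_closed_window 1043200 1000000 3 \
  57600 14400 57600 14400 57600 14400 57600 14400 \
  57600 14400 57600 14400 57600 14400
```

In this example today's window has not closed yet, so the sketch falls back
one day.  Moving "now" to 18:00 (1064800) gives the same answer, because the
window that opened at 16:00 is still open.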

If a single stream of a multi-stream backup failed, then this won't pick it
up, either.  It's simply pass/fail.  It answers the question, "Did you get
at least one good backup for this policy, schedule, & client during the last
backup cycle?"
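
A minimal sketch of that pass/fail rule, assuming image IDs arrive one per
line on standard input (for example from the bpimagelist -idonly call used
in the attached script).  The helper name check_images and its output
wording are made up for illustration.

```shell
# Sketch only: check_images is a hypothetical helper.
# Reads image IDs (one per line) on stdin.
#   $1 = client, $2 = schedule, $3 = start of the expected range
# One image or more is a pass; zero images is reported as a miss.
check_images() {
  n=$(wc -l | tr -d ' ')
  if [ "$n" -eq 0 ]; then
    echo "MISSING cl=$1 sched=$2 since $3"
    return 1
  fi
  echo "OK cl=$1 sched=$2 images=$n"
}

# Intended use, with the flags taken from the attached script:
#   bpimagelist -idonly -client $client -policy $pl -sl $sched -d $sd \
#     2>/dev/null | check_images $client $sched "$sd"
```

Because a single image is enough to pass, a multi-stream backup in which
only some streams succeeded still counts as a pass here.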

It seems to work for me.  Good luck with it.

-M




Attachment: plcheck.txt

#!/bin/ksh
PATH=$PATH:/usr/openv/local/bin:/usr/openv/netbackup/bin:/usr/openv/netbackup/bin/admincmd

# gdate is GNU date; %w is the weekday number with 0 = Sunday
today=`gdate +%w`
now=`gdate +%s`
midnight=`gdate --date="today 00:00:00" +%s`
yesterday=`expr \( $today - 1 + 7 \) % 7`
TMP1=/tmp/`basename $0`.TMp1

# Convert the output of "bpdbm -ctime" to "mm/dd/yyyy hh:mm:ss".
# Uses arguments 4-7: month name, day of month, time, year.
monconv() {
case $4 in
"Jan") val="01";;
"Feb") val="02";;
"Mar") val="03";;
"Apr") val="04";;
"May") val="05";;
"Jun") val="06";;
"Jul") val="07";;
"Aug") val="08";;
"Sep") val="09";;
"Oct") val="10";;
"Nov") val="11";;
"Dec") val="12";;
    *) val="UNK";;
esac
day=$5
[ $day -lt 10 ] && day="0${day}"
echo "$val/$day/$7 $6"
}

## Create a temp file with denormalized policy data for active policies
echo "## Gathering policy data...\c"
for pl in `bppllist -allpolicies | awk '$1=="CLASS" {pl=$2};$1=="INFO" && $12==0 {print pl}'`
do
  #policy, schedule, client, freq, sun_start, sun_dur, ... sat_start, sat_dur
  #(the lookup below indexes the start/dur pairs by gdate's %w, 0 = Sunday)
  clientlist=`bppllist $pl | awk '$1=="CLIENT" {print $2}'`
  for cl in $clientlist
  do
  #generate info for this policy, only for freq-based schedules that are
  #full, diff, or cumulative type.  Calendar-schedules are ignored.
  bppllist $pl | awk '$1=="CLASS" {pl=$2;set=0}
                      $1=="SCHED" && ($3<2||$3==4) {sc=$2;freq=$5;set=1}
                      $1~/^SCHEDCAL/ {set=0}
                      $1=="SCHEDWIN" && set==1 && freq>0 {print pl,sc,"'$cl'",freq,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15;set=0}'
  done
done >$TMP1
echo "Done"

## Start processing file
for pl in `awk '{print $1}' $TMP1 | sort -u`
do
  echo "\n## Processing: $pl"
  for client in `awk '$1=="'$pl'" {print $3}' $TMP1 | sort -u`
  do
    sfound=0
    nsched=0
    for sched in `awk '$1=="'$pl'" && $3=="'$client'" && $4>0{print $2}' $TMP1`
    do
      freq=`awk '$1=="'$pl'" && $2=="'$sched'" && $3=="'$client'" {print $4}' $TMP1`

      found=0
      count=0
      #echo "$pl $client $sched: \c"
      while [ $count -le 7 ]
      do
        # Calculate an offset from the current day, going back 7 days until an
        # open window is found (duration is greater than 0).
        # If the current time is before the close of that found window then
        # keep searching back; currently open windows & windows that open in
        # the near future should be skipped.
        # Once the window is found, back-calculate from there using schedule
        # frequency to create a range in which a backup *should have* run.
        # Finally, search the image database to see if a backup did run in that
        # range.  This script isn't detailed enough to see if a backup on an
        # individual stream failed - just a pass/fail for the policy, schedule,
        # and client.

        #offset from today
        of=`expr \( 14 + $today - $count \) % 7`
        #calc pseudo-array indexes for policy table
        os=`expr $of \* 2 + 5`
        od=`expr $of \* 2 + 6`
        #lookup start time & duration for that weekday
        ssec=`awk '$1=="'$pl'" && $2=="'$sched'" && $3=="'$client'" {print $'$os'}' $TMP1`
        dsec=`awk '$1=="'$pl'" && $2=="'$sched'" && $3=="'$client'" {print $'$od'}' $TMP1`

        #if window is open and not in the future or current time, then "found"
        [ $dsec -ne 0 -a $now -gt `expr $midnight - \( $count \* 86400 \) + $ssec + $dsec` ] && found=1
        [ $found -eq 1 ] && break
        count=`expr $count + 1`
      done
      #echo ""
      if [ $found -eq 1 ]
      then
        #calc start window
        st=`expr $midnight - \( $count \* 86400 \) + $ssec + $dsec - $freq + 1`
        #convert to mm/dd/yyyy hh:mm:ss
        sd1=`bpdbm -ctime $st`
        sd=`monconv $sd1`
        #count number of images from start time range to now
        count=`bpimagelist -idonly -client $client -policy $pl -sl $sched -d $sd 2>/dev/null | wc -l | tr -d ' '`
        if [ $count -eq 0 ]
        then
          #output if no images
          echo "  cl=$client, sched=$sched\thas nothing since \"$sd\"."
        fi
      fi
    done
  done
done
rm $TMP1
exit
