
Re: [Networker] Advice needed on staging policy values

2009-09-09 09:37:43
Subject: Re: [Networker] Advice needed on staging policy values
From: Yaron Zabary <yaron AT ARISTO.TAU.AC DOT IL>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 9 Sep 2009 16:33:37 +0300
Francis Swasey wrote:
On 9/9/09 9:15 AM, Yaron Zabary wrote:

In order to solve this I now have a script that stages only
~15 GB at a time and then sleeps for a minute. This frees space earlier and also
allows recoveries to run within a reasonable time frame. The script also
checks the current time and uses a different target watermark depending on
the time of day and the day of week.

Yaron,
Are you willing to share your script? I was just about to start writing something to deal with these issues myself.


Here it goes (some of the functions are unused). It has been working for many months in my environment.

#!/usr/bin/perl

# Replace Legato's staging policy.
# This script has the following advantages over Networker's staging policy:
# It knows about days of the week and time of day, and knows when not to try staging.
# It tries to stage 15 GB at a time, which means that once this chunk is staged,
# the space is available for backups. This also means that recoveries can run
# when the stage ends.

system ("/usr/bin/logger -t stage -p local5.debug staging2.pl started.");

# Stage 15 GB (expressed in bytes) at a time.
$stagechunk = 15000000000;

# The rate at which we write to tape, in bytes per second (used for calculations).
$stagerate = 15000000;

# If this file exists, stop script.
$stopfile = "/usr/local/TAUSRC/Local/staging/stopstage";

# Day of week 0 is Sunday.
# Watermarks are in KB, the unit reported by "df -k", so $giga is the number of KB in 1 GB.

$giga = 1000000;
$targetp{0} = 1000*$giga;
$targetp{1} = 1000*$giga;
$targetp{2} = 1000*$giga;
$targetp{3} = 2400*$giga;
$targetp{4} = 2400*$giga;
$targetp{5} = 3000*$giga;
$targetp{6} = 700*$giga;
$emergencyp{0} = 150*$giga;
$emergencyp{1} = 150*$giga;
$emergencyp{2} = 150*$giga;
$emergencyp{3} = 150*$giga;
$emergencyp{4} = 150*$giga;
$emergencyp{5} = 450*$giga;
$emergencyp{6} = 650*$giga;


$lastchime = hourofday() - 1;

while(1)
{

  if ( -e $stopfile )
  {
    system ("/usr/bin/logger -t stage -p local5.debug staging2.pl exiting.");
    exit;
  }

  if ($lastchime != hourofday())
  {
    $lastchime = hourofday();
    $df = int(dfavail()/$giga);
    $ddmsg = "";
    # Leading space keeps the note separate from the "on disk" text in the log line.
    $ddmsg = " Disk empty" unless $diskempty == 0;
    system ("/usr/bin/logger -t stage -p local5.debug $df on disk".$ddmsg.".");
  }

  $stage = 0;
  $estage = 0;

# Stage if free space is below today's target and the hour is between 04:00 and 20:59
# (this is what the test below checks: $hour > 3 and $hour < 21).
  $hour = hourofday();
  if (($targetp{dayofweek()} > dfavail()) && ($hour > 3) && ($hour < 21))
  {
    $stage = 1;
  }

# Emergency: stage regardless of the hour if free space drops below the emergency watermark.
  if ($emergencyp{dayofweek()} > dfavail())
  {
    system ("/usr/bin/logger -t stage -p local5.debug Emergency staging.");
    $stage = 1;
    $estage = 1;

  }

#  printf " staging %d/%4.1f\n",$nssid,$ssidsize/1024/1024/1024;
#  print "\n";
#  print "$stage\n";

  if ($stage == 1)
  {

    # Find the ssids which need to be staged, oldest first.

    open(MMINFO,"mminfo -ot -xc, -q volume=DBODefault.009.RO -r ssid,cloneid,totalsize |");
    unlink "/tmp/nsrstage2";
    open(SSID,">/tmp/nsrstage2");
    $diskempty = 0;
    $nssid = 0;
    $ssidsize = 0;
    $header = <MMINFO>;
    #while ( <MMINFO> && ($ssidsize <= $stagechunk) && ($nssid < 1))
    while ( ($line = <MMINFO>) && ($ssidsize <= $stagechunk))
    {
      ($ssid,$cloneid,$totalsize) = split(/,/,$line);
      chomp($totalsize);
      $ssidsize += $totalsize;
      print SSID "$ssid"."/"."$cloneid\n";
      $nssid++;
      #system ("/usr/bin/logger -t stage -p local5.debug ssid is $ssid.");
    }

    close(MMINFO);
    close(SSID);

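    if ($nssid == 0)
    {
       # Nothing left to stage: flag it before skipping to the next pass
       # (the flag must be set before "next", otherwise it is never reached).
       $diskempty = 1;
       sleep 60;
       next;
    }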
    $df = int(dfavail()/$giga);
    $endtime = $ssidsize/$stagerate + $nssid*3 + time() ;
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($endtime);
    $endmin = $min;
    $endhour = $hour;
    # Staging should end before 22:00 unless it is an emergency
    # or the day of week is 4 (Thursday) or 5 (Friday).
    $endhourx = $ssidsize/$stagerate/3600+hourofday();
    if (($endhourx <= 22) || ($estage == 1) || (dayofweek() == 4) || (dayofweek() == 5))
    {
      system ("/usr/bin/logger -t stage -p local5.debug staging $nssid size=$ssidsize df=$df end=$endhour:$endmin");
      #system("cat /tmp/nsrstage2");
      system("nsrstage -m -d -b TAUDefault -S -f /tmp/nsrstage2");
      system ("/usr/bin/logger -t stage -p local5.debug staging done.");
    }
  }

# Let recover and clone jobs get access to the RO disk device.
  sleep 60;
}

sub dfavail
{
# This function returns the available space of the AFTD
  open(DF,"df -k /AX150/DBOZ |");
  while(<DF>)
  {
    chop;
    if(/DBOZ/)
    {
      ($f1,$f2,$f3,$avail,$f4) = split(/ +/);
    }
  }
  close(DF);
  return $avail;
}

sub grabserverinfo
{
  unlink "/tmp/nsradmin.stage.in";
  open(OUT,">/tmp/nsradmin.stage.in");
  print OUT "option hidden:on\nprint type:NSR\n";
  close(OUT);
  open(NSRR,"nsradmin -i /tmp/nsradmin.stage.in |");
  @serverinfo =<NSRR>;
  close(NSRR);
}

sub saves
{
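# This function returns the number of saves currently running.
# The awk field is written as \$1 so that Perl does not interpolate it inside the double-quoted string.
  open(PS,"ps -ef | grep 'nsrexec ' | grep -v grep | wc -l | awk '{print \$1}' |");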
  $s = <PS>;
  chop $s;
  return $s;
}

sub diskrate
{
# This function returns the rate of disk write on the AFTD

}

sub dayofweek
{
# This function returns the day of week (0 is Sunday)
  ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time());
  return $wday;
}

sub hourofday
{
# This function returns the hour of day (0-23)
  ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time());
  return $hour;
}

sub fulltoday
{
# This function gives an estimate of the full backups today
}
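
For anyone adapting the script, the three decisions it makes on each pass (whether to stage, which save sets go into the chunk, and how long the chunk should take) can be pulled out into small functions that are easy to try without a Networker server. This is only a rough sketch of the same logic, not part of the script above. The names should_stage, pick_chunk and estimate_seconds are made up for illustration, and the save set numbers in the example are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decide whether to stage.
# Arguments: free space (KB), today's target and emergency watermarks (KB),
# and the hour of day (0-23). Returns two 0/1 flags: (stage, emergency).
sub should_stage {
    my ($avail_kb, $target_kb, $emergency_kb, $hour) = @_;
    my $emergency = $avail_kb < $emergency_kb ? 1 : 0;
    # Same window as the script: hour greater than 3 and less than 21.
    my $in_window = ($hour > 3 && $hour < 21) ? 1 : 0;
    my $stage = ($emergency || ($in_window && $avail_kb < $target_kb)) ? 1 : 0;
    return ($stage, $emergency);
}

# Hypothetical helper: take save sets in the given order (oldest first)
# until the running total exceeds the chunk size. As in the script's loop,
# the save set that crosses the limit is still included.
# Each save set is an array ref: [ssid, cloneid, size in bytes].
sub pick_chunk {
    my ($chunk_bytes, @savesets) = @_;
    my $total = 0;
    my @picked;
    for my $s (@savesets) {
        last if $total > $chunk_bytes;
        push @picked, "$s->[0]/$s->[1]";   # same ssid/cloneid form written to /tmp/nsrstage2
        $total += $s->[2];
    }
    return ($total, @picked);
}

# Hypothetical helper: estimated duration in seconds, using the script's
# formula of size/rate plus 3 seconds of overhead per save set.
sub estimate_seconds {
    my ($total_bytes, $count, $rate) = @_;
    return $total_bytes / $rate + $count * 3;
}

# Example with invented save sets and the script's 15 GB chunk and 15 MB/s rate.
my @sets = ([101, 1, 8_000_000_000], [102, 1, 9_000_000_000], [103, 1, 4_000_000_000]);
my ($total, @picked) = pick_chunk(15_000_000_000, @sets);
print "picked: @picked\n";
print "estimated seconds: ", estimate_seconds($total, scalar(@picked), 15_000_000), "\n";
```

Note that the chunk can overshoot 15 GB by up to one save set, because the size check happens before each save set is added. With large save sets the real chunk may be noticeably bigger than the nominal value.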


--

-- Yaron.

