Subject: Re: [Networker] Advice needed on staging policy values
From: Yaron Zabary <yaron AT ARISTO.TAU.AC DOT IL>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 9 Sep 2009 16:15:45 +0300
Len Philpot wrote:
We've just implemented 7.5.1 in our first production environment, which is orders of magnitude larger than the test environment that preceded it. We have 11 roughly 500 GB disk volumes we're backing up to, knowing that at some point we will probably have to stage to tape (the pools, tapes, policies, etc. are all in place for that). Right now, three of those volumes are about 20-25% used and that's it.

However, the *default* Max Storage Period on our staging policy is 7 days, which is way too soon and has already caused our first test backups to stage to tape. I know what the values mean, but I haven't yet given much thought to what the best combination of max storage period, recover space interval and file system check interval is. Blindly staging after only 7 days is clearly much too soon, though. We'd rather have a reasonable high water mark trigger staging than a time limit.

Question: Given that we currently have the adv_file space available and don't yet need to stage, are there any hidden gotchas we'll run into if we push the max storage period to, say, a month, and just keep an eye on disk volume usage?

Any advice / tips / traps?

  Our environment has 6 TB of AFTD storage (across four devices). The
problems I am aware of are:

  . Setting static high and low water marks is problematic. It means
you will start staging during backups, because going above the high
water mark is the only trigger. Your disks then do reads and writes at
the same time, which slows everything down. When I first started using
AFTD I had a script, run from cron, that changed the water marks to
avoid this problem:

30 15 * * * /usr/local/TAUSRC/Local/ToolBox/fixstage.pl 'TAUDefault stage' 95 96
0 5 * * * /usr/local/TAUSRC/Local/ToolBox/fixstage.pl 'TAUDefault stage' 90 91
1 6 * * 3 /usr/local/TAUSRC/Local/ToolBox/fixstage.pl 'TAUDefault stage' 75 77
1 6 * * 4 /usr/local/TAUSRC/Local/ToolBox/fixstage.pl 'TAUDefault stage' 60 62
1 6 * * 5 /usr/local/TAUSRC/Local/ToolBox/fixstage.pl 'TAUDefault stage' 45 47
30 9 * * 6 /usr/local/TAUSRC/Local/ToolBox/fixstage.pl 'TAUDefault stage' 80 82

# cat /usr/local/TAUSRC/Local/ToolBox/fixstage.pl
#!/bin/perl
# Usage: fixstage.pl '<stage resource name>' <low water mark %> <high water mark %>
# Builds an nsradmin command file that selects the named NSR stage
# resource and updates its water marks, then feeds it to nsradmin.
use strict;
use warnings;

open(my $out, '>', '/tmp/nsradmin.in.fixstage')
    or die "cannot write nsradmin input: $!";

# Select the stage resource by name.
print $out ". name: $ARGV[0] ;\n\n";

# Update the water marks (high first, then low).
print $out "update high water mark (%): $ARGV[2] ;\n";
print $out "low water mark (%): $ARGV[1] ;\n";

print $out "\nquit\n";
close($out);

system("nsradmin -i /tmp/nsradmin.in.fixstage");

sleep 5;

  . While the above can work for some systems, it is not optimal:
staging calculates everything it needs to do in a single run, so you
can end up with one huge, long staging process (I've seen 1 TB staging
runs at times). The system might then be staging for many hours, which
causes two problems: you cannot recover while staging, and you don't
reclaim the staged space until the entire job is done (and then there
is a hard-coded sleep of about two seconds for each staged save set).
To solve this I now have a script that stages only ~15 GB at a time and
then sleeps for a minute. This frees space earlier and also lets
recoveries run within a reasonable time frame. The script also checks
when it runs and uses different target water marks depending on the
time of day and day of week.
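
  A minimal sketch of that chunked approach (a hypothetical script, not
my production one): it assumes mminfo and nsrstage are on the PATH,
that 'nsrstage -m -S <ssid>' is acceptable in your setup, and that
totalsize comes back as plain bytes (adjust the parsing if your mminfo
prints scaled units). The batch size and sleep are illustrative.

#!/bin/perl
# stage-chunked.pl -- hypothetical sketch: stage ~15 GB at a time, then
# pause, so space is reclaimed sooner and recoveries can get a window.
# Usage: stage-chunked.pl <AFTD volume name>
use strict;
use warnings;

my $vol   = $ARGV[0] or die "usage: $0 <volume>\n";
my $limit = 15 * 1024**3;   # ~15 GB per batch, illustrative

# List save set ids and sizes on the AFTD volume, comma-separated.
my @sets = `mminfo -q "volume=$vol" -r "ssid,totalsize" -xc,`;

my @batch;
my $bytes = 0;
for my $line (@sets) {
    chomp $line;
    my ($ssid, $size) = split /,/, $line;
    next unless defined $ssid && defined $size && $ssid =~ /^\d+$/;  # skip header
    push @batch, $ssid;
    $bytes += $size;
    if ($bytes >= $limit) {
        # Stage this batch to tape, then give the disk a breather.
        system('nsrstage', '-m', '-S', @batch);
        @batch = ();
        $bytes = 0;
        sleep 60;
    }
}
system('nsrstage', '-m', '-S', @batch) if @batch;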

  . Besides the above, staging writes to a single drive (because an
AFTD device can only be accessed by one operation at a time), so if you
need to stage 1 TB you are limited to the bandwidth of one tape drive.
While this might be OK for some systems, it might be desirable to run
concurrent stage operations.
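
  A minimal sketch of concurrent staging (hypothetical: the volume
names are assumptions, it reuses the stage-chunked.pl sketch above, and
it presumes enough tape drives are free to serve the parallel jobs):

#!/bin/perl
# stage-parallel.pl -- hypothetical sketch: fork one staging job per
# AFTD volume so several tape drives can be kept busy at once.
use strict;
use warnings;

my @volumes = ('aftd.001', 'aftd.002', 'aftd.003', 'aftd.004');  # assumed names

my @pids;
for my $vol (@volumes) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: stage this volume in chunks, then exit.
        exec('/usr/local/TAUSRC/Local/ToolBox/stage-chunked.pl', $vol);
        die "exec failed: $!";
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;   # wait for all staging children to finish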

  . Another feature which might be useful is delayed staging: you copy
the save sets to tape now, keep them on disk, and delete the disk
copies later, when the disk fills up. Although this can be scripted
with nsrclone, it is a bit more involved. Note also that nsrclone
requires a clone pool (not a regular pool).
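
  A minimal sketch of delayed staging (hypothetical: the pool name is
an assumption, and it presumes 'nsrclone -b <clone pool> -S <ssid>',
'nsrmm -d -S <ssid>/<cloneid>' and 'nsrstage -C -V <volume>' behave
this way in your release):

#!/bin/perl
# delayed-stage.pl -- hypothetical sketch of delayed staging: clone each
# disk save set to tape now; the disk copies stay put and are deleted
# later, when the disk fills up.
use strict;
use warnings;

my $vol  = $ARGV[0] or die "usage: $0 <AFTD volume>\n";
my $pool = 'Staging Clone';   # assumed clone pool name

# Clone every single-copy save set on the disk volume to the clone pool.
for my $line (`mminfo -q "volume=$vol,copies=1" -r "ssid" -xc,`) {
    chomp $line;
    next unless $line =~ /^\d+$/;   # skip header line
    system('nsrclone', '-b', $pool, '-S', $line);
}

# ...later, when the disk fills up, remove the disk instances:
#   nsrmm -d -S <ssid>/<cloneid>   (delete the disk copy)
#   nsrstage -C -V <volume>        (recover the freed space)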


Thanks!
--
Len Philpot
   Cleco IT Network Services, PGO3 - ext 7167
(318) 484-7167

--

-- Yaron.

To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER