Subject: [Networker] Staging with a Float
From: Ian G Batten <ian.batten AT UK.FUJITSU DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 7 Feb 2008 14:33:12 +0000
One thing we're keen to do is to hold a few days of on-line backups (for rapid restoration) but to take them to tape periodically in case the disk subsystem dies or we need the savesets months later. This need has grown now that we have a decent-sized pool of disk twenty miles away on a GigE link: we want to back up to the remote site, but keep local tapes for long-term archive.

We are also in the unusual position that we don't have a tape device on the networker server, so the bootstraps are going to a dedicated adv_file device on the networker server and then being taken to tape by cloning or staging (as the mood takes us). Our ideal solution is to have the bootstraps on on-site disk, off-site disk and tape within an hour or so.
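
(As an aside: a quick sanity check on where the bootstraps currently live is mminfo's bootstrap report, something along the lines of

  mminfo -s backup-srv.ftel.co.uk -B

which, if I'm remembering the flag right, lists the recent bootstrap savesets along with the volumes holding them.)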

Cloning would sort-of work, except we would still have to manage the size of the disk collection by expiring one of the two clones of the saveset. And as the tapes are only really an absolute last line of defence (we have the live copy on-site and the disk copy off-site) we would rather run the cloning to tape during the day, when we can fix anything that goes wrong with the robots, rather than at night, when the link is saturated with replication jobs and the robot is left to its own devices.

So far as we can see, automatic cloning always takes place as the job finishes. You can't say ``when finished, queue a clone job, and run all the clones in this pool on this trigger''.

Staging, unfortunately, will always delete savesets that have been staged. In an ideal world we could have a staging policy in which the trigger on saveset age was independent of the trigger on capacity, so we could say ``stage savesets within X hours of writing, and delete savesets that have been staged if the disk is more than X% full''. At the moment if you set, say, maximum retention time to one hour in a staging policy, it will stage the savesets within an hour but will delete them from the source even if the volume is 99% empty.
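
To make the distinction concrete, the capacity half of the policy I'd like is only a few lines of shell. This is just a sketch: the mount point, pool and 90% threshold are made up for illustration, and it assumes df keeps its output on one data line.

#!/bin/ksh
# sketch only: remove staged copies only when the filesystem underneath
# the adv_file staging device is actually filling up
typeset -r server=backup-srv.ftel.co.uk
typeset -r mountpoint=/nsr/staging      # made-up path for illustration
typeset -ri threshold=90                # only delete above 90% used

# current %used of the filesystem, e.g. 42 from "42%"
used=$(df -k $mountpoint | awk 'NR==2 { sub(/%/, "", $5); print $5 }')

if (( used > threshold )); then
  # only savesets that already have their other copies (staging volume,
  # .RO shadow and tape) are candidates for deletion
  mminfo -s $server -q "pool=BootstrapStaging,copies>=3" \
         -r ssid,cloneid 2>/dev/null | grep -v ssid |
  while read ssid cloneid; do
    nsrmm -s $server -d -y -S $ssid/$cloneid
  done
fi

The age half (``stage within X hours of writing'') would then be driven separately, which is exactly what the staging policy won't let me express.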

Does anyone have any clean workarounds for this? Or reasons why what I want to do is stupid?

I've written the following shell script, which runs out of cron every few hours (and desperately needs locking against multiple invocations: I'll code that this evening) but I can't say the need to do this fills me with joy, and it's full of all sorts of hacky workarounds. nsrmm -d -y in a shell script is bound to end in tears one day.


ian

#!/bin/ksh

typeset -r tmp=/tmp/$(basename $0).$$
trap "rm -f $tmp" 0
typeset -r server=backup-srv.ftel.co.uk
typeset -r window=14days
typeset -r retention=7days


# bootstraps from offsite disk to onsite disk and onsite tape
# everything else from offsite disk to onsite tape

for pool in BootstrapStaging DatabasesStaging IncrementalsStaging; do
  case $pool in
     DatabasesStaging) set -A target DatabasesClone ;;
     BootstrapStaging) set -A target BootstrapOnlineClones IndexClones ;;
     IncrementalsStaging) set -A target Incrementals ;;
     *) echo $0: $pool is an unknown pool 1>&2 ; exit 1 ;;
  esac

# scan volumes that are in use recently
  for volume in $(mminfo -s $server \
                  -q "pool=${pool},volaccess>-${window}" \
                  -r volume 2>/dev/null); do
     # only scan the read-only .RO shadow volumes (selecting on readonly in mminfo doesn't work)
     case $volume in
       *.RO) echo $0: scanning $volume 1>&2 ;;
       *) echo $0: skipping $volume 1>&2; continue ;;
     esac

     # find the savesets for which we have no clones
     # if we have previously tried to clone to two pools, but only one
     # succeeded, this WILL NOT re-clone to the missing pool.
     # 3 means one on the staging set, one on the .RO shadow, one on tape
     mminfo -s $server  -q "volume=$volume,copies<3,!incomplete" \
            -r ssid 2> /dev/null |
       sort -u > $tmp

     # if we found anything then clone to the selected pool(s)
     if [[ -s $tmp ]]; then
       for t in ${target[*]}; do
         echo $0: saving $(wc -l < $tmp) savesets from $volume to $t... 1>&2
         if [[ $1 = live ]]; then
           nsrclone -b $t -s $server -S -f $tmp
         fi
       done
     else
       echo $0: no cloning work to do for $volume 1>&2
     fi

     # find the staging copies of savesets for which we now have
     # other copies
     mminfo -s $server -q "volume=${volume},copies>=3,savetime<-${retention}" \
         -r ssid,cloneid 2> /dev/null | grep -v ssid > $tmp
     # and delete them if required
     if [[ -s $tmp ]]; then
       while read ssid cloneid; do
          echo $0: can delete $ssid/$cloneid 1>&2
          if [[ $1 = live ]]; then
             nsrmm -s $server -d -y -S $ssid/$cloneid
          fi
       done < $tmp
       # tidy up the volumes if we deleted anything
       echo $0: cleaning up $volume 1>&2
       if [[ $1 = live ]]; then
         nsrstage -v -s $server -C -V $volume
       fi
     else
       echo $0: nothing to delete from $volume 1>&2
     fi
   done
done
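
For completeness, the locking I mentioned above will probably be nothing fancier than a mkdir mutex wrapped around the whole thing; roughly this, untested:

#!/bin/ksh
# mkdir is atomic, so a second cron invocation simply gives up rather
# than racing the first one through nsrclone/nsrmm
typeset -r lockdir=/tmp/$(basename $0).lock

if ! mkdir $lockdir 2>/dev/null; then
  echo $0: another instance appears to be running, exiting 1>&2
  exit 0
fi
trap "rmdir $lockdir" 0

# ... the cloning and deletion loops above go here ...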

