One thing we're keen to do is to hold a few days of on-line backups
(for rapid restoration) but to take them to tape periodically in case
the disk subsystem dies or we need the savesets months later. This
need has grown now that we have a decent-sized pool of disk twenty
miles away on the end of a GigE link: we want to back up to the
remote site, but keep local tapes for long-term archive.
We are also in the unusual position that we don't have a tape device
on the NetWorker server, so the bootstraps are going to a dedicated
adv_file device on the NetWorker server and then being taken to tape
by cloning or staging (as the mood takes us). Our ideal solution is
to have the bootstraps on on-site disk, off-site disk and tape within
an hour or so.
Cloning would sort of work, except we would still have to manage the
size of the disk collection by expiring one of the two clones of the
saveset. And as the tapes are only really an absolute last line of
defence (we have the live copy on-site and the disk copy off-site),
we would rather run the cloning to tape during the day, when we can
fix anything that goes wrong with the robots, rather than at night,
when the link is saturated with replication jobs and the robot is
left to its own devices.
So far as we can see, automatic cloning always takes place as the job
finishes. You can't say ``when finished, queue a clone job, and run
all the clones in this pool on this trigger''.
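What we would really like is a deferred clone pass that we can run on
our own schedule. A minimal, untested sketch of the idea, using the
same mminfo/nsrclone options as the script further down (the pool
names are ours, and copies<3 assumes the adv_file .RO shadow counts
as a copy, as it does for us):

# Sketch only: run from cron in the daytime, cloning anything in the
# staging pool that does not yet have a tape copy.
mminfo -s backup-srv.ftel.co.uk \
       -q "pool=IncrementalsStaging,copies<3,!incomplete" \
       -r ssid 2>/dev/null | sort -u > /tmp/toclone
[[ -s /tmp/toclone ]] &&
    nsrclone -s backup-srv.ftel.co.uk -b Incrementals -S -f /tmp/toclone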
Staging, unfortunately, will always delete savesets that have been
staged. In an ideal world we could have a staging policy in which
the trigger on saveset age was independent of the trigger on
capacity, so we could say ``stage savesets within X hours of writing,
and delete savesets that have already been staged once the disk is
more than Y% full''. At the moment, if you set, say, the maximum
retention time in a staging policy to one hour, it will stage the
savesets within an hour but will delete them from the source even if
the volume is 99% empty.
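The nearest thing we can see to a workaround on the capacity side is
to check the volume's fullness ourselves before deleting anything,
along these lines (untested sketch: it assumes mminfo will report
%used for an adv_file volume and that the value parses as a bare
percentage, both of which want checking locally):

# Sketch only: leave staged savesets alone unless the volume is
# actually filling up.
vol=IncrStage.001            # hypothetical adv_file volume name
used=$(mminfo -s backup-srv.ftel.co.uk -q "volume=$vol" -r '%used' \
           2>/dev/null | sed 1d | tr -d ' %')
if (( ${used:-0} < 90 )); then
    echo "$vol is only ${used:-0}% used, not deleting anything" 1>&2
    exit 0
fi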
Does anyone have any clean workarounds for this? Or reasons why
what I want to do is stupid?
I've written the following shell script, which runs out of cron every
few hours (and desperately needs locking against multiple
invocations: I'll code that this evening) but I can't say the need to
do this fills me with joy, and it's full of all sorts of hacky
workarounds. nsrmm -d -y in a shell script is bound to end in tears
one day.
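The locking will probably end up being nothing fancier than a mkdir
lock at the top of the script, something like this sketch (the lock
path is arbitrary, and the cleanup would need folding into the
existing exit trap):

# Sketch only: mkdir is atomic, so only one invocation gets the lock.
typeset -r lockdir=/tmp/$(basename $0).lock
if ! mkdir $lockdir 2>/dev/null; then
    echo $0: another copy appears to be running 1>&2
    exit 1
fi
trap "rm -f $tmp; rmdir $lockdir" 0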
ian
#!/bin/ksh

typeset -r tmp=/tmp/$(basename $0).$$
trap "rm -f $tmp" 0

typeset -r server=backup-srv.ftel.co.uk
typeset -r window=14days
typeset -r retention=7days

# bootstraps from offsite disk to onsite disk and onsite tape
# everything else from offsite disk to onsite tape
for pool in BootstrapStaging DatabasesStaging IncrementalsStaging; do
    case $pool in
    DatabasesStaging)    set -A target DatabasesClone ;;
    BootstrapStaging)    set -A target BootstrapOnlineClones IndexClones ;;
    IncrementalsStaging) set -A target Incrementals ;;
    *) echo $0: $pool is an unknown pool 1>&2; exit 1 ;;
    esac

    # scan volumes that are in use recently
    for volume in $(mminfo -s $server \
                        -q "pool=${pool},volaccess>-${window}" \
                        -r volume 2>/dev/null); do

        # skip over the non-.RO shadows (readonly on mminfo doesn't work)
        case $volume in
        *.RO) echo $0: scanning $volume 1>&2 ;;
        *)    echo $0: skipping $volume 1>&2; continue ;;
        esac

        # find the savesets for which we have no clones
        # if we have previously tried to clone to two pools, but only one
        # succeeded, this WILL NOT re-clone to the missing pool.
        # 3 means one on the staging set, one on the .RO shadow, one on tape
        mminfo -s $server -q "volume=$volume,copies<3,!incomplete" \
            -r ssid 2>/dev/null |
            sort -u > $tmp

        # if we found anything then clone to the selected pool(s)
        if [[ -s $tmp ]]; then
            for t in ${target[*]}; do
                echo $0: saving $(wc -l < $tmp) savesets from $volume to $t... 1>&2
                if [[ $1 = live ]]; then
                    nsrclone -b $t -s $server -S -f $tmp
                fi
            done
        else
            echo $0: no cloning work to do for $volume 1>&2
        fi

        # find the staging copies of savesets for which we now have
        # other copies
        mminfo -s $server -q "volume=${volume},copies>=3,savetime<-${retention}" \
            -r ssid,cloneid 2>/dev/null | grep -v ssid > $tmp

        # and delete them if required
        if [[ -s $tmp ]]; then
            while read ssid cloneid; do
                echo $0: can delete $ssid/$cloneid 1>&2
                if [[ $1 = live ]]; then
                    nsrmm -s $server -d -y -S $ssid/$cloneid
                fi
            done < $tmp

            # tidy up the volumes if we deleted anything
            echo $0: cleaning up $volume 1>&2
            if [[ $1 = live ]]; then
                nsrstage -v -s $server -C -V $volume
            fi
        else
            echo $0: nothing to delete from $volume 1>&2
        fi
    done
done