Subject: [Networker] EDL, cloning and storage nodes
From: Brendan Sandes <brendannetworker AT GMAIL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 4 Dec 2008 13:58:19 +1000
Hi All.

Is there a way to set the recover and save portions of a clone operation so
that the read tape is mounted on a different storage node from the SN where
the backup occurred, and the save portion of the clone runs on yet another
storage node again?

Firstly, according to EMC, the behaviour of the -J switch in nsrclone
differs from what the man page says.  The man page states that it affects
only the read portion of a clone.  This is apparently a documentation bug:
it actually affects both the recover AND the save portions of the clone operation.
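For illustration, here is roughly what that means in practice (the server,
pool, and storage node names here are hypothetical, and the ssid is a
placeholder):

```shell
# Per the man page, -J should only select the storage node that
# READS the source volume.  Per EMC, it actually selects the
# storage node for BOTH the read and the write side of the clone,
# so this command would mount source and destination on site1_sn2:
nsrclone -s nw_server -b "DR Clone Pool" -J site1_sn2 -S 1234567890
```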

So you all have an idea of what we're trying to achieve, here is an
overview of the environment.
- NetWorker 7.4 SP3 on the server, all storage nodes, and all clients.  The
server and storage nodes run Solaris 10.
- There are several sites.
- Each group of servers is in its own VLAN, which is firewalled off from
other VLANs.
- The required NetWorker ports are open between clients, storage nodes, and
the server (all backups are working fine).
- Backups at each site go to the local backup storage node, which is fibre
connected to a local EDL (EMC virtual tape library).
- Normal recoveries at each site use the storage node where the data
were backed up.
- Cloning is done by script.

site1 backups -> site1_SN1 ---> site1_EDL
site2 backups -> site2_SN1 ---> site2_EDL
site3 backups -> site3_SN1 ---> site3_EDL

The DR design is that all backups are automatically cloned to a storage node
at site 4, the DR site.

We want to offload the cloning traffic at each site to a different server
(SN2) at each site, for two reasons:
1.  Load.  Cloning can cause a reasonable amount of overhead on a storage
node.  If it is a separate server, we can start cloning processes
relatively soon after the first backup finishes.  This also increases the
amount of time we have available to clone data.
2.  Firewalls.  As mentioned before, the environment is firewalled.  If we
use the same storage node as the backups, we will get at most 1Gb/s of
bandwidth for cloning (even if we trunk ports, we will still eventually be
limited by the firewall's ability to process packets).  If we have a
separate storage node at each site, we can isolate the cloning SN into
a separate VLAN that spans sites and doesn't have to be firewalled.

Setting the destination (write portion) for the clones is easy, as I use the
clone storage node attribute of the client entry for the SN where the
backups occurred.
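For reference, this is the sort of thing I mean, set via nsradmin (host and
resource names hypothetical):

```shell
# On the client resource of the storage node that wrote the backups,
# point the clone storage nodes attribute at the DR-site SN, so the
# WRITE side of the clone lands at site 4:
nsradmin -s nw_server <<'EOF'
. type: NSR client; name: site1_sn1
update clone storage nodes: site4_dr_sn
EOF
```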

Setting the read portion is a little more difficult.
- I cannot use the -J switch, for the reason mentioned above (it sets both
the source AND the destination).
- Because the backups are written to EDL, the behaviour is always as if the
variable FORCE_REC_AFFINITY is set to yes (see the man page for nsr_client).
- I cannot simply mount the source of the clone on the clone storage node,
as it follows the recover storage node affinity.
- I cannot set the recover storage node affinity to the clone storage node,
as this ALSO applies to normal recovers, and the operations staff who
normally perform this function don't have the ability to configure NetWorker
(i.e. change the recover storage node affinity).
- Each site has a different clone storage node.  Each site's backups are not
accessible to other sites' storage nodes (i.e. there is no fibre connectivity
between sites).

The only way I've found to do this so far is to set the recover storage
node affinity for each client as follows:
- the backup SN at each site
- the clone storage node for the relevant site
AND to set the read hostname attribute of the jukebox.
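In script form, the whole workaround looks roughly like this sketch (all
host, jukebox, and pool names are hypothetical stand-ins for one site's
resources):

```shell
#!/bin/sh
# Sketch only -- names are placeholders.

# 1. Add the clone SN to the client's recover affinity list and point
#    the jukebox's read hostname at the dedicated cloning storage node:
nsradmin -s nw_server <<'EOF'
. type: NSR client; name: site1_client
update recover storage nodes: site1_sn1, site1_sn2
. type: NSR jukebox; name: site1_EDL
update read hostname: site1_sn2
EOF

# 2. Run the clone; the write side is controlled by the clone storage
#    node attribute described above, the read side by the settings we
#    just made.  /tmp/ssid_list holds the save set IDs to clone.
nsrclone -s nw_server -b "DR Clone Pool" -S -f /tmp/ssid_list

# 3. Revert, so that normal recovers work again:
nsradmin -s nw_server <<'EOF'
. type: NSR client; name: site1_client
update recover storage nodes: site1_sn1
. type: NSR jukebox; name: site1_EDL
update read hostname: site1_sn1
EOF
```

The revert step at the end is exactly the first problem described below:
while the script is between steps 1 and 3, normal recovers are broken.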

There are a couple of problems with this solution though.
1.  This would have to be set at the start of the clone script and unset at
the end, as this value also applies to normal recovers.  The firewall is
set up such that only NetWorker traffic from the server is allowed through
to the clone storage nodes, i.e. no NetWorker traffic is allowed to/from
the clients themselves.  Therefore normal recovers wouldn't work while the
clone settings were in place.
2.  If someone needs an urgent recover while the clone process is running,
operations will not be able to do it, as they do not have the privileges to
change NetWorker configurations.

So.  Is there an easy way to do what I need to do?  Have I missed something
really basic?

Cheers!
Brendan

To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the body of the email. Please write to 
networker-request AT listserv.temple DOT edu if you have any problems with this 
list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER