Subject: Re: [Networker] Highly available NetWorker Solaris server, etc.
From: Stuart Whitby <swhitby AT DATAPROTECTORS.CO DOT UK>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 20 Sep 2006 16:32:12 +0100
You can scale this as far as your network allows.  There's a limit of 256 
supported devices in NetWorker up to 7.2, and 512 (I think) in 7.3.  
Scalability problems with NetWorker generally start when: 
- your storage nodes get too busy to respond in a timely manner to the nsrmon 
requests that check whether the mmd is still available 
- the volume/drive selection process gets too complex for nsrd to complete 
quickly (I believe 7.3 has better logic here and offloads it from nsrd - but 
then you have to run 7.3....)
- processing power on the server is diverted from the nsr processes to service 
I/O instead.  The movement of data within the server takes up significant 
system resources, and the faster the drives you add, the more load you put on 
the server.  Shifting this out to a storage node gives the NetWorker processes 
much more headroom, though you'll still need one mmd locally (even if that's 
disk backup cloned later to tape) for indexes and bootstrap backups.
 
The NetWorker server can be run in a cluster, but a failover will abort any 
running groups.  Even if the groups are restarted, any savesets will have to 
start from the beginning rather than continue from where they left off.  
There's also the potential problem of jukebox consistency after pulling the 
rug out from under the server, and of savepnpc postscripts not running or 
prescripts running twice.
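 
(For context: savepnpc pre/post commands live in a resource file on the 
client, /nsr/res/<group>.res, so a failover mid-group can leave the pstcmd 
unrun.  A minimal sketch - the script paths here are hypothetical:
 
    type: savepnpc;
    precmd: "/usr/local/sbin/pre_backup.sh";
    pstcmd: "/usr/local/sbin/post_backup.sh";
    timeout: "12:00pm";
 
If the server fails over between precmd and pstcmd, the client application 
can be left sitting in backup mode until the group next runs.)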
 
So while it's possible to run NetWorker in a cluster, the best reason for this 
approach is controlled testing of patches in the production environment, where 
one member can be active on a new patch and the other can remain as a failback 
in case of any problems.  
 
It's not possible to run NetWorker in an active-active cluster with 2 NetWorker 
servers.  Licensing won't allow it, there's no way (short of some strange use 
of chroot) to specify an alternate /nsr directory, and RPC is going to register 
the ports system-wide and not allow duplication.  What you can do is run the 
NetWorker server on the active node and a storage node on the passive one.  If 
and when the server fails, an nsradmin script can be run against the downed 
nsrdb to modify jukebox resources etc. and eliminate the drives connected to 
the downed server (see the sketch below).  Otherwise, a cold standby is an 
easily workable solution which can continue using the original jukebox without 
much hassle or reconfiguration.  Simpler, but it leaves you with hardware 
kicking around doing absolutely nothing - I'd recommend the scripted "storage 
node reconfig" if you have a scripting guru who can help with this.
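 
As a rough illustration only - the jukebox name, storage node name and device 
paths below are all made up, and the attribute syntax should be verified 
against your NetWorker version - such a script could use nsradmin's -d option 
to edit the resource database directly while nsrd is down:
 
    #!/bin/sh
    # Hypothetical sketch: rewrite the jukebox's device list so only the
    # surviving storage node's drives remain, after which nsrd can be
    # started on the takeover node.
    NSRDB=/nsr/res/nsrdb    # path to the (offline) resource database
 
    nsradmin -d $NSRDB <<'EOF'
    . type: NSR jukebox; name: lib0
    update devices: rd=stnode1:/dev/rmt/0cbn, rd=stnode1:/dev/rmt/1cbn
    EOF
 
Test anything like this against a copy of nsrdb before relying on it in a 
failover procedure.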
 
Cheers,
 
Stuart.

________________________________

From: EMC NetWorker discussion on behalf of John Hope-Bailie
Sent: Wed 20-Sep-06 15:25
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] Highly available NetWorker Solaris server, etc.



Hi,

We have standardised on Sun/Solaris NW Servers and Storage Nodes, and
different types of clients (Windows, Solaris, AIX and Linux).

Currently, the NW server(s) control the process but also do a
significant amount of backup (i.e. move plenty of data between clients
and devices).

We scale out by adding new Solaris NW Storage Nodes.

Current concerns are:

1)  How far can we scale out before running into a limit?

2)  How can we enhance the resilience of this approach?

My thinking would be to stop using the NW server(s) as Storage Nodes.
We would use them as backup servers only (no backup data traffic would
pass through them). Hopefully this would free up enough server resources
to allow virtually unlimited scaling by adding additional external
Storage Nodes as necessary.

Obviously all the clients can be configured with primary and secondary
storage nodes (see the sketch below), so resilience at this level is
catered for.
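
(For illustration - the client and node names here are invented - the
failover order is the client resource's ordered "storage nodes" attribute,
e.g. via nsradmin:

    nsradmin> . type: NSR client; name: client1.example.com
    nsradmin> update storage nodes: stnode1, stnode2, nsrserverhost

NetWorker tries the nodes in order, so stnode2 is only used when stnode1
is unavailable.)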

However the NW server(s) become a single point of failure.  The thinking
here would be to use SunCluster on the NW Server(s) to provide higher
availability.

It would be useful if active-active clustering were possible, i.e. NW
Server 1 running on Physical Node A and NW Server 2 running on Physical
Node B; if Node A fails, both NW Servers run on Node B.  I am not sure
whether this can be done (it certainly can't be with Windows clustering).

Other approaches such as a cold standby backup server can be considered,
but I do not really know what is possible with Solaris.

I would really appreciate any comments, criticisms, suggestions on the
above, in particular, from people who have tried or are using any of
these approaches.


John Hope-Bailie
E-mail:    johnhb AT channeldata.co DOT za



To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the body of the email. Please write to 
networker-request AT listserv.temple DOT edu if you have any problems with 
this list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or via RSS at 
http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER



