Subject: Re: [Networker] Highly available NetWorker Solaris server, etc.
From: Siobhán Ellis <siobhanellis AT HOTMAIL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 21 Sep 2006 10:48:44 +1000
The best solution for clustering the NetWorker server is to use AutoStart
(or whatever it is called now). The reason is that restarting NetWorker is
very complex, for all the reasons Stuart gives below, and more.

You may remember that Legato had a short-lived AutoStart module for
NetWorker, which was pulled. As the product manager, I was looking at
relaunching it, but the first version was not going to restart NetWorker
automatically. It would've automated the restart process, but the failover
would have had to be manually initiated. The reason? Just the complexity.
You could all too easily get into the NetWorker services flip-flopping
between members of the cluster.
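
To make that concrete: a manually initiated failover on Solaris boils down
to something like the sketch below. This is purely illustrative (it is not
the pulled module), and it assumes /nsr sits on shared storage that both
nodes can mount, with hypothetical node and device names:

    # Illustrative manual NetWorker failover - operator-initiated only,
    # precisely to avoid the flip-flop problem described above.
    OLD=nodeA                  # node currently running nsrd (example name)
    NEW=nodeB                  # standby node (example name)
    NSRFS=/dev/dsk/c1t0d0s0    # shared slice holding /nsr (example device)

    rsh $OLD nsr_shutdown -q   # stop all NetWorker daemons cleanly
    rsh $OLD umount /nsr       # release the shared filesystem
    rsh $NEW mount $NSRFS /nsr # mount it on the standby
    rsh $NEW /etc/init.d/networker start   # start nsrd and friends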

So, if you do cluster the NW server (and I would), do it step by step.
Don't go all out at the beginning. Try to think of every possibility. Oh,
and you need to make sure that your jukebox, if controlled by the NetWorker
server, has the same SCSI address on all cluster nodes.
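
As a quick sanity check (paths assumed from a standard Solaris NetWorker
install), you can run inquire on every node and compare what each one sees;
the autochanger must show up at the same target/LUN everywhere:

    # Example hostnames; the outputs should agree on the jukebox address.
    rsh nodeA /etc/LGTOuscsi/inquire > /tmp/nodeA.scsi
    rsh nodeB /etc/LGTOuscsi/inquire > /tmp/nodeB.scsi
    diff /tmp/nodeA.scsi /tmp/nodeB.scsi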

Siobhán Ellis
EMC Certified NetWorker Specialist
IDATA Integrity Pty Ltd
Sydney, Australia


On 21/9/06 1:32 AM, "Stuart Whitby" <swhitby AT DATAPROTECTORS.CO DOT UK> wrote:

> You can scale this as far as your network allows.  There's a limit of 256
> supported devices in NetWorker up to 7.2, and 512 (I think) in 7.3.
> Scalability problems with NetWorker generally start when:
> - your storage nodes get too busy to respond in a timely manner to nsrmon
> requests to know if the mmd is still available
> - when the volume/drive selection process starts getting too complex for nsrd
> to complete quickly (I believe this has better logic in 7.3 and is offloaded
> from nsrd - but then you have to run 7.3....)
> - processing power on the server is taken from nsr processes to service I/O
> instead.  The movement of data within the server takes up significant system
> resources and the faster the drives you add, the more load you put on the
> server.  Shifting this out to a storage node gives the NetWorker processes
> much more headroom, though you'll still need one mmd locally (even if that's
> disk backup cloned later to tape) for indexes and bootstrap backups.
>  
> The NetWorker server can be run in a cluster, but a failover will abort any
> running groups.  Even if the groups are restarted, any savesets will have to
> start from the beginning rather than continue from where they left off.
> There's also the potential problem of jukebox consistency after pulling the
> rug from under the server, and of savepnpc postscripts not running or
> prescripts being run twice.
>  
> So while it's possible to run NetWorker in a cluster, the best reason for this
> approach is for controlled testing of patches in the production environment,
> where one member can be active on a new patch and the other can remain as a
> failback in case of any problems.
>  
> It's not possible to run NetWorker in an active-active cluster with two
> NetWorker servers.  Licensing won't allow it, there's no way (short of some
> strange use of chroot) to specify an alternate /nsr directory, and RPC is
> going to register the ports system-wide and not allow duplication.  What you
> can do is run the NetWorker server on the active node and a storage node on
> the passive.  If and when the server fails, an nsradmin script can be created
> to run against the downed nsrdb to modify jukebox resources etc. to eliminate
> the drives connected to the downed server.  Otherwise, a cold standby is an
> easily workable solution which can continue using the original jukebox without
> much hassle or reconfiguration.  Simpler, but leaves you with hardware kicking
> around which is doing absolutely nothing - I'd recommend the scripted "storage
> node reconfig" if you have a scripting guru who can help with this.
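>  
> As a rough sketch of the kind of script I mean (device and path names are
> placeholders, not a tested recipe), nsradmin's -d option edits the resource
> database directly, without a running nsrd:
>  
> # On the surviving node, with nsrd down and /nsr failed over.
> # Disables a device that lived on the downed server.
> cat > /tmp/fixres <<'EOF'
> . type: NSR device; name: rd=downed-node:/dev/rmt/0cbn
> update enabled: No
> EOF
> nsradmin -d /nsr/res/nsrdb -i /tmp/fixres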
>  
> Cheers,
>  
> Stuart.
> 
> ________________________________
> 
> From: EMC NetWorker discussion on behalf of John Hope-Bailie
> Sent: Wed 20-Sep-06 15:25
> To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
> Subject: [Networker] Highly available NetWorker Solaris server, etc.
> 
> 
> 
> Hi,
> 
> We have standardised on Sun/Solaris NW Servers and Storage Nodes, and
> different types of clients (Windows, Solaris, AIX and Linux).
> 
> Currently, the NW server(s) control the process but also do a
> significant amount of backup (i.e. move plenty of data between clients
> and devices).
> 
> We scale out by adding new Solaris NW Storage Nodes.
> 
> Current concerns are:
> 
> 1)  How far can we scale out before running into a limit?
> 
> 2)  How can we enhance the resilience of this approach?
> 
> My thinking would be to stop using the NW server(s) as Storage Nodes.
> We would use them as backup servers only (no backup data traffic would
> pass through them). Hopefully this would free up sufficient performance
> to allow virtually infinite scalability by adding additional external
> Storage Nodes as necessary.
> 
> Obviously all the clients can be configured with primary and secondary
> storage nodes, so resilience at this level is catered for.
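>  
> That failover order is just the "storage nodes" attribute on each client
> resource; for illustration (server and client names are placeholders), it
> could be set with nsradmin like so:
>  
> # Clients try storage nodes in list order; "nsrserverhost" stands for
> # the NetWorker server itself, kept here as a last resort.
> cat > /tmp/affinity <<'EOF'
> . type: NSR client; name: client1.example.com
> update storage nodes: snode1, snode2, nsrserverhost
> EOF
> nsradmin -s nwserver -i /tmp/affinity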
> 
> However the NW server(s) become a single point of failure.  The thinking
> here would be to use SunCluster on the NW Server(s) to provide higher
> availability.
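>  
> For what it's worth, NetWorker ships Sun Cluster agent types for exactly
> this (LGTO.serv for the server, LGTO.clnt for clients, installed by the
> networker.cluster script).  Under Sun Cluster 3.x, registration would look
> roughly like the following sketch; the group and hostname are examples only:
>  
>     scrgadm -a -g nw-rg                          # create resource group
>     scrgadm -a -L -g nw-rg -l backup-vip         # logical host / virtual IP
>     scrgadm -a -j nw-serv -g nw-rg -t LGTO.serv  # NetWorker server resource
>     scswitch -Z -g nw-rg                         # bring the group online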
> 
> It would be useful if active-active clustering were possible, i.e. NW
> Server 1 running on Physical Node A and NW Server 2 running on Physical
> Node B.  If Node A fails, both NW Servers run on Node B.  I am not sure
> if this can be done (it certainly can't be on Windows clustering).
> 
> Other approaches such as a cold standby backup server can be considered,
> but I do not really know what is possible with Solaris.
> 
> I would really appreciate any comments, criticisms, suggestions on the
> above, in particular, from people who have tried or are using any of
> these approaches.
> 
> 
> John Hope-Bailie
> E-mail:    johnhb AT channeldata.co DOT za
> 
> 
> 


Siobhán

To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with
this list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or via RSS at
http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
