Networker

Re: [Networker] NW 7.5.1 Server on Win 2008 cluster - nsrd dies quickly and quietly

2009-08-06 11:20:32
Subject: Re: [Networker] NW 7.5.1 Server on Win 2008 cluster - nsrd dies quickly and quietly
From: John Hope-Bailie <johnhb AT DEMANDDATA.CO DOT ZA>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 6 Aug 2009 17:12:11 +0200
Many thanks to Matthew and Thierry for the good input.

It seems that the problem may have been more serious than the cluster poll 
response issue.  We still don't know the root cause. 

EMC have provided a fix with some new binaries.  These seem to have 
resolved the issue for now.  We will know better after some more testing.

Best regards,

John Hope-Bailie
E-mail:    johnhb AT demanddata.co DOT za

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On 
Behalf Of Thierry FAIDHERBE
Sent: 05 August 2009 03:47 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] NW 7.5.1 Server on Win 2008 cluster - nsrd dies 
quickly and quietly

I have always used the following procedure from EMC to manually start
a clustered nsrd outside of MSCS (2000/2003) control:

Goal: How to start nsrd in debug mode in Microsoft Cluster Server
Goal: Need to debug NetWorker Server in a cluster
Fact: NetWorker for Windows/NT
Fact: MSCS

Symptom: NetWorker Server in a cluster will not start

Fix: 
1. Make sure that nsrd is not running on any of the cluster nodes.

2. Use the Cluster Administrator GUI to bring online the following 
resources in the NetWorker Group
- NetWorker 'Shared Disk' resource
- NetWorker 'IP Address' resource
- NetWorker 'Network Name' resource
Do not bring the NetWorker 'Server' resource online.

3. Log in to the cluster node that is currently running the
NetWorker Group.

4. Edit the "\nsr\bin\NetWorker.clustersvr" to say:
type: cluster;
shared dir: ":\\nsr";
active: Yes;

->> A good thing is to copy the file while the group resource is online.

5. Use a CMD window to run 'nsrd' or 'nsrd -D 5':
'nsrd' starts NetWorker without debug output,
'nsrd -D 5' starts NetWorker with debug output.
Use <ctrl>-C to shut down nsrd properly.

If nsrd starts correctly this way, you may have to check the polling
intervals of the MSCS cluster resources.
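
For anyone who wants to script steps 4 and 5, here is a minimal Python
sketch. It only builds the file text and the command line; it does not
touch the cluster or start anything. The function names are made up for
illustration, and the shared dir value is passed in exactly as quoted in
step 4, so substitute whatever your own installation uses.

```python
from typing import List, Optional


def render_clustersvr(shared_dir: str) -> str:
    """Return the three attribute lines from step 4 as file text.

    `shared_dir` is written out verbatim; this helper is hypothetical
    and is not part of NetWorker.
    """
    lines = [
        "type: cluster;",
        f'shared dir: "{shared_dir}";',
        "active: Yes;",
    ]
    return "\n".join(lines) + "\n"


def nsrd_command(debug_level: Optional[int] = None) -> List[str]:
    """Build the step 5 command line: 'nsrd' or 'nsrd -D <level>'."""
    if debug_level is None:
        return ["nsrd"]
    return ["nsrd", "-D", str(debug_level)]


if __name__ == "__main__":
    # Value copied as-is from step 4 of the procedure.
    print(render_clustersvr(r":\\nsr"), end="")
    print(" ".join(nsrd_command(5)))
```

The list returned by nsrd_command() could be run from a CMD window on
the node that owns the NetWorker Group, with <ctrl>-C still used to stop
nsrd as described in step 5.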

Kind regards - Bien cordialement - Vriendelijke groeten,

Thierry FAIDHERBE
Backup/Storage & System Management

LE FOREM - Administration Centrale
Département des Systèmes d'Information

Boulevard Tirou, 104  Tel: + 32 (0)71/206730
B-6000 CHARLEROI      Fax: + 32 (0)71/206199
BELGIUM               Mail : Thierry.faidherbe<at>forem.be



-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of John Hope-Bailie
Sent: Wednesday 5 August 2009 10:49
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] NW 7.5.1 Server on Win 2008 cluster - nsrd dies
quickly and quietly

Hi Matt,

Thanks for the reply.  It could be a similar phenomenon, in that it
started after a large number of clients had been configured.  Maybe nsrd
is so busy checking the res DB on startup that it appears to have failed
so the cluster manager kills it.
We are new to Win 2008 cluster. Does anyone have any ideas of where you
would look to check this out ?
Do these cluster failover events get logged anywhere ?
Any ideas how the monitoring of NetWorker being alive is done in a
Windows 2008 cluster environment ?

Regards,

John Hope-Bailie
E-mail:    johnhb AT demanddata.co DOT za

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of Mathew Harvest
Sent: 05 August 2009 12:00 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] NW 7.5.1 Server on Win 2008 cluster - nsrd dies
quickly and quietly

Hey John,

I have no idea if this will be of any use at all, as our implementation
was in a Sun cluster, but it might give you a place to start looking.
We had the same problem: I was importing client definitions, and after I
had added about 280 clients NetWorker would abort its start-up, switch
nodes, try to start, fail, and then fault. It ended up being a couple of
things. First, there is a script that starts and monitors NetWorker to
see if it is still running, and it was using an unreliable method to
test whether NetWorker was responding (esg83809 - LGTsc04903; this is a
bug in NetWorker for UNIX). Second, this script had a timeout that
needed to be increased.

The one way that we tested this was to start NetWorker normally and not
under a cluster resource, and it started fine ...
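
To illustrate the timeout part, here is a small Python simulation. It is
not the Sun cluster script or the Windows cluster resource monitor, just
a hypothetical polling loop with a fake clock. It shows how a daemon that
needs longer to start than the monitor allows gets declared dead even
though it would have come up on its own.

```python
import time
from typing import Callable


def wait_until_alive(probe: Callable[[], bool], timeout_s: float,
                     interval_s: float = 1.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` seconds have passed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


class FakeService:
    """Simulated daemon that only answers probes after `startup_s` seconds."""

    def __init__(self, startup_s: float) -> None:
        self.now = 0.0
        self.startup_s = startup_s

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        # Advance simulated time instead of really waiting.
        self.now += seconds

    def probe(self) -> bool:
        return self.now >= self.startup_s


def started_in_time(startup_s: float, timeout_s: float) -> bool:
    """True if a service needing `startup_s` is seen alive within `timeout_s`."""
    svc = FakeService(startup_s)
    return wait_until_alive(svc.probe, timeout_s, 1.0, svc.clock, svc.sleep)


if __name__ == "__main__":
    print(started_in_time(startup_s=30, timeout_s=20))
    print(started_in_time(startup_s=30, timeout_s=60))
```

The 30, 20 and 60 second figures are invented for the example; the real
start-up time depends on how long nsrd spends on its resource database.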



Mat Harvest
Infrastructure Services 
Shared Information Solutions 
Tel: +61 7 3035 7213
Mobile: +61 412 402 047 
mathew.harvest AT communities.qld.gov DOT au 

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of John Hope-Bailie
Sent: Wednesday, 5 August 2009 1:32 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] NW 7.5.1 Server on Win 2008 cluster - nsrd dies
quickly and quietly

Hi,

I have a clustered NW 7.5.1 server running on a Win 2008 cluster.  The
client deployment is still underway but for the past month everything
has run well.

Suddenly the NW server died.  Closer examination shows that when the
nsrd service is started using cluster manager, it runs for a short
while (say 20 seconds) and then dies.  It can be seen to be using a
bit of memory while it starts up, but it issues no messages or entries
into the daemon log at all.

We have cleared out the temp folders, cleared away the existing daemon
logs, and checked permissions in these folders, all without success.

We have also tried to start nsrd in debug mode (not sure if we got this
right) but nothing was logged.

If we delete the contents of the res folder (i.e. emulate a clean
install) the service starts up o.k.

But after doing mmrecovs from several points in the past, these versions
of the res DBs still result in nsrd being unable to start.

It appears that something is corrupted in the res DB, but if so, it
must have been like this for a while.

If so, why did nsrd not fail sooner?

We have a case open with EMC, but if anyone knows how to fix this one,
please shed some light.

Regards,

John Hope-Bailie

To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER