Re: [Networker] storage node problem - invalid connection from 0.0.0.0/19315 to 0.0.0.0/0

2007-03-14 11:42:41
Subject: Re: [Networker] storage node problem - invalid connection from 0.0.0.0/19315 to 0.0.0.0/0
From: Support User <support AT DIGIDYNE DOT CA>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 14 Mar 2007 11:33:34 -0400
Good day Mark,

To me this looks like an issue caused by a job or operation that runs
every day outside of NetWorker, for example on your DNS server. As you
probably know, NetWorker depends heavily on DNS for communication.
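
One way to test the DNS theory is to time forward and reverse lookups
for the server and storage node around 7:05am. The following is a
minimal Python sketch, not a NetWorker tool: the function name
check_name and its forward/reverse parameters are hypothetical and
exist only for illustration.

```python
import socket
import time


def check_name(host, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Time one forward and one reverse lookup for a host name.

    `forward` and `reverse` default to the standard socket resolvers;
    they are parameters only so the check can be exercised offline.
    """
    start = time.monotonic()
    try:
        ip = forward(host)          # name -> address
        rname = reverse(ip)[0]      # address -> primary name
    except OSError as exc:          # gaierror and herror are OSError subclasses
        return {
            "host": host,
            "ok": False,
            "error": str(exc),
            "seconds": time.monotonic() - start,
        }
    return {
        "host": host,
        "ok": True,
        "ip": ip,
        "reverse": rname,
        "seconds": time.monotonic() - start,
    }

# Example (hypothetical host name):
#   print(check_name("storagenode.example.com"))
```

Run from cron every minute between 7:00am and 7:15am on the storage
node, a check like this would show whether lookups fail or become slow
in the same window as the nsrexecd errors.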

Regards,

--

Yohann Darsigny                       420, Armand-Frappier, Suite 320
Technical Consultant                  Laval, Quebec
Professional Services                 H7V 4B4
support AT digidyne DOT ca                   Tel.: 1-800-668-4525
http://www.digidyne.ca                Fax:  (450) 686-1757  

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of Mark Davis
Sent: Wednesday, March 14, 2007 11:17 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] storage node problem - invalid connection from
0.0.0.0/19315 to 0.0.0.0/0

I've been having problems with my crontab-initiated staging script for
several months. The problem only affects jobs on my storage node.

- Networker version 7.2.1 server and storage node
- Solaris 10

We run our staging script from cron at 7:00am each day (note that this
script has worked unchanged for over a year). The staging jobs for the
various pools are started 5 minutes apart. For the last few months, our
staging has been failing just after 7:05am.

Digging through the daemon.log on our storage node, I see a series of
error messages that start just when the staging failure occurs:

nsrexecd: Aborting connection.
invalid connection from 0.0.0.0/19315 to 0.0.0.0/0.
Aborting due to: Connection timed out

Included below is a piece of the daemon.log for a typical failure. The
first staging job starts at 7:00am, and the next at 7:05am. Then come
the errors and the staging fails. I have found a workaround for this,
which is simply to start my staging script at 7:20am, and the jobs run
fine.

Note that even with my staging starting at 7:20am, the connection errors
are still happening each day starting at 7:05 and ending at around
7:10am. The errors are *not* tied to the start time of my staging.

I have been trying to get an answer from EMC, with no luck so far. I
find it quite strange that the IP shown in the connection is 0.0.0.0. 
Has anyone seen this problem? Any suggestions as to what might be causing
this?

Note: xxxxx = storage node

03/02/07 03:30:06 nsrmmd #71: Start nsrmmd #71, with PID 12003, at HOST xxxxx
03/02/07 04:15:11 nsrmmd #71: Start nsrmmd #71, with PID 14073, at HOST xxxxx
03/02/07 07:00:06 nsrmmd #71: Start nsrmmd #71, with PID 21754, at HOST xxxxx
03/02/07 07:00:07 nsrmmd #72: Start nsrmmd #72, with PID 21755, at HOST xxxxx
03/02/07 07:05:11 nsrmmd #73: Start nsrmmd #73, with PID 21991, at HOST xxxxx
03/02/07 07:05:11 nsrmmd #74: Start nsrmmd #74, with PID 21992, at HOST xxxxx
03/02/07 07:05:13 nsrmmd #75: Start nsrmmd #75, with PID 21994, at HOST xxxxx
03/02/07 07:05:13 nsrmmd #76: Start nsrmmd #76, with PID 21995, at HOST xxxxx
nsrexecd: Aborting connection.
invalid connection from 0.0.0.0/19315 to 0.0.0.0/0.
Aborting due to: Connection timed out
nsrexecd: Aborting connection.
invalid connection from 0.0.0.0/25384 to 0.0.0.0/0.
Aborting due to: Connection timed out
nsrexecd: Aborting connection.
invalid connection from 0.0.0.0/20478 to 0.0.0.0/0.
Aborting due to: Connection timed out
nsrexecd: Aborting connection.
invalid connection from 0.0.0.0/14408 to 0.0.0.0/0.
Aborting due to: Connection timed out
nsrexecd: Aborting connection.
invalid connection from 0.0.0.0/13171 to 0.0.0.0/0.
Aborting due to: Connection timed out
nsrexecd: Aborting connection.
invalid connection from 0.0.0.0/17692 to 0.0.0.0/0.
Aborting due to: Connection timed out
nsrexecd: Aborting connection.
invalid connection from 0.0.0.0/14981 to 0.0.0.0/0.
Aborting due to: Connection timed out
03/02/07 07:08:30 nsrmmd #36: Aborting connection.
invalid connection from 0.0.0.0/27962 to 0.0.0.0/0.
Aborting due to: Connection timed out
03/02/07 07:08:31 nsrmmd #45: Aborting connection.
invalid connection from 0.0.0.0/19838 to 0.0.0.0/0.
Aborting due to: Connection timed out
03/02/07 07:08:31 nsrmmd #45: MM_CLONEEND w/active saves
03/02/07 07:08:32 nsrmmd #45: Diagnostic: ONE_ICHUNK no shared memory region for ssid 920448680
03/02/07 07:08:32 nsrmmd #45: Diagnostic: ONE_ICHUNK no shared memory region for ssid 920448680
03/02/07 07:08:32 nsrmmd #45: Diagnostic: ONE_ICHUNK no shared memory region for ssid 920448680
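
To confirm that the errors fall in the same window each day regardless
of when staging starts, the log can be summarized per minute. This is a
minimal Python sketch, assuming the daemon.log line format shown above;
error_minutes is a hypothetical helper name. Because the nsrexecd lines
carry no timestamp, each error is attributed to the most recent
timestamped line before it, so the reported minute is only a lower
bound.

```python
import re
from collections import Counter

# Timestamped lines look like "03/02/07 07:08:30 nsrmmd #36: ..."
TIMESTAMP = re.compile(r"^(\d{2}/\d{2}/\d{2}) (\d{2}:\d{2}):\d{2} ")
# Error lines look like "invalid connection from 0.0.0.0/19315 to 0.0.0.0/0."
INVALID = re.compile(r"invalid connection from \S+/\d+ to ")


def error_minutes(lines):
    """Count 'invalid connection' lines per (date, HH:MM).

    The key is taken from the last timestamped line seen so far,
    or None if no timestamp has appeared yet.
    """
    counts = Counter()
    last_seen = None
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if stamp:
            last_seen = (stamp.group(1), stamp.group(2))
        if INVALID.search(line):
            counts[last_seen] += 1
    return counts

# Example (path is an assumption, adjust to the local install):
#   with open("daemon.log") as log:
#       for key, n in sorted(error_minutes(log).items(), key=str):
#           print(key, n)
```

If the same minutes show up every day, including days when staging
starts at 7:20am, that points at something external to the staging
script.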


Thanks,

Mark
--
Mark Davis
Legato NetWorker Support - I.T.S
University of Western Ontario
email: davism AT uwo DOT ca

To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or via RSS at
http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER

