Networker

Re: [Networker] Networker storage node problem

2004-07-27 12:30:47
Subject: Re: [Networker] Networker storage node problem
From: Yura Pismerov <ypismerov AT TUCOWS DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Tue, 27 Jul 2004 12:29:48 -0400
We had the same problem not long ago.
Our robot got stuck (due to a hardware failure) in the middle of a tape request,
and it caused a UDP packet storm between the server and the storage node until
we killed the backup and all the processes associated with it on the server side.
And yes, the packet storm brought down the network link (10 MB) between the
server and the storage node.
We never escalated it to Legato, though.
We run NetWorker 6.1.4.Build.562 Network Edition on Solaris 9 (server) and on
Linux RH 7.3 (storage node).


Adam Ardis wrote:
We've been having a problem since this weekend: as soon as I kick off a
backup from my server to a remote storage node, the network gets
hammered and becomes unavailable.  It looks like millions of packets are
sent from the storage node to the master server right before it goes
down.  The only thing I see in my daemon.log is a request to mount a tape
at the storage node, and then I get the nsrmmd timeouts because the
network is down.  Stopping NetWorker on the storage node (grendel) brings
the network back up.  When I restarted NetWorker on the storage node,
the network immediately went back down, even though the save group was
no longer running.  It went to mount a tape in the pool at grendel, and
that was it.



One question I have is how NetWorker sends traffic from the storage
node to the master, and why it would be flooding the network if I'm not
sending the backup data across it.  At first I thought it was the index,
but it hasn't gotten that far the last couple of times it freaked out.
I've checked that the pool is set correctly and that all labeled tapes
belong to the storage node.  This process worked for many months until
now.  Nothing changed on the Legato side, but the router config was
changed.  When the backup failed, that change was reverted, and it still
isn't working.



07/26/04 23:30:27 nsrd: media info: suggest mounting NY0001L1 on grendel for writing to pool 'NT Incremental NYC'
07/26/04 23:30:27 nsrd: media waiting event: Waiting for 1 writable volumes to backup pool 'NT Incremental NYC' tape(s) or disk(s) on grendel
07/26/04 23:30:28 nsrd: media info: suggest relabeling NY0003L1 on grendel for writing to pool 'NT Incremental NYC'
07/26/04 23:30:28 nsrd: media event cleared: Waiting for 1 writable volumes to backup pool 'NT Incremental NYC' tape(s) or disk(s) on grendel
07/26/04 23:30:28 nsrd: media waiting event: Waiting for 2 writable volumes to backup pool 'NT Incremental NYC' tape(s) or disk(s) on grendel
07/26/04 23:40:53 nsrd: media notice: check storage node: grendel (nsrmon timed out)
07/26/04 23:40:53 nsrd: media notice: check storage node: grendel (nsrmon timed out)
07/26/04 23:40:53 nsrd: media notice: check storage node: grendel (nsrmmd missing from polling reply)
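
For anyone chasing the same symptom, here is a minimal Python sketch for pulling the "check storage node" notices out of daemon.log and counting them per node and reason. The regular expression is an assumption based only on the three notice lines in this excerpt, not on any documented NetWorker log format. The function names are made up for illustration, and the code assumes each log entry sits on one line (not wrapped by a mail client).

```python
import re
from collections import Counter

# Assumed layout, inferred only from the notices quoted in this message:
#   MM/DD/YY HH:MM:SS nsrd: media notice: check storage node: <node> (<reason>)
NOTICE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) nsrd: "
    r"media notice: check storage node: (?P<node>\S+) \((?P<reason>[^)]*)\)"
)


def storage_node_notices(lines):
    """Yield (timestamp, node, reason) for each 'check storage node' notice."""
    for line in lines:
        match = NOTICE.match(line.strip())
        if match:
            timestamp = f"{match['date']} {match['time']}"
            yield timestamp, match["node"], match["reason"]


def summarize(lines):
    """Count notices per (node, reason) pair."""
    return Counter(
        (node, reason) for _, node, reason in storage_node_notices(lines)
    )


# Sample entries copied from the excerpt, one entry per line.
SAMPLE = [
    "07/26/04 23:30:27 nsrd: media info: suggest mounting NY0001L1 on grendel"
    " for writing to pool 'NT Incremental NYC'",
    "07/26/04 23:40:53 nsrd: media notice: check storage node: grendel"
    " (nsrmon timed out)",
    "07/26/04 23:40:53 nsrd: media notice: check storage node: grendel"
    " (nsrmon timed out)",
    "07/26/04 23:40:53 nsrd: media notice: check storage node: grendel"
    " (nsrmmd missing from polling reply)",
]

if __name__ == "__main__":
    for (node, reason), count in summarize(SAMPLE).items():
        print(f"{node}: {reason} x{count}")
```

To run it against a real log, pass an open file object to summarize() instead of SAMPLE. The path to daemon.log depends on the installation, so it is left out here.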



Any advice would be appreciated.



Thanks,

Adam




--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

--
Yuri Pismerov, Sr. System Administrator,
TUCOWS.COM INC. (416) 535-0123  ext. 1352
