I already have a PMR opened on this, but was wondering if I can get a better
grasp on what is occurring.
Early this week we had a major router outage and noticed that our NV 5.0
platform didn't notice that the
router had gone down. I had seen cases earlier where I couldn't demand poll
objects without restarting
netmon, but since I came from an NNM environment that didn't seem too unusual
to me. I tried to turn on
logging with the netmon -M 7 command, but that didn't work. I had to edit
the ovsuf file and restart
netmon to get it to take. (Yes I know I shouldn't manually edit ovsuf, but
WAaaa I have been doing it for
years and I fix the lrf files after the fact.)
There was a lot of troubleshooting between the beginning and now: changing
the port for trapd, moving it back,
rebooting the box, backing up the map and deleting it, taking out the seed
file, disconnecting the MLMs, etc.
There was a time period where trapd.log and the netmon.trace would hang at
the same time (apparently the
daemons were also hung).
At this point trapd is running and
netmon continuously pings devices (though it
finishes a polling run before starting the next one). That sounds great, except
that in under 30 minutes nmdemandpoll
fails to run and events are not displayed in the control panel. Guess what
fixes it temporarily... moving
/etc/resolv.conf to another file name, leaving it there for a few seconds
and moving it back. (The poll actually
starts running while resolv.conf doesn't exist.) This will work for
about 30 minutes and then nmdemandpoll hangs
again. While nmdemandpoll is hung I can still run host and nslookup commands. I
also cannot demand poll objects that are
defined by IP address rather than by name. Ping and the MIB Browser also work
while I cannot demand poll.
Today we removed /etc/resolv.conf and changed the netsvc to just say host.
It appears to be working.
We will have to move back onto DNS before we start passing events to TEC.
Any ideas on how to troubleshoot our
NetView/DNS issue? Is there a way to trace what it is hanging on?
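
One thing I am thinking of trying, since host and nslookup send their own
queries to the name server and may not exercise the same resolver path the
daemons use: a small script that times forward and reverse lookups through
the system resolver calls (gethostbyname / gethostbyaddr) with a hard
timeout. This is only a rough sketch, and it assumes Python 3 is available
on the box. The names timed_call and check are my own, not anything from
NetView, and the 5 second timeout is an arbitrary choice.

```python
import socket
import sys
import threading
import time


def timed_call(func, arg, timeout=5.0):
    """Run func(arg) in a daemon thread and report how long it took.

    Returns (status, seconds, value), where status is "ok", "error",
    or "timeout". A lookup that is still blocked after `timeout`
    seconds is reported as "timeout" and left running in the background.
    """
    box = {}

    def worker():
        try:
            box["result"] = func(arg)
        except OSError as exc:  # socket.herror and socket.gaierror are OSErrors
            box["error"] = str(exc)

    thread = threading.Thread(target=worker, daemon=True)
    start = time.monotonic()
    thread.start()
    thread.join(timeout)
    elapsed = time.monotonic() - start

    if thread.is_alive():
        return "timeout", elapsed, None
    if "error" in box:
        return "error", elapsed, box["error"]
    return "ok", elapsed, box["result"]


def check(targets, timeout=5.0):
    """Time a forward and a reverse lookup for each name or address."""
    lookups = (
        ("forward", socket.gethostbyname),
        ("reverse", socket.gethostbyaddr),
    )
    for target in targets:
        for label, func in lookups:
            status, elapsed, value = timed_call(func, target, timeout)
            print(f"{target:30} {label:8} {status:8} {elapsed:6.2f}s {value}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        check(sys.argv[1:])
    else:
        print("usage: python3 dnscheck.py <name-or-address> [...]")
```

Running it (saved under a hypothetical name such as dnscheck.py) against a
few node names and addresses from the map, once while demand polls work and
again after they hang, should at least show whether forward or reverse
lookups through the system resolver are stalling.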
jcbrown AT jcbrown DOT net