Re: False Interface downs after netview daemons down during weekly backup

Hi John.

First, it does sound like an ARP issue, but more information is needed.
Can you get a simple command line ping to work?
Does the Netview server have multiple interfaces and what are they?
What are the IPs of the devices you can't reach?
What are the masks of the devices?
Where is the problem: the server or the host?
Check the obvious first.  Default routes and masks on each end and
routers in between.
It could also be a DNS lookup issue.  That is always a risk with
multiple interfaces if DNS is not set up right.

Understand that it could involve the server mask, the RIP version
(1 or 2), the default route, the node's mask or route, or whether the
box is multi-homed.

The first thing the server will do when pinging a host is resolve the
name, either by /etc/hosts or by DNS.
Second, it takes the IP dest. and looks in the routing table for the
longest match.

1.  If it finds the dest. is on one of the server's interfaces (exact
net match or host route), it looks in the arp table for a match of the
IP dest.  If a match occurs, it sends the ping to that mac address.

If not, it arps on that interface for the dest IP. Remember, since the
arp request is a broadcast, anybody can respond, and there may even be
multiple responses; the last arp response received will overwrite the
entry in the arp table.

The host hears the arp request (it better) and should respond to the
source. It can also glean the source mac of the arp request so it
doesn't have to arp for it.

It will now send an arp response back to the source (In this case the
netview server).  This also involves masks and routing tables to
determine where to send it.  It may be going back out the wrong
interface or using default route.



2.  The other possibility is that the next hop is a router, because no
exact net match occurred, so the default route is used.
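
The lookup described above (longest match first, then either arp for
the dest. itself or for the router) can be sketched roughly as follows.
This is only an illustration: the routing table, the addresses and the
interface names (hme0, hme1) are made up, and a real stack does more
than this.

```python
import ipaddress

# Hypothetical routing table for a multi-homed server:
# (prefix, next hop, interface). A next hop of None means the
# destination is directly attached, so the server arps for it itself.
ROUTES = [
    ("0.0.0.0/0", "10.1.1.1", "hme0"),   # default route via a router
    ("10.1.1.0/24", None, "hme0"),       # net on the first interface
    ("10.1.2.0/24", None, "hme1"),       # net on the second interface
    ("10.1.3.9/32", None, "hme1"),       # host route out hme1
]

def lookup(dest):
    """Return (network, next_hop, interface) for the longest match."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), hop, ifname)
        for prefix, hop, ifname in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # The most specific route (largest prefix length) wins.
    return max(matches, key=lambda m: m[0].prefixlen)

def arp_target(dest):
    """IP the sender must resolve to a mac, and the interface used."""
    _, hop, ifname = lookup(dest)
    return (dest if hop is None else hop), ifname
```

With this table, arp_target("10.1.2.5") gives the dest. itself on hme1
(case 1), while an address on no attached net falls through to the
default route and the server arps for the router instead (case 2).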

There are lots of things to look at for a simple ping to work, even more
than what I have mentioned here.

But basically both ends must know what mac to send the packet to, ping
or any other.
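
To illustrate the earlier point about multiple arp responses, here is a
toy model of an arp table where the last reply received wins. The class
name, the addresses and the macs are invented for the example; it is
not how Solaris or Netview actually stores its table.

```python
class ArpCache:
    """Toy arp table: one mac per IP, the newest reply overwrites."""

    def __init__(self):
        self.table = {}

    def learn(self, ip, mac):
        """Record an arp reply; return True if it replaced a different mac."""
        previous = self.table.get(ip)
        self.table[ip] = mac
        return previous is not None and previous != mac


cache = ArpCache()
cache.learn("10.1.2.5", "08:00:20:aa:bb:cc")            # reply from the real hub
changed = cache.learn("10.1.2.5", "00:10:4b:11:22:33")  # later reply from another box
```

After the second reply the table holds the second mac, so pings to
10.1.2.5 go to whichever device answered last.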

When it doesn't work, understand what the path should be and gather all
the IP and mac info for each hop.  Is it right for both directions?
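
One quick way to check the "both directions" question for masks is to
ask whether each end considers the other to be on its own net. A minimal
sketch, assuming made-up addresses and masks (the function name is mine,
not anything from Netview):

```python
import ipaddress

def sees_as_local(own_ip, own_mask, other_ip):
    """True if other_ip falls inside the net implied by own_ip/own_mask."""
    own_net = ipaddress.ip_interface(f"{own_ip}/{own_mask}").network
    return ipaddress.ip_address(other_ip) in own_net


# Hypothetical server with a /16 mask and hub with a /24 mask.
server_view = sees_as_local("10.1.1.10", "255.255.0.0", "10.1.2.5")
hub_view = sees_as_local("10.1.2.5", "255.255.255.0", "10.1.1.10")
```

Here the server thinks the hub is local and arps for it directly, but
the hub thinks the server is on another net and sends its traffic to
its default route. If the two answers differ, the path is not the same
in both directions.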

Hope this might help.

Jeff Fitzwater
CIT Systems & Networking
Princeton University
__________________

"Mull, John" wrote:
>
> Hi Netview Forum Readers-
> Looking for anyone that may have experienced the same type of problem after
> bringing down the Netview daemons for weekly
> maintenance (backup).
>
> Running Netview V5.1.1 on Solaris V2.6
> Tivoli Framework V3.6.1
>
> Each Sunday a script is run to kill any GUI user sessions still running and
> then an ovstop.  Then a Netbackup is scheduled of the Netview filesystem.
> After the backup is complete, an ovstart is executed.
>
> Immediately when netmon begins synchronization, I receive 35 interface downs
> from the same nodes each week.  These nodes are
> 3COM Hubs, switches and a few APC UPSs..  I am unable to ping these devices
> from the Netview GUI, and of course on the map they are in a critical red
> status.  From a command line telnet session, I am unable to ping these
> nodes.  From a WINDOWS 95 command line I am unable to ping or traceroute
> these nodes.  The networks to these switches are not congested, all of them
> have large WAN pipes.  Other components on their same subnets are fine.  All
> routers are fine.  Pretty sure no DNS problems.  We have tried to upgrade
> microcode on the 3COM hubs, this has not helped.  Another strange finding,
> all these interfaces come back 3 hours and 30 minutes after being reported
> down.  Every week..same timeframe..
>
> From a previous List Forum post sent to me by Leslie Clark (Thanks Leslie!)
> it appears other folks experienced a somewhat similar problem..
> This problem happens because when you've unplugged this port the
> entries in the switch ARP-table and the Netview ARP-table were refreshed
> (possibly deleted). Netview wasn't able to translate IP to MAC and
> consequently wasn't able to find the machine.
>
> However, our remote devices are not getting unplugged.  I am trying to
> understand how the arp tables on the remote component and
> Netview are related.  I have a pmr open with support, but they seem to be
> puzzled as much as I am.
>
> Any thoughts out there would  be appreciated?
>
> Thanks
>
> John Mull
> Information Technology & Integration
> Process Technologist, Enterprise Systems Management
> Hershey Foods Corp.
> (717)534-7959
> email:jmull AT hersheys DOT com
>
> Any comments or statements made are not
> necessarily those of Hershey Foods Corporation