Re: [nv-l] Cisco 3750's stackwised

2006-02-22 21:45:54
Subject: Re: [nv-l] Cisco 3750's stackwised
From: Gareth Holl <gholl AT us.ibm DOT com>
To: nv-l AT lists.us.ibm DOT com
Date: Wed, 22 Feb 2006 21:47:25 -0500

Brett, it's not clear whether netmon is configured to poll this device via ICMP or SNMP. Can you clarify, please?

What version (including Fixpack & fixes) of NetView are you using, and on what platform? You may be experiencing a known problem with netmon and its use of community names.

If netmon is SNMP polling, then it might not be using the correct community name. Even though you probably have the correct community name configured and snmpwalk may work, netmon might have a bad copy of the community name in its cache, or netmon might be incorrectly defaulting to "public". Demand polls will behave the same way. Are you able to run an iptrace, tcpdump, snoop, etc. to find out what community name is being sent out in netmon's poll?
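Once a capture exists, the community name can be read straight out of the poll packet. Here is a minimal sketch in Python, assuming you have already pulled the raw UDP payload of one SNMPv1/v2c request out of the capture as bytes. The function names are my own, and the sample packet is hand-built for illustration, not taken from a real trace.

```python
def read_length(data, pos):
    """Decode a BER length field at data[pos]; return (length, next_pos)."""
    first = data[pos]
    pos += 1
    if first < 0x80:
        # Short form: the byte itself is the length.
        return first, pos
    num_bytes = first & 0x7F
    if num_bytes == 0:
        raise ValueError("indefinite length is not used by SNMP")
    # Long form: the next num_bytes bytes hold the length, big-endian.
    return int.from_bytes(data[pos:pos + num_bytes], "big"), pos + num_bytes


def snmp_community(payload):
    """Return the community string carried in an SNMPv1/v2c UDP payload."""
    if payload[0] != 0x30:
        raise ValueError("payload does not start with a SEQUENCE")
    _, pos = read_length(payload, 1)

    if payload[pos] != 0x02:
        raise ValueError("expected INTEGER (SNMP version)")
    vlen, pos = read_length(payload, pos + 1)
    version = int.from_bytes(payload[pos:pos + vlen], "big")
    pos += vlen
    if version not in (0, 1):
        # 0 = SNMPv1, 1 = SNMPv2c; SNMPv3 has no community field.
        raise ValueError("not an SNMPv1/v2c message")

    if payload[pos] != 0x04:
        raise ValueError("expected OCTET STRING (community)")
    clen, pos = read_length(payload, pos + 1)
    return payload[pos:pos + clen].decode("ascii", errors="replace")


# Hand-built example: SEQUENCE { version 0, community "public", empty PDU }
sample = bytes.fromhex("300d020100" "0406" "7075626c6963" "a000")
print(snmp_community(sample))
```

If the value printed for netmon's poll differs from the one snmpwalk sends, that points at netmon's cached or defaulted community name rather than at the switch.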

If you have not configured netmon to poll this device via SNMP per a $ entry in the netmon.seed file, then we can assume netmon is simply using ICMP polls. Have you tried increasing the polling timeouts and adding additional retries? Have you checked the netmon ping list (netmon -a 12, see output in netmon.trace) to see if there are negative numbers next to the list of devices scheduled for polling? Do you actually see the netmon ping going out and leaving the box (again, use iptrace/tcpdump/snoop etc.)? Have you tried "pinging" the device from within the NetView GUI (instead of at the command line)?

Any chance there are duplicate IP addresses (which the current version of NetView does not handle)? Are you seeing interfaces/nodes being deleted and re-added over and over? Check the trapd.log for patterns around the time you see the interface/node up and down events.

Are you using /etc/hosts or DNS or both? Check that forward and reverse lookups are accurate (including lookups against the short and fully qualified hostname) and ensure an /etc/hosts entry is not overriding a DNS entry you are expecting netmon to be using.
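The forward/reverse check can be scripted so it is easy to repeat for both name forms. This is a rough sketch in Python using the system resolver, which follows whatever /etc/hosts-versus-DNS order the box is configured for. The function name and the host names in the usage loop are placeholders of mine, not anything from NetView.

```python
import socket


def check_resolution(name, forward=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Return a list of problems found when resolving one hostname.

    forward and reverse default to the system resolver; they are
    parameters only so the check can be exercised without a network.
    """
    try:
        addr = forward(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
        return [f"forward lookup of {name} failed: {exc}"]
    try:
        canonical, aliases, _ = reverse(addr)
    except OSError as exc:
        return [f"reverse lookup of {addr} failed: {exc}"]

    known = [canonical] + list(aliases)
    if name not in known:
        # Exact, case-sensitive comparison on purpose.
        return [f"{name} -> {addr} -> {known}: reverse lookup does not return {name}"]
    return []


if __name__ == "__main__":
    # Placeholder names: substitute the short and fully qualified
    # names of the device being polled.
    for host in ("switch1", "switch1.example.com"):
        for problem in check_resolution(host):
            print(problem)
```

A reported mismatch is not automatically an error (a short name often reverse-resolves to the fully qualified one), but it shows exactly which name the resolver hands back for the address.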

It might come down to turning on full netmon tracing (netmon -M -1,  /usr/OV/log/netmon.trace) to figure out what actions netmon is taking and what responses are logged. You could correlate the trace entries with the trapd.log and any iptracing you are able to do.

Hope this helps,

Gareth Holl
Advisory Software Engineer
Team Technical Lead

ITIL Foundations Certified
IBM Certified Deployment Professional
--Tivoli Data Warehouse v1.2
--Tivoli NetView 7.1.4

IBM Software Group - Tivoli Software
Research Triangle Park,  North Carolina, USA.

Brett <bgillmore AT gmail DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com

02/22/2006 05:53 PM
Please respond to

nv-l AT lists.us.ibm DOT com
Re: [nv-l] Cisco 3750's stackwised

It's a software loopback that's up all the time. Right now it's polling every minute, so it doesn't seem to be a Cisco issue. Ran a continuous ping to the IP from a different box and it still dropped the loopback down on the NetView box. This problem is weird and I'm baffled.
I can snmpwalk the device just fine with the provided community string also.
Thanks all for any suggestions
Any other ideas?

On 2/23/06, James Shanks <jshanks AT us.ibm DOT com> wrote:
Are you certain that this is not a Cisco issue?
You said that when you do the demandpoll, the loopback shows as down, but
comes back up after a ping.  So it's entirely possible from what you've
described that Cisco is dropping the loopback interface after some period
of inactivity, isn't it?  Or did I miss something in your description?

If they are dropping the loopback after some period of time, then no
amount of tinkering with netmon is going to change that fact, unless, of
course, you poll so frequently that they don't get a chance to drop it.

James Shanks
Level 3 Support  for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group

02/21/2006 08:44
To: nv-l AT lists.us.ibm DOT com
Subject: Re: [nv-l] Cisco 3750's stackwised

Thanks for that. Just checked the Community strings and names and those
appear ok. Even tried changing the community string and still got the same
problem. Any other ideas?

On 2/22/06, Evans, Bill <Bill.Evans AT hq.doe DOT gov> wrote:
I had a similar situation which resulted from case problems among the
various configurations including xnmsnmpconf and the hostname
resolution.  Somewhere after I added the node the case of the name
changed in the /etc/hosts file and messed things up nicely.  A0346K2
versus A0346k2 type of thing.

With the help of some very tolerant support folks and a lot of tracing
and debugging we eliminated everything else.  It all came down to a bad
community string resolution because of the names in different mixtures
of case.  Check it out.
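A quick way to hunt for this kind of mismatch is to compare the names you have configured against the names in /etc/hosts while ignoring case. The sketch below is in Python; the function names are mine, and the list of configured names is something you would have to collect yourself from your SNMP configuration.

```python
def hosts_names(hosts_text):
    """Return every hostname and alias found in /etc/hosts-style text."""
    names = []
    for line in hosts_text.splitlines():
        # Drop comments, then split into: address name [alias ...]
        fields = line.split("#", 1)[0].split()
        names.extend(fields[1:])
    return names


def case_mismatches(configured_names, hosts_text):
    """Return (configured, hosts_entry) pairs that differ only in case."""
    by_lower = {}
    for name in hosts_names(hosts_text):
        by_lower.setdefault(name.lower(), set()).add(name)

    pairs = []
    for name in configured_names:
        variants = by_lower.get(name.lower(), set())
        for other in sorted(variants - {name}):
            pairs.append((name, other))
    return pairs


if __name__ == "__main__":
    with open("/etc/hosts") as fh:
        text = fh.read()
    # Replace with the node names as they appear in your own configuration.
    for configured, in_hosts in case_mismatches(["A0346K2"], text):
        print(f"{configured} is spelled {in_hosts} in /etc/hosts")
```

Anything it prints is a name that matches only when case is ignored, which is the A0346K2 versus A0346k2 situation described above.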

Bill Evans

-----Original Message-----
From: owner-nv-l AT lists.us.ibm DOT com [mailto:owner-nv-l AT lists.us.ibm DOT com]
On Behalf Of Brett
Sent: Tuesday, February 21, 2006 5:47 PM
To: nv-l AT lists.us.ibm DOT com
Subject: [nv-l] Cisco 3750's stackwised

Greetings all,

Just wondering if anyone else has had this issue. We have two sets of
3750's stackwised. Each set is configured with two switches set up with
the same loopback IP addr on each pair. Here's the issue:

We can SNMP poll the device just fine, ping the loopback OK, and the
interface goes green. However, whenever netmon polls the device it marks
the loopback down when it's not. From here you can demand poll the switch
and it shows the loopback down, but you can ping it back up, then do
another demand poll and it reports the interface as being up and looking
OK. It will show green until netmon does the normal timed poll, then
drops it back down again.

Anyone have any clues how to get this fixed or things I can try???

Sorry, I would call support, but we have to get the support contract
reinstated first, and that will take a few weeks to sort out within the company.

thanks all
