nv-l

RE: [nv-l] Status Polling

2005-06-24 14:50:59
Subject: RE: [nv-l] Status Polling
From: Leslie Clark <lclark AT us.ibm DOT com>
To: nv-l AT lists.us.ibm DOT com
Date: Fri, 24 Jun 2005 14:49:56 -0400

I agree with Bill. The timeouts and retries are your best bet for tuning out false alarms. Depending on your network, it may be the retries rather than the timeouts that work best for you. For example, if pings are getting lost, try 5 retries with a timeout of 2.

Cordially,

Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
(248) 552-4968 Voicemail, Fax, Pager



"Evans, Bill" <Bill.Evans AT hq.doe DOT gov>
Sent by: owner-nv-l AT lists.us.ibm DOT com
06/23/2005 09:11 PM
Please respond to: nv-l
To: "'nv-l AT lists.us.ibm DOT com'" <nv-l AT lists.us.ibm DOT com>
cc:
Subject: RE: [nv-l] Status Polling

I’ve done it.  Not hard at all but expensive.  Demand Poll takes a lot of cycles.  This script is executed out of the ESE.Automation when an event indicating a failed poll is received.  A ruleset kicks it off as a background action.  
 
goshawk2#cat RouterDP.sh
#!/bin/ksh
# Log a timestamp, then run the demand poll in the background so the caller is not blocked.
Hostname=${1}
Date=`date`
echo ${Date} function off >>/opt/webmon/RouterDP.log
/usr/OV/bin/nmdemandpoll ${Hostname} >>/opt/webmon/RouterDP.log &
 
One problem is that SNMP doesn’t really have any better priority or architectural power than ICMP.  I actually used this process when SNMP polling had a problem with late-arriving responses on a slow and overloaded processor.  It’s an architectural fact that ICMP and SNMP are low priority and allowed to be thrown away.  NetView compensates with geometrically increasing waits on retries and the ability to customize retries and wait time by device.  
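
To get a feel for how retries and timeouts trade off, here is a rough sketch of the worst-case wait for one status poll before the node is marked down. It assumes the wait doubles on each retry and that the initial attempt is counted in addition to the retries; neither detail is confirmed in this thread. total_wait is a made-up helper, not a NetView command.

```sh
# Worst-case total wait for one status poll, in the same unit as the timeout.
# Assumptions (not from the thread): the first attempt waits `timeout`,
# each retry multiplies the previous wait by `factor` (default 2),
# and the number of attempts is retries + 1.
total_wait() {
    timeout=$1
    retries=$2
    factor=${3:-2}
    delay=$timeout
    total=0
    attempt=0
    while [ "$attempt" -le "$retries" ]; do
        total=$((total + delay))      # add the wait for this attempt
        delay=$((delay * factor))     # next attempt waits longer
        attempt=$((attempt + 1))
    done
    echo "$total"
}
```

Under those assumptions, `total_wait 2 5` prints 126, so 5 retries on a base timeout of 2 can hold off a down event for quite a while. Passing 1 as the third argument models a constant wait instead.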
 
I quit using the script once we had the problem figured out.  The overhead of Demand Poll actually made things a bit worse.  
 
I’d go for solving the root cause.  Manipulate the timeouts and retries for ICMP.  Make sure your NetView box has enough resources.  Check the delays at the routers and switches to see if there’s a bad card tying up traffic.  Etc.  
 
The other alternative is to look into the IBM Tivoli Switch Analyzer.  It automates the follow-up of failed polls, and its slightly delayed follow-up to the failed ICMP poll often clears the condition.  
 
Using an inline action is a VERY BAD idea.  Your entire rules processing waits for the demand poll to finish, and the system can totally bog down.  Note that my background script spins the demand poll off as an independent process (the trailing &) because otherwise it was single-threading the background action processing.  
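
As a sketch of the same idea, the function below launches the poll detached and returns at once, so the caller never waits on it. It also skips the launch if a poll for that host is still running, which is one possible way to limit the overhead mentioned above and is not something the original script did. POLL_CMD, POLL_LOG, LOCK_ROOT and spawn_poll are hypothetical names chosen for illustration.

```sh
# Illustrative defaults; override them in the environment as needed.
POLL_CMD=${POLL_CMD:-/usr/OV/bin/nmdemandpoll}
POLL_LOG=${POLL_LOG:-/opt/webmon/RouterDP.log}
LOCK_ROOT=${LOCK_ROOT:-/tmp}

# Start a demand poll for one host in the background.
# Returns 0 if a poll was started, 1 if one is already running for that host.
spawn_poll() {
    host=$1
    lock="$LOCK_ROOT/RouterDP.$host.lock"
    # mkdir is atomic, so it doubles as a per-host lock.
    mkdir "$lock" 2>/dev/null || return 1
    (
        "$POLL_CMD" "$host" >>"$POLL_LOG" 2>&1
        rmdir "$lock"                 # release the lock when the poll ends
    ) &
    return 0
}
```

A ruleset action would call `spawn_poll "$1"` and exit right away. A lock directory left behind by a killed poll would have to be cleaned up by hand.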
 
Bill Evans
 
-----Original Message-----
From: owner-nv-l AT lists.us.ibm DOT com [mailto:owner-nv-l AT lists.us.ibm DOT com] On Behalf Of Kumar Vanka
Sent: Thursday, June 23, 2005 8:48 PM
To: nv-l AT lists.us.ibm DOT com
Subject: [nv-l] Status Polling

 
I'm using ICMP for status polling in our environment. However, due to several factors, we're getting many false positives. One of these factors is that ICMP has a low priority in our environment. Is it possible to configure netmon so that if the ICMP status poll shows that a node is down, it can then do a demand poll using SNMP?
 
Based on my research, it appears this is not possible. So, I'm considering modifying my ruleset to use an inline action to run nmdemandpoll. Is this a good option? Or are there other options that I'm not considering?
 
Thanks.
 
- Kumar Vanka
ESM Architect
Invenio, Inc.
 