RE: [nv-l] Status Polling
2005-06-24 14:50:59
I agree with Bill. Timeouts and retries are your best bet for tuning
out false alarms. Depending on your network, the retries rather than
the timeouts may work best for you. For example, if pings are getting
lost, try 5 retries with a timeout of 2.
Cordially,
Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
(248) 552-4968 Voicemail, Fax, Pager
"Evans, Bill"
<Bill.Evans AT hq.doe DOT gov>
Sent by: owner-nv-l AT lists.us.ibm DOT com
06/23/2005 09:11 PM
To: "'nv-l AT lists.us.ibm DOT com'" <nv-l AT lists.us.ibm DOT com>
Subject: RE: [nv-l] Status Polling
I’ve done it. It's not hard at all, but it is expensive: Demand Poll
takes a lot of cycles. This script is executed out of the
ESE.Automation when an event indicating a failed poll is received.
A ruleset kicks it off as a background action.
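goshawk2# cat RouterDP.sh
#!/bin/ksh
# The first argument is the host to demand-poll.
Hostname=${1}
Date=`date`
# Log a timestamped marker, then run the poll in the background.
echo ${Date} function off >>/opt/webmon/RouterDP.log
/usr/OV/bin/nmdemandpoll ${Hostname} >>/opt/webmon/RouterDP.log &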
One problem is that SNMP doesn’t really have any better priority or
architectural power than ICMP. I actually used the process when SNMP
polling had a problem with late-arriving responses on a slow and
overloaded processor. It’s an architectural fact that ICMP and SNMP
are low priority and allowed to be thrown away. NetView compensates
with geometrically increasing waits on retries and the ability to
customize retries and wait time per device.
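
To get a feel for what geometrically increasing waits mean for the retry and timeout settings discussed in this thread, here is a rough sketch of the worst-case wait before a poll is declared failed. It assumes the wait doubles after every unanswered attempt and that the first attempt is not counted as a retry; both are assumptions for illustration, not documented netmon behavior. The function name total_wait is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: worst-case total wait before a poll is given up,
# ASSUMING the wait doubles after each unanswered attempt.
total_wait() {
    timeout=$1      # initial timeout
    retries=$2      # number of retries after the first attempt
    total=0
    attempt=0
    # One initial attempt plus $retries retries.
    while [ "$attempt" -le "$retries" ]; do
        total=$((total + timeout))
        timeout=$((timeout * 2))    # assumed doubling factor
        attempt=$((attempt + 1))
    done
    echo "$total"
}

# The "5 retries with a timeout of 2" suggestion from this thread.
total_wait 2 5
```

Under these assumptions the example prints 126 (2+4+8+16+32+64), in whatever unit the timeout is configured in. Adding retries therefore lengthens the time before a node is marked down much faster than raising the initial timeout does.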
I quit using the script once
we had the problem figured out. The overhead of Demand Poll actually
made things a bit worse.
I’d go for solving the root
cause. Manipulate the timeouts and retries for ICMP. Make sure
your NetView box has enough resources. Check the delays at the routers
and switches to see if there’s a bad card tying up traffic. Etc.
The other alternative is to look into the IBM Tivoli Switch Analyzer.
It automates the follow-up of failed polls, and its slightly delayed
follow-up to the failed ICMP poll often clears the condition.
Using an inline action is a
VERY BAD idea. Your entire rules processing waits for the demand
poll to finish. The system can totally bog down; note that my background
script spins the demand poll off as an independent process because it was
single threading the background action processing.
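
The difference between waiting on the poll and spinning it off can be sketched as follows. This is a minimal illustration rather than the script actually used: run_detached is a hypothetical helper, the log path is a placeholder, and sleep stands in for the nmdemandpoll call so the sketch can be run anywhere.

```shell
#!/bin/sh
# Placeholder log location for this sketch (an assumption).
LOG=${LOG:-/tmp/RouterDP-sketch.log}

# Hypothetical helper: start a command as an independent background
# process and return to the caller immediately.
run_detached() {
    echo "$(date) starting: $*" >>"$LOG"
    # nohup plus '&' detaches the command; its output goes to the log
    # so nothing holds the caller's stdout open.
    nohup "$@" >>"$LOG" 2>&1 &
}

# Stand-in for the slow demand poll; the caller does not wait for it.
run_detached sleep 1
```

The caller gets control back as soon as the process is started, so the action processing is not held up for the duration of the poll. The cost is that nothing limits how many polls run at once, which is the overhead concern raised above.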
Bill Evans
-----Original Message-----
From: owner-nv-l AT lists.us.ibm DOT com [mailto:owner-nv-l AT lists.us.ibm DOT com]
On Behalf Of Kumar Vanka
Sent: Thursday, June 23, 2005 8:48 PM
To: nv-l AT lists.us.ibm DOT com
Subject: [nv-l] Status Polling
I'm using ICMP for status polling
in our environment. However, due to several factors, we're getting many
false positives. One of these factors is that ICMP has a low priority in
our environment. Is it possible to configure netmon so that if the ICMP
status poll shows that a node is down, it can then do a demand poll using
SNMP?
Based on my research, it appears
this is not possible. So, I'm considering modifying my ruleset to use
an inline action to run nmdemandpoll. Is this a good option? Or, are there
other options that I'm not considering?
Thanks.
- Kumar Vanka
ESM Architect
Invenio, Inc.