Re: [nv-l] Netview Rule - corrected
2004-10-22 03:44:02
Hi James,
Awesome, this is the first time I think
I understand this nvcorrd behaviour; hopefully it stays that way. And
thanks for the remark that I am still thinking like a human being, after
all these long hours with these stupid machines :-)
Regarding the rule, I have extended
your suggestion and added a 6-minute Reset-on-Match, which gives nodes
time to come up again without any events being sent to TEC.
I outlined this in a small slide and
posted it on the NetView web page. Any suggestions for improvement are welcome,
but from what I can see in my lab testing, it works as expected.
Link:
http://www.nv-l.org/twiki/bin/view/Netview/NetViewRule
Kind regards and thanks again to James,
Wolfgang
------------------------------------------------------------------
Wolfgang Bergbauer
Network and System Management Consultant
Cell phone: +49 172 534 9131
E-mail: wolfgang.bergbauer AT attglobal DOT net
James Shanks <jshanks AT us.ibm DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com
10/21/2004 12:06 AM
Please respond to nv-l AT lists.us.ibm DOT com
To: nv-l AT lists.us.ibm DOT com
cc:
Subject: Re: [nv-l] Netview Rule - corrected
When you have a ruleset problem, the thing to do is to
turn on the nvcorrd trace and see why what you expected to happen is not
happening. You do that with "nvcdebug -d all", and the results
are written to nvcorrd.alog and nvcorrd.blog. When a trap is received, you'll
see the eye-catcher "Received a trap", and when nvcorrd is finished
with it, you'll see "Finished with the trap". Everything that
happens in between is the processing nvcorrd did on it.
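Because a busy trace interleaves many traps, it can help to cut it into one chunk per trap. The following is a small, hypothetical Python helper (not part of NetView) that groups trace lines from "Received a trap" through "Finished with the trap". It assumes nothing about the log format beyond those two eye-catchers appearing in the lines; the script name and file argument are illustrative only.

```python
import sys
from typing import Iterable, List, Optional

# Eye-catchers quoted in the post; everything else about the log format is assumed.
START = "Received a trap"
END = "Finished with the trap"


def trap_blocks(lines: Iterable[str]) -> List[List[str]]:
    """Group nvcorrd trace lines into one list per trap, START to END inclusive."""
    blocks: List[List[str]] = []
    current: Optional[List[str]] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if START in line:
            # A new trap begins; an unfinished previous block is discarded.
            current = [line]
        elif current is not None:
            current.append(line)
            if END in line:
                blocks.append(current)
                current = None
    return blocks


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python trap_blocks.py nvcorrd.alog
    with open(sys.argv[1], errors="replace") as f:
        for n, block in enumerate(trap_blocks(f), start=1):
            print(f"--- trap {n}: {len(block)} lines ---")
            print("\n".join(block))
```

Either trace file can be passed on the command line; each printed block is the processing nvcorrd did for a single trap.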
That said, I think I already know what you are going to find. You are looking
at the ruleset as a human being would, as a purely logical construct, not
as a computer algorithm which executes in real time. The problem is that
when the Reset-on-Match releases the held Node Down, nvcorrd immediately
proceeds to the next node for it, the Pass-on-Match. But there is nothing
for it to match because the Node Up event, which triggered the Reset, has
not yet been stored in the cache for the Pass-on-Match. That processing
will come after the Node Down is released. In short, this is a timing issue.
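The ordering problem described above can be reproduced with a toy model. This is a hypothetical sketch, not nvcorrd's implementation: it assumes strictly depth-first processing and simplified node semantics taken from the description above (the Reset-on-Match releases its held Node Down when the Node Up arrives, and the Pass-on-Match only forwards its Input One event if a Node Up is already in its cache).

```python
from typing import List, Optional


class ToyRuleset:
    """Toy model of depth-first event processing (hypothetical, not nvcorrd's code).

    Reset-on-Match holds a Node Down; a Node Up on its Input Two releases it.
    Pass-on-Match forwards an Input One event only if a Node Up (its Input Two)
    is already stored in its cache.
    """

    def __init__(self) -> None:
        self.held: Optional[str] = None  # Node Down held by the Reset-on-Match
        self.pass_cache: List[str] = []  # Input Two events stored by the Pass-on-Match
        self.trace: List[str] = []

    def receive(self, event: str) -> None:
        self.trace.append(f"Received {event}")
        if event == "Node Down":
            self.held = event
        elif event == "Node Up":
            if self.held is not None:
                released, self.held = self.held, None
                # The released Node Down is followed downstream immediately...
                self.pass_on_match(released)
            # ...and only afterwards is the triggering Node Up stored for the Pass.
            self.pass_cache.append(event)
        self.trace.append(f"Finished {event}")

    def pass_on_match(self, event: str) -> None:
        if "Node Up" in self.pass_cache:
            self.trace.append(f"Pass-on-Match forwarded {event}")
        else:
            self.trace.append(f"Pass-on-Match found no match for {event}")


rules = ToyRuleset()
rules.receive("Node Down")
rules.receive("Node Up")
print("\n".join(rules.trace))
```

In the printed trace, "Pass-on-Match found no match for Node Down" appears between "Received Node Up" and "Finished Node Up": the Node Up is stored in the Pass only after the released Node Down has already been evaluated there.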
So what can you do? You have to insert another step between the Reset and
the Pass, one that lets nvcorrd finish with the Node Down for the time being,
so that it can go back and store the Node Up. I'm indebted to
my colleague, Paul Stroud, for one workable solution, which he was the
first to think of. Insert another Reset-on-Match after the first one and
connect the Node Down as its Input One. Set the interval to anywhere from
30 seconds to a minute, and connect its output to the Pass-on-Match as that
node's Input One. The trick is to have nothing connected to the second
Reset-on-Match as Input Two. That way the Node Down event will simply be
held for the interval and then released. Once the Node Down is stored in
the cache for the second Reset, processing for it ceases, and nvcorrd can
then go back to the Node Up and store it in the Pass.
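The effect of the extra Reset-on-Match can be sketched by adding a timer to the same kind of toy model. Again this is hypothetical and simplified, not nvcorrd's code: the second Reset-on-Match is modeled as a pure delay (nothing on Input Two can ever reset it), event times are made-up seconds, and Pass-on-Match time windows are not modeled.

```python
import heapq
from typing import List, Optional, Tuple


class ToyRulesetWithDelay:
    """Toy model (hypothetical, not nvcorrd's code) of the workaround.

    A second Reset-on-Match with nothing on Input Two can never be reset, so it
    simply holds the released Node Down for `delay` seconds and then passes it on.
    """

    def __init__(self, delay: float = 30.0) -> None:
        self.delay = delay
        self.now: float = 0.0
        self.held: Optional[str] = None  # Node Down held by the first Reset-on-Match
        self.pass_cache: List[str] = []  # Input Two events stored by the Pass-on-Match
        self.trace: List[str] = []
        self._timers: List[Tuple[float, int, str]] = []  # (expiry, seq, event)
        self._seq = 0

    def receive(self, event: str, at: float) -> None:
        self.run_until(at)  # fire timers that expire before this event arrives
        self.trace.append(f"t={at:g} Received {event}")
        if event == "Node Down":
            self.held = event
        elif event == "Node Up":
            if self.held is not None:
                released, self.held = self.held, None
                self.second_reset(released)  # parked, not sent to the Pass yet
            self.pass_cache.append(event)  # so the Node Up is stored in time
        self.trace.append(f"t={at:g} Finished {event}")

    def second_reset(self, event: str) -> None:
        # Hold the event until now + delay; nothing can cancel this timer.
        self._seq += 1
        heapq.heappush(self._timers, (self.now + self.delay, self._seq, event))

    def run_until(self, t: float) -> None:
        # Release every held event whose interval has expired by time t.
        while self._timers and self._timers[0][0] <= t:
            self.now, _, event = heapq.heappop(self._timers)
            self.pass_on_match(event)
        self.now = max(self.now, t)

    def pass_on_match(self, event: str) -> None:
        if "Node Up" in self.pass_cache:
            self.trace.append(f"t={self.now:g} Pass-on-Match forwarded {event}")
        else:
            self.trace.append(f"t={self.now:g} Pass-on-Match found no match for {event}")


rules = ToyRulesetWithDelay(delay=30)
rules.receive("Node Down", at=0)
rules.receive("Node Up", at=120)
rules.run_until(200)
print("\n".join(rules.trace))
```

Here the Node Up is stored at t=120 and the delayed Node Down reaches the Pass-on-Match at t=150, so it finds its match. The 30-second `delay` stands in for the 30-second to one-minute interval suggested above.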
The only catch to this is that the triggering Node Up is sent to TEC about
a minute later than the Node Down, but that should not matter much to your
TEC rules, since they can evaluate what's already in the reception store
as well as the current event.
Incidentally, that's why IBM's recommended direction was to do this kind of correlation
in TEC, where timing issues are less relevant.
Hope this helps
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group