nv-l

Re: [nv-l] Netview Rule - corrected

2004-10-22 03:44:02
Subject: Re: [nv-l] Netview Rule - corrected
From: wolfgang.bergbauer AT attglobal DOT net
To: nv-l AT lists.us.ibm DOT com
Date: Fri, 22 Oct 2004 09:43:15 +0200

Hi James,

Awesome, I think this is the first time I have really understood this nvcorrd behaviour; hopefully it stays that way. Thanks also for the remark that I am still thinking like a human being, after all these long hours with these stupid machines :-)

Regarding the rule, I have extended your suggestion and added a 6-minute Reset-on-Match, which gives nodes time to come up again without sending any events to TEC.
I outlined this in a small slide and posted it on the NetView web page. Any suggestions for improvement are welcome. From what I can see in my lab testing, it works as expected.

Link:    

http://www.nv-l.org/twiki/bin/view/Netview/NetViewRule

Kind regards and thanks again to James,

Wolfgang


------------------------------------------------------------------
Wolfgang Bergbauer
Network and System Management Consultant
Cell phone: +49 172 534 9131
E-mail: wolfgang.bergbauer AT attglobal DOT net



James Shanks <jshanks AT us.ibm DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com

10/21/2004 12:06 AM
Please respond to
nv-l AT lists.us.ibm DOT com

To
nv-l AT lists.us.ibm DOT com
cc
Subject
Re: [nv-l] Netview Rule - corrected





When you have a ruleset problem, the thing to do is to turn on the nvcorrd trace and see why what you expected to happen is not happening. You do that with "nvcdebug -d all", and the results are written to nvcorrd.alog and nvcorrd.blog. When a trap is received, you'll see the eye-catcher "Received a trap", and when nvcorrd is finished with it, you'll see "Finished with the trap". Everything that happens in between is the processing nvcorrd did on it.
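A trace with everything turned on gets long quickly, so it can help to cut it into one block per trap before reading it. The following is only a minimal sketch: it relies on nothing but the two eye-catchers quoted above, makes no other assumption about the trace format, and the function name `split_trap_blocks` and the sample lines are invented for illustration.

```python
from typing import Iterable, List, Optional

START_MARKER = "Received a trap"        # eye-catcher quoted in the text above
END_MARKER = "Finished with the trap"   # eye-catcher quoted in the text above


def split_trap_blocks(lines: Iterable[str]) -> List[List[str]]:
    """Group trace lines into one block per trap, markers included.

    Lines outside a start/end pair are dropped, and a block that never
    reaches its end marker is discarded as incomplete.
    """
    blocks: List[List[str]] = []
    current: Optional[List[str]] = None  # None means "between traps"
    for raw in lines:
        line = raw.rstrip("\n")
        if START_MARKER in line:
            current = [line]             # a new trap starts here
        elif current is not None:
            current.append(line)
            if END_MARKER in line:
                blocks.append(current)   # trap fully processed
                current = None
    return blocks


# Hypothetical sample lines; real nvcorrd trace output will look different.
sample = [
    "unrelated line",
    "Received a trap",
    "  (whatever nvcorrd logged while evaluating the ruleset)",
    "Finished with the trap",
]
for block in split_trap_blocks(sample):
    print(len(block), "lines in this trap block")
```

To use it on a real trace, pass an open file object for nvcorrd.alog (wherever it lives on your system) instead of the sample list.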

That said, I think I already know what you are going to find. You are looking at the ruleset as a human being would, as a purely logical construct, not as a computer algorithm which executes in real time. The problem is that when the Reset-on-Match releases the held Node Down, nvcorrd immediately proceeds to the next node for it, the Pass-on-Match. But there is nothing for it to match because the Node Up event, which triggered the Reset, has not yet been stored in the cache for the Pass-on-Match. That processing will come after the Node Down is released. In short, this is a timing issue.

So what can you do? You have to insert another step between the Reset and the Pass, one that gives nvcorrd time to finish processing the Node Down, for now, so that it can go back and store the Node Up. I'm indebted to my colleague, Paul Stroud, for one workable solution, which he was the first to think of. Insert another Reset-on-Match after the first one and connect the Node Down as Input One. Set the interval to anywhere from 30 seconds to a minute, and connect the output to the Pass-on-Match as its Input One. The trick is to have nothing connected to that second Reset-on-Match as Input Two. That way the Node Down event will just be held for the interval and then released. Once the Node Down is stored in the cache for the second Reset, processing for it ceases, and nvcorrd can then go back to the Node Up and store it in the Pass.
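The ordering problem, and why the extra hold step cures it, can be shown with a toy model. This is a sketch of the sequence described above and nothing more: it is not how nvcorrd is implemented, and the function name `pass_finds_match`, the flag `use_delay_step`, and the host name are all made up for the example.

```python
from typing import List, Set


def pass_finds_match(host: str, use_delay_step: bool) -> bool:
    """Toy model: does the Pass-on-Match find the Node Up for this host?"""
    pass_cache: Set[str] = set()   # Node Up events stored for the Pass-on-Match
    held: List[str] = []           # Node Downs parked in the second Reset-on-Match
    matched = False

    def pass_on_match(down_host: str) -> bool:
        # The Pass can only match against what is already in its cache.
        return down_host in pass_cache

    # 1. The Node Up triggers the first Reset, which releases the Node Down.
    if use_delay_step:
        held.append(host)              # parked; processing of the Node Down stops
    else:
        matched = pass_on_match(host)  # runs before the Node Up is cached

    # 2. Only now does processing go back to the Node Up and store it in the Pass.
    pass_cache.add(host)

    # 3. The hold interval expires and the parked Node Down moves on to the Pass.
    for down_host in held:
        matched = pass_on_match(down_host)

    return matched


print(pass_finds_match("router1", use_delay_step=False))  # False: timing issue
print(pass_finds_match("router1", use_delay_step=True))   # True: delay fixes it
```

The only difference between the two runs is whether step 2 happens before or after the Node Down reaches the Pass, which is exactly what the second Reset-on-Match buys you.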

The only catch to this is that the triggering Node Up is sent to TEC about a minute later than the Node Down, but that should not matter much to your TEC rules, since they can evaluate what's already in the reception store as well as the current event.

Incidentally, that's why the IBM direction was to do this kind of correlation in TEC, where timing issues were not as relevant.

Hope this helps

James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
