RE: [nv-l] Ruleset + up event.
2004-09-22 10:48:16
Mighty complicated set of Reset-on-Match and Pass-on-Match ruleset nodes
back to back. Not sure I follow it all.
But the key seems to be those scripts at the end, so far as I can see.
Whatever happens, we pass them a Node Down event originally and then later
a matching Node Up. In between we'll wait 3 minutes before sending the
original Node Down, to see whether this is a false alarm, and then we'll
wait up to 144 hours (6 days) for the matching Node Up. All of that is
just to launch the appropriate scripts. How do they work?
The intervening razzle-dazzle of alternating Reset and Pass nodes, the ones
with the 5-second wait intervals, seems to me to be some kind of timing
mechanism to guarantee that the original Node Down can be used as the match
criteria for the much later Node Up. It delays the final handling of the
Node Down event so that there is time to save the Node Up in the
Pass-on-Match, which means there will be something to match when the
Node Down is released. It's ingenious all right, and more than a little
unusual, as I don't recall seeing anything like it. Bet the original
developer would be surprised too.
But to my way of thinking, this way has dependencies I don't care for.
The biggest one I see here is that if you have to stop and restart the
daemons for any reason, then the held events are lost. I'd prefer setting
and querying database fields myself, since you wouldn't be time- or
daemon-dependent. You could preserve continuity over a much longer time
period, and you wouldn't have to worry about keeping the daemons up should
you have to recycle them for some other reason.
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
"Barr, Scott" <Scott_Barr AT csgsystems DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com
09/22/2004 09:47 AM
To: <nv-l AT lists.us.ibm DOT com>
cc:
Subject: RE: [nv-l] Ruleset + up event.
Here is a ruleset that does just what you need.
It’s got some extra stuff in it because it processes node downs for a bunch
of different groups, so you’ll want to strip out the extra query smartsets
and actions. Don’t try to figure out how it works. I still don’t understand
it, but a NetView guru helped me work it out and it works like 100 bucks.
1. Accepts node up and node down traps
2. Holds the node down for 3 minutes
3. If the node up happens, pass the node up trap
4. If the node up trap does not happen in 3 minutes, execute the notification script
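
For readers who want to reason about the four steps without opening the ruleset editor, here is a minimal Python sketch of the hold-down logic. It is not the attached ruleset and does not use any NetView API; the names `Trap`, `process`, and `HOLD_SECONDS` are made up for illustration. It also assumes one reading of step 3: a node up that arrives while the node down is still held cancels the held node down. A node up with no held node down is simply ignored here, which the list above does not specify.

```python
from dataclasses import dataclass

HOLD_SECONDS = 180  # the 3-minute hold from step 2


@dataclass
class Trap:
    time: float  # seconds since an arbitrary start
    node: str
    kind: str    # "down" or "up"


def process(traps, hold=HOLD_SECONDS):
    """Return (time, action, node) decisions for a list of traps."""
    pending = {}  # node -> arrival time of the held node down
    out = []

    def flush(now):
        # Release every held node down whose hold period has expired.
        for node, t0 in sorted(pending.items(), key=lambda kv: kv[1]):
            if now - t0 >= hold:
                out.append((t0 + hold, "notify", node))
                del pending[node]

    for trap in sorted(traps, key=lambda t: t.time):
        flush(trap.time)
        if trap.kind == "down":
            # Step 2: hold the node down (keep the first one if repeated).
            pending.setdefault(trap.node, trap.time)
        elif trap.kind == "up" and trap.node in pending:
            # Step 3: node up arrived in time; drop the held down, pass the up.
            del pending[trap.node]
            out.append((trap.time, "pass_up", trap.node))

    flush(float("inf"))  # Step 4 for anything still held at the end
    return out
```

Feeding it a node that recovers after 60 seconds and another that stays down past the hold gives one "pass_up" and one "notify" decision. Note that the `pending` table lives only in memory, so like the held events in a ruleset it would be lost on a restart.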
From: owner-nv-l AT lists.us.ibm DOT com
[mailto:owner-nv-l AT lists.us.ibm DOT com] On Behalf Of James Shanks
Sent: Wednesday, September 22, 2004 8:20 AM
To: nv-l AT lists.us.ibm DOT com
Subject: Re: [nv-l] Ruleset + up event.
What you see is what you get, Tom.
If you want to pass a Router Up event, then you have to do it explicitly.
The logic you are saying you want here is much more complicated
than just a simple reset-on-match. What you have just said is that
you want the Router Down held for just five minutes and then passed to
TEC if there is no Router Up. And then you want something to "remember"
that you passed this Router Down, and pass a matching Router Up at some
much later time. Well, a simple ruleset cannot do that. So
you have to design something more sophisticated.
When you design a custom ruleset for TEC, you and the TEC guy have to work
together. He can code rules on his end, just as you can. I
don't see why you cannot send all Router Up events to TEC as harmless and
let a TEC rule over there match them to any open Router Downs, and if there
are none, then close them. Or let the operator close them. If
he sees them, then clearly there was no match, so they no longer matter,
right?
If you have to do this in NetView, then I think you'd have to do something
like this. You have to keep a record somewhere of the Router Down events
you sent to TEC, and query that list when a Router Up comes in. One
way would be to create a file, add the router name to it when you send
the event to TEC (use an action node for that), and then query it in an inline
action script when the Router Up comes in. If there is a match, then
delete the name from the list and send the Router Up. An alternative
would be to Set and Query Database fields on the router objects in the
database. You can create your own field or use CorrState1 - 4. You
set the field to indicate that you sent the trap, then you could query
it when the Router Up came in and take action that way. Then clear
the field. You get the idea, I'm sure.
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
Tom Hallberg <gimli AT hhcrew DOT tk>
Sent by: owner-nv-l AT lists.us.ibm DOT com
09/22/2004 03:41 AM
To: nv-l AT lists.us.ibm DOT com
cc:
Subject: [nv-l] Ruleset + up event.
Hi
I have a ruleset design problem. At the moment I have first a "Trap
Settings" node (for Router Down events), then an "Inline Action" to check
that it's one of the routers I want a status check on. After that I have a
"Reset on Match", because I also take in Router Up events so I can reset on
match within 5 minutes. The problem is that if a router goes down and
stays down for more than 5 minutes, the down event is passed to TEC.
Now let's say the router comes up again, so we get a Router Up event.
That up event will not be passed to our TEC. So are there any
templates that can handle sending only one up event when a down event
has been passed to TEC? Or do I have to write a new
Inline Action to take care of the up event that comes after 5 minutes?
The TEC guy doesn't want all up events, because the network is quite big.
Thank you
//Tom
trap_unix_unreach.rs
Description: Binary data