Re: up/down ruleset

1998-08-17 11:22:17
Subject: Re: up/down ruleset
From: Netview Operator <netview AT NV1.HSNET.UFL DOT EDU>
To: nv-l AT lists.tivoli DOT com
Date: Mon, 17 Aug 1998 11:22:17 -0400
Hey Greg-

Not sure whether you are after real-time notification so you can do something
about it, or you just want to know about reboots (and finding out after the
rebootee is back up is okay).  If the latter, a data collection on sysUpTime
with these settings works for us:

mode: Don't Store, Check Thresholds
polling interval: 3m
Trap number: 58720263
Threshold: 180000
source: <I used a wild card to match any node on our network and we only
"manage" nodes of "interest" so we're not querying every IP address>
rearm: 179999

rearm event...
Event Log Message: $3
Popup notification (doesn't work): $2 Rebooted or Power-Failure
Command for Automatic Action:
(echo Sysuptime under 3 minutes at; date; echo for $2 was it rebooted?) | /usr/bin/mail -s 'Sysuptime under 3 min $2' netmgrs

So the email alias netmgrs gets mail whenever the system uptime on a managed
device falls below 3 minutes, which rearms the data collection threshold.
A sysUpTime under 3 minutes pretty much guarantees that the device restarted in
the last three minutes, and it is more reliable than coldStart traps, which may
not make it to NetView anyway.
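
To make the threshold/rearm behaviour concrete, here is a minimal sketch in
Python of how those two values interact. It is not NetView code. It assumes
the collector fires a threshold event when a sample rises above the threshold
and a rearm event when a later sample falls to the rearm value or below; the
class and function names are made up for illustration. The numbers are the
ones from the settings above and are compared as raw values. The MIB defines
sysUpTime in hundredths of a second, so check what units your collector
compares against.

```python
from dataclasses import dataclass

THRESHOLD = 180000  # "Threshold" value from the post (raw collected value)
REARM = 179999      # "rearm" value from the post


@dataclass
class UptimeWatch:
    """Hypothetical model of one node's threshold/rearm state."""
    threshold: int = THRESHOLD
    rearm: int = REARM
    armed: bool = True  # True: the threshold event has not fired yet

    def sample(self, value):
        """Feed one polled sysUpTime value; return 'threshold', 'rearm' or None."""
        if self.armed and value > self.threshold:
            self.armed = False      # uptime climbed past the threshold
            return "threshold"
        if not self.armed and value <= self.rearm:
            self.armed = True       # uptime dropped back: the counter was reset
            return "rearm"
        return None


def reboot_polls(samples):
    """Return the indexes of the polls at which a rearm (likely reboot) fired."""
    watch = UptimeWatch()
    return [i for i, v in enumerate(samples) if watch.sample(v) == "rearm"]


if __name__ == "__main__":
    # Uptime grows, resets between the 2nd and 3rd poll, then grows again.
    print(reboot_polls([500000, 518000, 9000, 27000, 200000]))
```

Note that in this model a node first seen with a low uptime does not trigger
a rearm; the value has to exceed the threshold once before a drop is reported.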

Hope this helps.

Randy Martin
Shands Healthcare
martirw AT is1.hsnet.ufl DOT edu

You wrote:

> I need a ruleset that detects when a node has gone up AND down 3 times
> in 30 minutes.  I'm looking for catching the condition whereby a
> router reboots itself.  I'm close, but I'm missing some logic which
> I'm not sure how to apply within a ruleset.
> The trick is catching the pattern:  node down -> node up -> node down
> -> node up -> node down -> node up
> I thought I was clever at first, by just looking for receiving 3 node down
> events in 30 minutes.  This didn't work because, for example, we have a
> router with several serial interfaces on it for our remote sites.  One
> thunderstorm on the Eastern Plains of Colorado and those nodes typically
> "disappear" for a while (lightning and those remote 56K lines don't get
> along so well <smile>).  The problem is that my ruleset checks for 3
> interface down signals in 30 minutes from the same "origin" attribute.  Well,
> I learned that if an interface with 2 IP addresses configured on it goes
> down, I'll get 3 interface down traps each time it goes down: one for each
> of the 2 networks that are down on the interface and one from the router
> indicating that it has a down interface.  The problem is that these 3 traps all carry
> the same "origin" attribute and will satisfy the ruleset.
> You can't just check for 3 ups and 3 downs because you have the same
> problem.  What I really need is to ensure that I get those traps
> in the order of down,up,down,up,down,up and only then will I page
> out the problem.  I need to create a ruleset where the order
> of the traps over a period of time matters and I don't understand
> how to do this???
> Thank you --Greg Redder
>             Network Analyst
>             Colorado State University
> Greg Redder                         Academic Computing & Networking Services
> Colorado State University, ACNS     Phone:(970)491-7222  FAX:  (970)491-1958
> 601 S. Howes, Room 625              E-mail: redder AT yuma.colostate DOT edu
> Fort Collins, CO 80523              PGP
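
For the ordered down,up,down,up,down,up pattern Greg describes, the logic can
be sketched outside of a ruleset as a small state machine. This is only an
illustration in Python, not something the ruleset editor provides. It assumes
events arrive as (timestamp in seconds, origin, "down" or "up"), and the
FlapDetector name and its fields are hypothetical. Repeated events of the same
kind from one origin are ignored, so three down traps for a single outage count
once.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 30 * 60  # 30 minutes, from the question
CYCLES = 3                # three down -> up cycles, from the question


class FlapDetector:
    """Hypothetical tracker of down -> up cycles per origin."""

    def __init__(self, window=WINDOW_SECONDS, cycles=CYCLES):
        self.window = window
        self.cycles = cycles
        self.last_kind = {}               # origin -> last event kind seen
        self.pending_down = {}            # origin -> time the open outage began
        self.cycle_starts = defaultdict(deque)  # origin -> start times of finished cycles

    def feed(self, timestamp, origin, kind):
        """Return True when origin completes `cycles` down->up cycles in the window."""
        if self.last_kind.get(origin) == kind:
            return False                  # duplicate down (or up): order unchanged
        self.last_kind[origin] = kind
        if kind == "down":
            self.pending_down[origin] = timestamp
            return False
        start = self.pending_down.pop(origin, None)
        if start is None:
            return False                  # an up with no preceding down
        starts = self.cycle_starts[origin]
        starts.append(start)
        # Drop cycles that began more than `window` seconds before this up.
        while starts and timestamp - starts[0] > self.window:
            starts.popleft()
        if len(starts) >= self.cycles:
            starts.clear()                # start counting afresh after paging
            return True
        return False


if __name__ == "__main__":
    det = FlapDetector()
    events = [(0, "rtr1", "down"), (1, "rtr1", "down"), (2, "rtr1", "down"),
              (60, "rtr1", "up"), (600, "rtr1", "down"), (660, "rtr1", "up"),
              (1200, "rtr1", "down"), (1260, "rtr1", "up")]
    for ts, origin, kind in events:
        if det.feed(ts, origin, kind):
            print("page out:", origin, "at", ts)
```

The window here is measured from the first down to the last up. Because all
interfaces on a router share one origin, overlapping outages on different
interfaces collapse into a single cycle in this sketch, which may or may not
be what you want.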
