Re: up/down ruleset

1998-08-17 11:29:45
Subject: Re: up/down ruleset
From: Tim Clark <Tim.Clark AT TAVVE DOT COM>
To: nv-l AT lists.tivoli DOT com
Date: Mon, 17 Aug 1998 11:29:45 -0400
Check www.tavve.com/nmsweb

The "Check for reboots" reports are there for your review.

-----Original Message-----
From: Netview Operator <netview AT NV1.HSNET.UFL DOT EDU>
Date: Monday, August 17, 1998 11:24 AM
Subject: Re: up/down ruleset

>Hey Greg-
>Not sure whether you are after real-time notification so you can do
>something about it or you just want to know about reboots (and finding out
>after the rebootee is back up is okay).  If the latter, a data collection on
>sysUpTime will do it:
>mode: Don't Store, Check Thresholds
>polling interval: 3m
>Trap number: 58720263
>Threshold: 180000
>source: <I used a wild card to match any node on our network and we only
>"manage" nodes of "interest" so we're not querying every IP address>
>rearm: 179999
>rearm event...
>Event Log Message: $3
>Popup notification (doesn't work): $2 Rebooted or Power-Failure
>Command for Automatic Action:(echo Sysuptime under 3 minutes at; date ;echo
>$2 was it rebooted?) | /usr/bin/mail -s 'Sysuptime under 3 min $2' netmgrs
>so the email alias netmgrs gets email whenever the system uptime on a
>device falls below 3 minutes, causing rearm of the data collection.
>Sysuptime under 3 minutes is pretty much a guarantee the device restarted
>in the last three minutes and is more reliable than coldstart traps, which
>may not make it to Netview anyway.
>Hope this helps.
>Randy Martin
>Shands Healthcare
>martirw AT is1.hsnet.ufl DOT edu
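
The threshold/rearm behaviour Randy describes can be sketched outside NetView. This is only an illustration in Python, not NetView configuration: the function name rearm_events and the sample polls are made up, and the 180000/179999 values are copied from the post as-is, in whatever units the collector reports.

```python
from typing import Iterable, List, Tuple

THRESHOLD = 180000  # value from the post; units are whatever the collector reports
REARM = 179999      # value from the post


def rearm_events(samples: Iterable[Tuple[str, int]],
                 threshold: int = THRESHOLD,
                 rearm: int = REARM) -> List[str]:
    """Return the labels of polls at which a rearm (likely reboot) would fire."""
    events = []
    exceeded = False  # True once the polled value has gone above the threshold
    for label, uptime in samples:
        if uptime > threshold:
            exceeded = True
        elif exceeded and uptime <= rearm:
            # Uptime dropped back under the rearm value: the counter restarted.
            events.append(label)
            exceeded = False
    return events


# Hypothetical polls taken every 3 minutes; the device restarts before 11:06.
polls = [("11:00", 500000), ("11:03", 518000), ("11:06", 9000),
         ("11:09", 27000), ("11:12", 45000)]
print(rearm_events(polls))  # ['11:06']
```

One limitation of this scheme: a second restart that happens before the value has climbed back over the threshold produces no new rearm event.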
>You wrote:
>> I need a ruleset that detects when a node has gone up AND down 3 times
>> in 30 minutes.  I'm looking for catching the condition whereby a
>> router reboots itself.  I'm close, but I'm missing some logic which
>> I'm not sure how to apply within a ruleset.
>> The trick is catching the pattern:  node down -> node up -> node down
>> -> node up -> node down -> node up
>> I thought I was clever at first, by just looking for 3 node events
>> received in 30 minutes.  This didn't work because, for example, we have a
>> router with several serial interfaces on it for our remote sites.  One
>> thunderstorm on the Eastern Plains of Colorado and those nodes typically
>> "disappear" for a while (lightning and those remote 56K lines don't get
>> along so well <smile>).  The problem is that my ruleset checks for 3
>> interface down signals in 30 minutes from the "origin" attribute.  Well,
>> I learned that if an interface with 2 IP addresses configured on it goes
>> down, I'll get 3 interface down traps each time it goes down: one for each
>> of the 2 networks that are down on the interface and one from the router
>> indicating that it has a down interface.  The problem is that these 3 traps
>> all have the same "origin" attribute and will satisfy the ruleset.
>> You can't just check for 3 ups and 3 downs because you have the same
>> problem.  What I really need is to ensure that I get those traps
>> in the order of down,up,down,up,down,up and only then will I page
>> out the problem.  I need to create a ruleset where the order
>> of the traps over a period of time matters and I don't understand
>> how to do this???
>> Thank you --Greg Redder
>>             Network Analyst
>>             Colorado State University
>> Greg Redder                         Academic Computing & Networking
>> Colorado State University, ACNS     Phone:(970)491-7222  FAX:
>> 601 S. Howes, Room 625              E-mail: redder AT yuma.colostate DOT edu
>> Fort Collins, CO 80523              PGP
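
The ordering requirement Greg describes (down, up, down, up, down, up from one origin within 30 minutes) can be sketched as a small state tracker. This is a Python illustration of the logic only, not a NetView ruleset. The function name find_flapping, the event tuple layout, and the router names are all assumptions. Repeated events in the same state from one origin are collapsed, so three "down" traps for a single outage count once.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 30 * 60   # 30 minutes, as in the post
CYCLES_NEEDED = 3          # three down -> up cycles


def find_flapping(events, window=WINDOW_SECONDS, cycles_needed=CYCLES_NEEDED):
    """events: (timestamp_seconds, origin, state) tuples sorted by time,
    where state is "down" or "up". Returns (timestamp, origin) pairs at
    which the required number of down -> up cycles completed in the window."""
    last_state = {}                     # origin -> last state seen
    down_time = {}                      # origin -> time of the pending "down"
    cycle_starts = defaultdict(deque)   # origin -> start times of finished cycles
    alerts = []
    for ts, origin, state in events:
        if last_state.get(origin) == state:
            continue                    # duplicate trap for the same transition
        last_state[origin] = state
        if state == "down":
            down_time[origin] = ts
        elif state == "up" and origin in down_time:
            starts = cycle_starts[origin]
            starts.append(down_time.pop(origin))
            while starts and ts - starts[0] > window:
                starts.popleft()        # forget cycles older than the window
            if len(starts) >= cycles_needed:
                alerts.append((ts, origin))
                starts.clear()
    return alerts


# rtr1 bounces three times; rtr2 loses one interface and sends 3 down traps.
events = [
    (0, "rtr1", "down"), (0, "rtr1", "down"), (0, "rtr1", "down"),
    (5, "rtr2", "down"), (5, "rtr2", "down"), (5, "rtr2", "down"),
    (60, "rtr1", "up"), (100, "rtr2", "up"),
    (300, "rtr1", "down"), (360, "rtr1", "up"),
    (600, "rtr1", "down"), (660, "rtr1", "up"),
]
print(find_flapping(events))  # [(660, 'rtr1')]
```

Collapsing duplicates per origin is a deliberate simplification: it also merges separate interfaces on the same router, so a real rule might need to key on interface as well.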
