Availability is hard to manage. I would like to know more about the management
of availability in the next version of NetView and whether or not other
tools are required to manage and report on the data.
Does anyone know when the next version of NetView will be released and what other
features it may have?
From: Ray Schafer [mailto:schafer AT TKG DOT COM]
Sent: Thursday, September 02, 1999 4:49 PM
To: NV-L AT UCSBVM.UCSB DOT EDU
Subject: Re: Availability
Using NetView Node Up/Node Down traps may not give you what you want:
* Not all Node Up traps will have corresponding Node Down traps -
for routers. For example, an interface down trap on the router will produce a
Node Marginal trap, and when the interface comes up again you'll get a Node
Up trap - without a Node Down.
* For routers that have administratively down interfaces (when an
administrator manually brings the interface down on the router), NetView
will never ever mark the router as down. Even if the router is under
* Network problems with the NetView server or the MLM's default router or any
router in between you and the endpoint will cause NetView to mark the node
or interface down (if it is polling it at the time) even though it is
not down, just unpingable from the NetView server or MLM.
Now for the good news: This may be addressed in the next version of NetView when
"snmp" polling is engaged. This will actually look at the uptime from the system
tree of the device's MIB. You could probably write a script to do the same
now: for every Node Up trap you get, fire off an snmpget of
system.sysUpTime (I think that's it - do "snmpwalk <node> system" to see!). If
the uptime is just a few minutes, then it really is an outage; if it is more than
your polling cycle, then it is bogus. Be careful though: if you fire off a
bunch of these snmpget's when you are flooded with up traps you could
Maybe you could use the snmpCollect facilities to attack the problem in a more
efficient way: Set up a collection for your servers and another for your
routers. Create a MIB Expression to store the value "0 - sysUpTime" for
each member of the collection. I think that this is collected as
a counter - which means that it will report the difference between the last
sample and this one. The reason for the "0 - value" expression is that
snmpCollect only takes action when the variable or expression is greater than
some number (in our case we are looking for this expression to be greater than
0!). Create a specific trap for this threshold event, and as an action of the
trap, run a command that will parse the trapd.log file looking for the
events (up/down/marginal) to get a closer approximation of when the node went
down, and came back up! Collecting this once a day won't be overkill, and unless
your node goes down every day, this should work fairly well.
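
The sysUpTime check described above (query the uptime after each Node Up trap and compare it with the polling cycle) could be sketched roughly as follows. This is only an illustration in Python, not anything shipped with NetView: the function names, the result labels, the use of the polling cycle as the cut-off, and the "Timeticks: (n)" output format assumed by the parser are all assumptions. The one firm fact used is that sysUpTime counts hundredths of a second.

```python
import re

TICKS_PER_SECOND = 100  # sysUpTime is a TimeTicks value: hundredths of a second


def parse_timeticks(snmp_output):
    """Pull the raw tick count out of text like 'Timeticks: (12345) 0:02:03.45'.

    The output format is an assumption; adjust the pattern to whatever
    your snmpget actually prints.
    """
    match = re.search(r"\((\d+)\)", snmp_output)
    if match is None:
        raise ValueError("no tick count found in: %r" % snmp_output)
    return int(match.group(1))


def classify_node_up(uptime_ticks, polling_cycle_seconds):
    """Decide whether a Node Up trap follows a real restart.

    If the device has been up for no longer than one polling cycle, it
    restarted recently, so the outage was real. Otherwise the device kept
    running and was probably only unreachable from the management station.
    """
    uptime_seconds = uptime_ticks / TICKS_PER_SECOND
    if uptime_seconds <= polling_cycle_seconds:
        return "restart"
    return "unreachable-only"


# Hypothetical usage with a 5-minute polling cycle:
ticks = parse_timeticks("system.sysUpTime.0 = Timeticks: (12345) 0:02:03.45")
print(classify_node_up(ticks, 300))
```

Note that this only distinguishes restarts from everything else; a device that stayed powered on while a link in front of it failed would still be reported as "unreachable-only".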
Rob Napholz wrote:
> Pham could you post your perl script to the group
> and save us all some time.
> thanks Rob
> Pham Isaak V wrote:
> > First create a ruleset to detect Node Up/Down traps, then compare the device
> > to a collection of servers or routers.
> > If the device is a router, log the event to a router logfile. If the device is a
> > server, log the event to a server logfile. The logs should contain the
> > following fields:
> > device name
> > status of device (up/down)
> > time of status change (day, hours, & minutes)
> > At the end of the month, run a script or program against the logfiles. Use the
> > program or script (Perl in my case) to match each device down with its
> > corresponding device up. Now subtract the time of the device down trap from
> > the time of the device up trap. This will give you the length of time the device was
> > down. Convert the days and hours to minutes. Match up all the other
> > down/up traps associated with the same device. Add them all together and you
> > should have the total number of minutes the device was down for the month.
> > Now, take the total number of minutes the device was down and subtract it from
> > 43200 [(24 hours * 60 minutes) * 30 days = # of minutes in a month]. Take
> > that value and divide by 43200, then multiply by 100. This will give you the percentage of
> > availability for the device.
> > This method is not 100% accurate, but it has had to do for now. I hope someone
> > else has a better way of doing this.
> > Hint: This would be a great addition to the next release of NetView.
> > -----Original Message-----
> > From: Frantsen Christian [mailto:cf AT INTERNOC DOT SE]
> > Sent: Wednesday, September 01, 1999 6:00 AM
> > To: NV-L AT UCSBVM.ucsb DOT edu
> > Subject: Availability
> > Hi!
> > I would like to (with help from sysUpTime) gather information and then
> > present this to a customer as a single number, e.g.
> > "Your availability this month on these routers/servers/etc. has been 99.7%."
> > Has anyone made something like this? Perhaps someone could give me a few
> > pointers on how to do this as easily as possible.
> > -----------------------------------------
> > Christian Frantsen
> > Technical Operations
> > Internoc Scandinavia AB
> > Tel: +46-36-194843
> > Fax: +46-36-194651
> > http://www.internoc.se
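
For anyone wanting to try the month-end calculation quoted above (pair each down event with the next up event, total the minutes, then take the downtime out of 43200 and divide by 43200), here is a rough Python sketch. It is not the Perl script discussed in the thread. The event tuples, function names, and the sample router are hypothetical; only the 30-day, 43200-minute month comes from the quoted post.

```python
from datetime import datetime

MINUTES_IN_MONTH = 24 * 60 * 30  # 43200, the 30-day month used in the post


def downtime_minutes(events):
    """Total downtime per device.

    events: iterable of (device, status, timestamp) tuples sorted by time,
    where status is "up" or "down". This layout is an assumption standing
    in for whatever the ruleset writes to the logfiles.
    """
    down_since = {}  # device -> time of the first unmatched down event
    totals = {}      # device -> accumulated downtime in minutes
    for device, status, when in events:
        if status == "down":
            # Keep the earliest down time if several arrive in a row.
            down_since.setdefault(device, when)
        elif status == "up" and device in down_since:
            delta = when - down_since.pop(device)
            totals[device] = totals.get(device, 0.0) + delta.total_seconds() / 60
        # An "up" with no preceding "down" is ignored.
    return totals


def availability_percent(down_minutes, period_minutes=MINUTES_IN_MONTH):
    """Availability as a percentage of the reporting period."""
    return (period_minutes - down_minutes) / period_minutes * 100


# Hypothetical log entries for one router:
events = [
    ("router1", "down", datetime(1999, 9, 1, 10, 0)),
    ("router1", "up", datetime(1999, 9, 1, 12, 10)),
]
for device, minutes in downtime_minutes(events).items():
    print(device, round(availability_percent(minutes), 1))
```

Up events without a matching down are skipped here, which is one reason the result is only an approximation.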
Ray Schafer | schafer AT tkg DOT com
The Kernel Group | Distributed Systems Management