nv-l

Notes on Ruleset Performance

1998-07-31 16:16:33
Subject: Notes on Ruleset Performance
From: James_Shanks AT TIVOLI DOT COM
To: nv-l AT lists.tivoli DOT com
Date: Fri, 31 Jul 1998 16:16:33 -0400


This document is an attempt to bring together several hints on ruleset
coding which seem to be needed to help users who have performance
problems with rulesets.


The key thing to remember is that the ruleset editor is a
programming tool, and with it you can write very powerful programs, but
you can also write resource hogs.  And, since ruleset processing is
done by just one daemon, nvcorrd, you can quickly bring all of event
processing to its knees with a bad ruleset.


Here are some hints.


1. Limit the input as quickly and dramatically as you can
   Typically this is done by using the Trap Settings node (where you
can specify up to 20 specific traps per enterprise) or the Event
Attributes node, so that you can immediately reduce the processing load
to a small subset of all the traps which pass through the system.
Remember that nvcorrd gets a copy of every one of them -- even those
marked "Log Only" or "Don't Log or Display" -- so it is vitally
important to limit the volume of what is processed.  Failure to do so
will result in a general slowdown of events coming to the display window,
and you will see messages in the nvcorrd.alog and nvcorrd.blog files
that traps are being queued.  With nvcdebug -d all you can actually see
how many are in the queue.  What this means is that nvcorrd is falling
behind and may never catch up.
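
To see why early filtering matters, here is a rough back-of-the-envelope
model in Python.  It is not NetView code.  The arrival rate, the per-trap
costs, and the match fraction are all made-up numbers, chosen only to show
how a single consumer falls behind when every trap runs the full ruleset.

```python
def backlog(seconds, arrivals_per_sec, cost_per_trap_sec):
    """Approximate queue length after `seconds` for one single-threaded consumer."""
    service_rate = 1.0 / cost_per_trap_sec
    return max(0.0, (arrivals_per_sec - service_rate) * seconds)

# Assumed figures, for illustration only.
ARRIVALS = 50.0        # traps per second reaching the daemon
FULL_COST = 0.030      # seconds to run the whole ruleset on one trap
FILTER_COST = 0.001    # seconds to reject a trap at the first node
MATCH_FRACTION = 0.05  # share of traps the first node lets through

# Every trap runs the whole ruleset.
no_filter = backlog(600, ARRIVALS, FULL_COST)

# Every trap pays the cheap filter; only the matching share pays the rest.
avg_cost = FILTER_COST + MATCH_FRACTION * FULL_COST
with_filter = backlog(600, ARRIVALS, avg_cost)

print("queued after 10 minutes, no early filter:", round(no_filter))
print("queued after 10 minutes, early filter:   ", round(with_filter))
```

With these assumed numbers the unfiltered case can never drain its queue,
because traps arrive faster than they can be processed.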



2. Limit your use of Collection compares and Database field compares
   Never start your ruleset with a Query Database node, whether for
collection membership or for a database field value.  To do so means
that every trap in the system will have to be checked (see hint #1)
and, worse yet, that nvcorrd will have to suspend other processing
while awaiting the response from an external daemon (nvcold for
collections, ovwdb for database fields) to decide what to do next.  These
external calls should be kept to a minimum.  Likewise, it is not good
to string these calls together (multiple nodes which query collections
or database fields) for the same reason.  Every one suspends nvcorrd
while he awaits an answer.  Collections can be combined into a
super collection using the Collection editor so that only one call has
to be made.  Try not to query a database field if you are also going to
set it, since each is another call to ovwdb.
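
The ordering argument can be sketched in Python.  This is only an
illustration: query_collection is a hypothetical stand-in for a blocking
call to an external daemon, and the trap dictionaries, host names, and
trap numbers are invented.

```python
external_calls = 0

def query_collection(host, members):
    """Stand-in for a blocking query to an external daemon (hypothetical)."""
    global external_calls
    external_calls += 1
    return host in members

def wanted(trap, specific_traps, members):
    # Local test on data already in the trap: no external call needed.
    if trap["specific"] not in specific_traps:
        return False
    # Only the few traps that survive reach the blocking query.
    return query_collection(trap["host"], members)

# 1000 assumed traps; every 100th carries the specific trap number we want.
traps = [{"specific": 7 if i % 100 == 0 else 1, "host": "router%d" % (i % 5)}
         for i in range(1000)]
matched = [t for t in traps if wanted(t, {7}, {"router0", "router1"})]
print(len(matched), "matched;", external_calls, "external calls for",
      len(traps), "traps")
```

Had the collection query come first, it would have been made once for
every trap rather than once for each trap of interest.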



3. Limit MIB compares and sets
  The same kinds of considerations that apply to collections and fields
also apply to getting (and setting) MIB variables.  This should be
done sparingly.  Now we not only suspend processing for a response from
another process (snmpd), but we have to go outside our own box and
that may introduce network delays as well.  The retry count for a
MIB compare should be cut to one, if possible, especially if the node
you are trying to reach may be down, and the timeout values should be
kept low.
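
A small worked example shows how retries and timeouts multiply.  It
assumes a fixed timeout per attempt and one initial attempt plus the
configured retries; the helper name and the settings are illustrative
only.  If the timeout grows between retries, the total is larger still.

```python
def worst_case_block(timeout_sec, retries):
    """Seconds spent waiting on one unreachable node, assuming a fixed
    timeout per attempt and one initial attempt plus `retries` retries."""
    return timeout_sec * (1 + retries)

# Assumed settings, for comparison only.
print(worst_case_block(timeout_sec=5, retries=3))
print(worst_case_block(timeout_sec=2, retries=1))
```

Every second of that wait is a second in which no other trap is being
processed.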



4. Use checkroute sparingly
  Checkroute is also an external call, so make sure that you only use it
when you have to and never retry more than once.



5. Use trap variables for information wherever you can.
  The entire trap is passed to nvcorrd and is immediately parsed into
the trap variables.  These should be used rather than database fields,
collection queries, or MIB compares, to make decisions, whenever
possible.  The reason should be obvious by now -- nvcorrd does not
have to rely on any external sources for this data and can process it
immediately.



6. Keep in-line actions short
   An in-line action is a user-specified command or script for nvcorrd
to execute to decide what to do next.  These should be the sorts of
things that complete in less than 10 seconds, because once again, all
other processing is suspended.  Activities like sending a page, an
email, or a pop-up message, should never be done in an in-line action.
They should be done in an (off-line) action node instead, so that they
are executed by actionsvr and not by nvcorrd.  For in-line actions,
the shorter, the better.  Don't wait even 30 seconds for output that
should come back in two; wait ten seconds instead.
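
A script used as an in-line action can also bound its own run time.
The sketch below is one way to do that in Python.  run_bounded is a
hypothetical helper, and returning 1 on timeout rests on the assumption
that the caller treats a non-zero exit status as "no answer".

```python
import subprocess
import sys

def run_bounded(cmd, limit_sec):
    """Run cmd, giving up after limit_sec seconds.
    Returns the command's exit status, or 1 if it timed out."""
    try:
        # subprocess.run kills the child itself when the timeout expires.
        return subprocess.run(cmd, timeout=limit_sec).returncode
    except subprocess.TimeoutExpired:
        return 1

if __name__ == "__main__":
    # Placeholder command: a child that just sleeps for five seconds.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    sys.exit(run_bounded(slow, limit_sec=2))
```

The limit passed to the helper should be no larger than the wait time
configured for the in-line action itself.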



7. Make hold times for pass-on-match and reset-on-match reasonable
   The pass-on-match and reset-on-match functions offer great power in
a ruleset, but there is a trade-off in memory and CPU for this power.
These nodes create a cache in memory and store events in that cache
for comparison with others which arrive at some later time.  If you
specify hours of hold time, and the events you are caching are
frequent, then you could see dramatic memory growth in nvcorrd over
time.  Nothing can be done about this, if that is what you have coded.
Similarly, although not a performance consideration, there is a lower
limit to time discrimination here.  These match nodes use a "heartbeat"
mechanism to check whether the timeout value of their cache has been
reached for the events in it.  This is fixed at every 15 seconds.  No
time value for the cache can be resolved any finer than this.  In real
life this is typically not a problem, since matching events seldom
occur any faster than this, but it is a design consideration to be aware
of.
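
Both effects can be put into numbers with a short Python sketch.  Only
the 15-second heartbeat comes from the text above.  The event rate, the
hold times, and the assumption that heartbeat ticks fall on exact
multiples of 15 seconds are illustrative, and the cache estimate ignores
events removed early by a match.

```python
import math

HEARTBEAT_SEC = 15  # check interval given in the text

def cached_events(events_per_min, hold_min):
    """Events resident in the cache once it reaches steady state."""
    return events_per_min * hold_min

def expiry_noticed_at(arrival_sec, hold_sec, heartbeat=HEARTBEAT_SEC):
    """First heartbeat tick at or after the event's nominal expiry time.
    Assumes ticks fall on multiples of `heartbeat` seconds."""
    return math.ceil((arrival_sec + hold_sec) / heartbeat) * heartbeat

# Assumed load: 10 matching events a minute held for four hours.
print(cached_events(events_per_min=10, hold_min=4 * 60))

# An event arriving at t=7s with a 20-second hold nominally expires at t=27s.
print(expiry_noticed_at(arrival_sec=7, hold_sec=20))
```

Under these assumptions a hold time is honored only to the next
heartbeat, so two hold times that differ by a few seconds may behave
identically.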




8. Never use "PASS" or "Forward" in a ruleset in ESE.automation.
  Rulesets whose full path names are placed in the
/usr/OV/conf/ESE.automation file are registered by the actionsvr daemon
when he is started.  Thus, when the initial event stream node says
"PASS", a complete copy of every event in the system is sent to him.
But he does not have a display on which to place these passed events
(actionsvr is a daemon and daemons do not have a display), so they
instead sit on his incoming queue.  Actionsvr has no way to de-queue
them as they are not accompanied by actions he is to perform.
Eventually his queue fills up (about 32K events) and he stops processing.
Then the events start to back up inside nvcorrd, and when nvcorrd's
outbound queue fills up, he too stops processing, and all event
processing ceases.  The same thing will happen, of course, when a Forward
node is used in a ruleset executing in the background for actionsvr.  It
will just occur more slowly.
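
How long this takes depends on the event rate.  The arithmetic below
treats "about 32K" as 32 * 1024 and uses an invented rate of four events
per second; both are assumptions made only for illustration.

```python
QUEUE_CAPACITY = 32 * 1024  # "about 32K events", per the text

def seconds_until_full(events_per_sec, queued_fraction=1.0):
    """Time for a queue that is never drained to reach capacity."""
    return QUEUE_CAPACITY / (events_per_sec * queued_fraction)

print(seconds_until_full(4))                        # every event passed
print(seconds_until_full(4, queued_fraction=0.25))  # only a quarter forwarded
```

A Forward node that sends on only some events lengthens the time in
proportion, but the queue still fills.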





It is hoped that these hints will assist the user in creating more
effective rulesets, with minimal adverse impacts to their systems.



James Shanks
Tivoli (NetView for AIX) L3 Support
Last updated: July 6, 1998
