
[nv-l] Loss of traps with MLM

From: Robin James <robin.james AT thalesatm DOT com>
To: NetView Discussion <nv-l AT lists.tivoli DOT com>
Date: Tue, 02 Apr 2002 15:58:44 +0100
We have been performing an experiment to determine if it is possible for
our Netview computer to lose locally generated traps. 

We run Netview 5.1.3 on Compaq Tru64 UNIX, with MLM on the same
machine to use its filtering capability. We have set up a filter to
throttle traps with the following settings:

smMlmFilterName[BlockTrapFlooding] =  "BlockTrapFlooding"
smMlmFilterState =  enabled
smMlmFilterDescription =  "Blocks traps when too many traps come from
the same host in a short time"
smMlmFilterAction =  throttleTraps
smMlmFilterAgentAddrExpression =  "cwps"
smMlmFilterThrottleType =  sendAfterN
smMlmFilterThrottleArmTrapCount =  20
smMlmFilterThrottleArmedCommand =  "/usr/sbin/Mlm_stop_snmpd.sh
$SM6K_TRAP_AGENT_ADDRESS"
smMlmFilterThrottleDisarmTimer =  "1s"
smMlmFilterThrottleDisarmTrapCount =  0
smMlmFilterThrottleDisarmedCommand =  "snmptrap -p 1675 localhost omc
.1.3.6.1.4.1.1254.1 `hostname` 6 104 1 .1.3.6.1.2.1.1.5 OctetStringASCII
$SM6K_TRAP_AGENT_ADDRESS"
smMlmFilterThrottleCriteria =  byNode
smMlmAliasName[cwps] =  "cwps"
smMlmAliasList =  "w1161,
w1162,
w2142"

As you can see from the settings, an alias is also set up so that the
traps generated on the Netview node itself are not subject to the filter.

We set up 3 nodes to send traps repeatedly using the following script:

while true
do
   send_event 803 "swamp test"
done

This produced approximately 2,200 traps in the trapd log in one minute,
and vmstat showed that the Netview node had very little idle time. We
then used send_event on the Netview node to send single traps and
observed that 1 out of 4 events was not present in the trapd or midmand
logs.
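To make the loss measurable rather than eyeballed, the check can be
scripted. The sketch below is only an illustration: the tag string and
the idea of numbering each test trap are my own assumptions, not
anything NetView provides, and in practice log_text would be read from
the real trapd.log.

```python
# Count how many uniquely tagged test traps made it into a log file.
# The tag scheme ("<tag> <n>" embedded in each send_event message) is
# hypothetical; substitute whatever marker the real test traps carry.

def count_received(log_text, tag, sent):
    """Return (received, lost) for traps tagged '<tag> 0' .. '<tag> sent-1'."""
    received = sum(1 for i in range(sent) if ("%s %d" % (tag, i)) in log_text)
    return received, sent - received

# Stand-in for the log contents: traps 0, 1 and 3 arrived, 2 was lost.
log_text = "swamptest 0\nswamptest 1\nswamptest 3\n"
received, lost = count_received(log_text, "swamptest", 4)
print(received, lost)  # -> 3 1
```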

This seems to confirm that a trap can be lost when the Netview node is
receiving a very heavy load of traps.

We then repeated the same test with MLM removed, so that traps went
directly to trapd rather than via midmand. This time no traps were
lost.

It appears to me that the buffering between UNIX and trapd does not lose
the locally generated events, but when MLM is filtering it is possible
to lose traps.

Is it possible to find out the UDP receive buffer size with each
configuration?
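For the default case, a few lines of code can ask the kernel directly
what receive buffer a newly opened UDP socket gets. This is plain POSIX
socket behaviour, not a NetView or MLM facility, and the number it
prints depends entirely on the system's socket defaults:

```python
import socket

# Open a UDP socket the way a trap daemon would, then ask the kernel
# what receive buffer it was given by default.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("default SO_RCVBUF: %d bytes" % rcvbuf)
s.close()
```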

I realise that the source of the problem is the node flooding our
Netview node with traps, and we must stop that node from sending so
many events. In the meantime we are trying to put a two-part solution
in place to ensure we do not lose locally generated traps. The two
parts are:

1. When MLM detects a node is flooding Netview with traps we will freeze
snmpd on that node so traps do not get sent.
2. Increase the buffer size between MLM and UNIX.

We think we know what to do for part 1 of the solution, but is it
possible to increase the buffer size between midmand and UNIX for
receipt of traps? I know trapd provides an option to specify a UDP
receive buffer size, but I can't see a similar option for midmand.
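For comparison, this is what happens at the socket level when a process
does enlarge its own UDP receive buffer: it asks via setsockopt and
then reads the value back, because the kernel may round, double (Linux
does), or cap the request at a system-wide maximum. This is a generic
sketch of the mechanism, not a documented midmand option:

```python
import socket

REQUESTED = 65536  # bytes; illustrative value, tune to the trap burst size

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
# Read the value back to see what the kernel actually granted.
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("granted SO_RCVBUF: %d bytes" % granted)
s.close()
```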

I would appreciate any comments or help on this problem.
Thanks

-- 
Robin
email: robin.james AT thalesatm DOT com
tel:   +44 (0) 1633-862020
fax:   +44 (0) 1633-868313
