[nv-l] question: generating alerts for iferrors, discards, and utilization

Netview_List:

We have been monitoring interface utilization, discards, and errors for years, generating alerts from NetView into Tivoli TEC when they exceed their thresholds. We also track and graph them with MRTG/RRDtool.

Recently I have been having an internal debate about the merits of this strategy. I believe it is useful to track all three and alert on them when they exceed their thresholds. Others think that only misbehaving links (errors/discards) are of interest, and that utilization does not matter (is not actionable) unless the link is "broken/impaired". (I suppose it comes down to how deeply one wants to react to possibly service-affecting conditions.)

We poll the inbound/outbound SNMP interface counters every 10 minutes and alert as follows:

If% Discards  > 25%  (% of inbound packets discarded)
If% Errors    > 20%  (% of inbound packets with errors)
If% Util      > 95%  (inbound traffic / bandwidth)
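The thresholds above are percentages of per-interval counter deltas. A minimal Python sketch of that arithmetic follows, assuming the standard IF-MIB inbound counters (ifInOctets, ifInUcastPkts, ifInNUcastPkts, ifInDiscards, ifInErrors) and ifSpeed in bits per second have already been fetched by whatever poller you use. The `Sample` class and the helper functions are hypothetical names for illustration, not NetView, TEC, or MRTG APIs. It also assumes "inbound packets" means delivered + discarded + errored packets, and it tolerates only one Counter32 wrap between polls.

```python
from dataclasses import dataclass
from typing import Dict, List

COUNTER32_MOD = 2 ** 32  # SNMP Counter32 values wrap back to 0 at 2^32

# Alert thresholds from the post, in percent.
THRESHOLDS = {"discards": 25.0, "errors": 20.0, "util": 95.0}


@dataclass
class Sample:
    """One poll of the inbound counters for a single interface (hypothetical container)."""
    in_octets: int    # ifInOctets
    in_ucast: int     # ifInUcastPkts
    in_nucast: int    # ifInNUcastPkts
    in_discards: int  # ifInDiscards
    in_errors: int    # ifInErrors


def counter_delta(old: int, new: int) -> int:
    """Difference between two Counter32 readings, correct across at most one wrap."""
    return (new - old) % COUNTER32_MOD


def interface_percentages(old: Sample, new: Sample,
                          interval_s: float, if_speed_bps: int) -> Dict[str, float]:
    """Percent discards, percent errors and percent inbound utilization for one interval."""
    octets = counter_delta(old.in_octets, new.in_octets)
    discards = counter_delta(old.in_discards, new.in_discards)
    errors = counter_delta(old.in_errors, new.in_errors)
    # Assumption: "inbound packets" = packets delivered upward + discarded + errored.
    packets = (counter_delta(old.in_ucast, new.in_ucast)
               + counter_delta(old.in_nucast, new.in_nucast)
               + discards + errors)

    def pct_of_packets(count: int) -> float:
        # An idle interval (no packets) is reported as 0% rather than dividing by zero.
        return 100.0 * count / packets if packets else 0.0

    return {
        "discards": pct_of_packets(discards),
        "errors": pct_of_packets(errors),
        # ifSpeed is in bits per second, so convert octets to bits.
        "util": 100.0 * octets * 8 / (interval_s * if_speed_bps),
    }


def breached(percentages: Dict[str, float]) -> List[str]:
    """Names of the metrics that exceed their threshold (candidates for an alert)."""
    return [name for name, value in percentages.items() if value > THRESHOLDS[name]]
```

Call `interface_percentages` with two consecutive samples and the 600 s polling interval, then raise an event for each name `breached` returns. On fast links polled every 10 minutes, a Counter32 ifInOctets can wrap more than once between polls (at 100 Mb/s it wraps in under six minutes), so the 64-bit ifHCInOctets counter would be the safer input there.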

Q: What do other people do for interface performance alerting? Do you focus mostly on interface up/down, or on node up/down?
If you are polling and thresholding, what values are you using? When do you consider a line sufficiently impaired that it's time to call the carrier?

Any comments appreciated.

Thanks,

Don Mahler
Enterprise Management
SAIC/Telcordia
