nv-l

Re: Node/interface UP/Down Reset on Match

1998-11-09 17:33:15
Subject: Re: Node/interface UP/Down Reset on Match
From: "Joel A. Gerber" <joel.gerber AT USAA DOT COM>
To: nv-l AT lists.tivoli DOT com
Date: Mon, 9 Nov 1998 16:33:15 -0600
The polling interval should be set to whatever meets your monitoring
requirements (and the capabilities of the platform).  If polling the
international links every 10 minutes (instead of 5) is often enough, or you
want to minimize management traffic on the WAN links, that is your call.
The total timeout that results from the timeout/retries values needs to be
less than your polling interval.  For example, a timeout/retries of 9/4 will
give you a total timeout of 9+18+36+72+144 = 279 seconds, which is a bit
less than 5 minutes.  You could even go back to a polling interval of 5
minutes if you wanted.
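
As a quick check of that arithmetic, here is a minimal Python sketch.  It
assumes the doubling behaviour described in this thread (the first attempt
waits the configured timeout, and each retry waits twice as long as the one
before).  The names total_timeout and fits_in_interval are made up for
illustration; they are not part of NetView or netmon.

```python
def total_timeout(timeout, retries):
    """Seconds spent waiting before giving up.

    Assumption: one initial attempt plus `retries` retries, with the
    wait doubling on every retry (the behaviour described for AIX).
    """
    return sum(timeout * 2 ** attempt for attempt in range(retries + 1))


def fits_in_interval(timeout, retries, polling_interval):
    """True if a full timeout cycle finishes before the next poll is due."""
    return total_timeout(timeout, retries) < polling_interval


print(total_timeout(9, 4))              # 279
print(fits_in_interval(9, 4, 5 * 60))   # True: 279 s is under 300 s
print(fits_in_interval(30, 40, 10 * 60))  # False
```

The same function reproduces the other figures quoted in this thread:
timeout/retries of 1/5 gives 63 seconds and 10/5 gives 630 seconds.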

There is another thing to be aware of when "tuning" the timeout/retries
values.  The Interface Down and Up traps that result from the netmon polling
are also affected.  The values above of 9/4 mean that the netmon poll will
take 279 seconds to time out.  This means that the resulting Interface Down
trap will be delayed at least 279 seconds after the interface actually "went
down" (could not be pinged).  There can be an additional delay of whatever
the polling interval is, which in your case is 10 minutes.  In other words,
if the interface went down right after a successful poll, then it will be
another 10 minutes plus the 279 second timeout until you get the Interface
Down trap.
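
The best and worst cases can be put into numbers with a small Python sketch.
It assumes the same doubling of the timeout on each retry, and it assumes the
worst case is an interface that fails immediately after a successful poll.
The function names are hypothetical and only illustrate the reasoning above.

```python
def total_timeout(timeout, retries):
    # Assumption: waits of timeout, 2*timeout, 4*timeout, ... for the
    # initial attempt plus each retry, i.e. timeout * (2**(retries+1) - 1).
    return timeout * (2 ** (retries + 1) - 1)


def down_trap_delay(timeout, retries, polling_interval):
    """Return (best, worst) delay in seconds before the Interface Down trap."""
    best = total_timeout(timeout, retries)   # failure just before a poll starts
    worst = polling_interval + best          # failure just after a good poll
    return best, worst


best, worst = down_trap_delay(9, 4, 10 * 60)
print(best, worst)        # 279 879
print(divmod(worst, 60))  # (14, 39), i.e. 14 minutes 39 seconds
```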

        -----Original Message-----
        From:   Stoner, Raymond [SMTP:raymond.stoner AT SPCORP DOT COM]
        Sent:   Monday, November 09, 1998 14:38
        To:     NV-L AT UCSBVM.UCSB DOT EDU
        Subject:        Re: Node/interface UP/Down Reset on Match

        Joel & James, Thanks, I did not realize that would happen. So I have
        adjusted our International Links to 9 & 4. Our international links
        seem to be behaving better. Our polling interval for these links is
        at 10 minutes, global default is 5, should I leave this as is?

        -----Original Message-----
        From: Joel A. Gerber [mailto:joel.gerber AT usaa DOT com]
        Sent: Monday, November 09, 1998 2:42 PM
        To: NV-L AT UCSBVM.UCSB DOT EDU
        Subject: Re: Node/interface UP/Down Reset on Match


        James is right.  You need to be careful when increasing retries.
        Timeout/retries are not unique to the NetView application, but will
        simply control what happens at the lower TCP/IP layers in the
        protocol stack.  The most common implementation on all platforms is
        to double the timeout value for every retry which is exactly what
        AIX does.  A timeout/retry combination of 30/40 will result in total
        timeout of a million years!! (try the math yourself: take 2 to the
        40th power times 30 seconds).  You need to be especially careful
        when increasing retries, but you should be careful with the timeout
        value, too.  For example, changing the timeout from 1 to 10 seconds
        with a retries of 5 means you increased the total timeout from 63
        seconds to 630 seconds.

        We use a global default of 5.0 second timeout and 3 retries.  For
        resources that need a longer timeout we use 9.0 seconds and 4
        retries.

                -----Original Message-----
                From:   James_Shanks AT TIVOLI DOT COM [SMTP:James_Shanks AT TIVOLI DOT COM]
                Sent:   Friday, November 06, 1998 15:14
                To:     NV-L AT UCSBVM.UCSB DOT EDU
                Subject:        Re: Node/interface UP/Down Reset on Match

                40 retries?  That cannot be right.  You should not increase
                the retries like that.  It would mean that netmon would never
                be finished with the polling cycle for this device.  The retry
                count is how many times netmon should try the device before he
                considers it down.  With a high timeout, he would still be
                waiting on timeouts from one cycle when it is time to begin
                the next, which will lead to very strange results.  Drop that
                back to where it was.  What you want is longer timeouts but
                few retries.

                There are sample rulesets for Node Down/UP and Interface
                Down/UP.  Have you looked at those?

                James Shanks
                Tivoli (NetView for UNIX) L3 Support



                "Stoner, Raymond" <raymond.stoner AT SPCORP DOT COM> on 11/06/98
                03:38:54 PM

                Please respond to Discussion of IBM NetView and POLYCENTER
                Manager on NetView <NV-L AT UCSBVM.UCSB DOT EDU>

                To:   NV-L AT UCSBVM.UCSB DOT EDU
                cc:    (bcc: James Shanks)
                Subject:  Re: Node/interface UP/Down Reset on Match





                I have changed and continue to increment the polling to these
                devices as you suggested; maybe my values are NG. I currently
                have (just for these specific devices) timeout at 30, retries
                at 40 and polling interval every 10 minutes. We started @ 8 5
                and 5.  I'll do some netmon tracing on Monday.

                I probably do not have the rule structured properly. (NetView
                rookie)  Not quite sure how to match up the events.

                -----Original Message-----
                From: James_Shanks AT TIVOLI DOT COM [mailto:James_Shanks AT TIVOLI DOT COM]
                Sent: Friday, November 06, 1998 2:49 PM
                To: NV-L AT UCSBVM.UCSB DOT EDU
                Subject: Re: Node/interface UP/Down Reset on Match


                Normally, I would recommend you look at polling intervals and
                timeouts, since that controls when netmon decides that an
                interface is down and sends the traps.  I would suggest a
                separate entry in the SNMP Configuration for these entries
                with a longer timeout.  If that's not working, perhaps you
                might try a netmon trace to see what is happening here.  If
                you need help with that, I'd call Support and ask for it.

                The ruleset issue is more puzzling to me, because in
                principle, this is just the sort of thing Pass/Reset-On-Match
                should do well.  The problem may be your timing, however.  Ten
                seconds is way too fine an increment for the daemon to handle.
                The heartbeat mechanism for checking the threshold is set at
                15 seconds, so it would be impossible to get good results
                lower than that.  Why not have him hold it for a minute or
                two?  Then if there is going to be an UP event, you are sure
                not to miss it.

                James Shanks
                Tivoli (NetView for UNIX) L3 Support



                "Stoner, Raymond" <raymond.stoner AT SPCORP DOT COM> on 11/06/98
                02:02:14 PM

                Please respond to Discussion of IBM NetView and POLYCENTER
                Manager on NetView <NV-L AT UCSBVM.UCSB DOT EDU>

                To:   NV-L AT UCSBVM.UCSB DOT EDU
                cc:    (bcc: James Shanks)
                Subject:  Node/interface UP/Down Reset on Match





                Sometimes we receive a Node/Interface Down event and a second
                or two later the Node/Interface Up event is received,
                especially on our International links. I have tried to adjust
                the timeout and retry intervals for these nodes but this
                problem still occurs. I would like to hold the down messages
                for about ten seconds to see if the up message is received,
                and if not then forward the down event on to our T/EC console.
                A ruleset using the Reset on Match might be the way to go, but
                I'm having trouble getting that to work. Any suggestions on
                dealing with the rule or this situation are greatly
                appreciated.

                We are running NetView V4.1 on AIX 4.1.5

                Raymond Stoner
                Technical Advisor
                Schering Plough Corporation
                1011 Morris Ave. Union NJ 07083-7120
                Phone : (908)-820-6268 Fax : (908)-820-6102
                email: raymond.stoner AT spcorp DOT com
                iloviT
