RE: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
2004-09-15 13:31:15
To figure out what is wrong you have
to answer the question, "How far do we get?"
Is nvserverd running? Check with "ovstatus
nvserverd". Are events going to your cache file? The
default location is /etc/Tivoli/tec/cache. If it's growing with new
events, then the adapter has lost contact with the server.
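
One way to answer the "is the cache growing?" question is to sample the file size twice. The following is a minimal sketch, not part of any NetView or TEC tooling; the function name and the 30-second default interval are assumptions made for illustration.

```python
import os
import time


def cache_is_growing(path, interval_seconds=30.0, sleep=time.sleep):
    """Return True if the file at `path` is larger after the interval.

    `sleep` is injectable only so the check can be exercised without waiting.
    """
    first = os.path.getsize(path)
    sleep(interval_seconds)
    second = os.path.getsize(path)
    return second > first


# Example call, using the default cache location named in the message:
#   cache_is_growing("/etc/Tivoli/tec/cache")
```

A single sample pair only shows growth during that window, so it is worth repeating the check while traps are known to be arriving.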
You aren't getting an nvserverd.log
file? Never seen that before if you are running the executable which
came with IY60528.
But you could look for TEC adapter errors
in nettl. You have to format it first.
To do that you would have to use "netfmt
-f nettl.LOG00 > formatted.nettl.LOG00" and then do the same
for LOG01, and go looking for nvserverd entries. Some of them will
be cryptic, but the one you would want would say something about a tec_create_handle
failure. Prior to 7.1.4, that was the only place you could find adapter
errors.
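
Once the logs are formatted, the search for nvserverd entries can be scripted. This is a rough sketch that assumes a plain line-by-line substring match is good enough; the real layout of formatted nettl records is not shown in this thread, so a record that spans several lines may need a wider search. The function name is hypothetical.

```python
def scan_formatted_nettl(lines):
    """Split matching lines into (nvserverd_lines, handle_failures).

    Each hit is returned as (line_number, text). Lines that mention
    tec_create_handle are reported separately because that is the
    failure the message says to look for.
    """
    nvserverd_lines = []
    handle_failures = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if "tec_create_handle" in text:
            handle_failures.append((number, text))
        elif "nvserverd" in text:
            nvserverd_lines.append((number, text))
    return nvserverd_lines, handle_failures


# Example call against a file produced by netfmt:
#   with open("formatted.nettl.LOG00", errors="replace") as fh:
#       others, failures = scan_formatted_nettl(fh)
```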
Another thing you should do is try running
the nvcorrd trace and see whether he has a forwardall.rs ruleset registered
for nvserverd. Issue "nvcdebug -n" and then "nvcdebug -d
all" and go look at the nvcorrd logs. You should see the current
list of rulesets being run (nvcdebug -n) and then incoming events being
processed for forwardall.rs. When he processes them he writes
a message to the log which says he is forwarding the notification to appl
<pid>. Check the <pid>. It should be the process
id (pid) for nvserverd.
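
The pid comparison can be automated along these lines. This sketch assumes the log message contains the word "appl" followed by the pid as digits, which is only inferred from the description above; the exact nvcorrd log format is not shown here, so the pattern may need adjusting. All names are hypothetical.

```python
import re

# Assumption: the pid appears as digits right after the word "appl".
APPL_PID = re.compile(r"\bappl\s+(\d+)")


def forwarded_pids(log_lines):
    """Collect every pid that follows 'appl' in the given log lines."""
    pids = set()
    for line in log_lines:
        for match in APPL_PID.finditer(line):
            pids.add(int(match.group(1)))
    return pids


def forwards_to(log_lines, nvserverd_pid):
    """Return True if any log line forwards to the given nvserverd pid."""
    return nvserverd_pid in forwarded_pids(log_lines)
```

If the set of pids is non-empty but does not include the pid of nvserverd, the ruleset is registered for some other process.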
Finally, you might try using the non-TME
adapter just as a test and see whether that works. But remember,
they use different executables. So for that you'd have to go back
through serversetup and reconfigure the adapter so that the right daemon
gets registered in ovsuf, and then you'd have to stop it and modify the
tecint.conf file to enable the tracing again, because the reconfigure will
wipe it out.
HTH
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
"Edwards, JT - ESM"
<JEdwards3 AT wm DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com
09/15/2004 12:04 PM
To: "'nv-l AT lists.us.ibm DOT com'" <nv-l AT lists.us.ibm DOT com>
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
Well, we here at Waste Management
are still having issues getting events to flow to TEC.
We are at 7.1.4 FP 01 with the IY60528
patch installed.
I have no tracing and no signs
that the nvtecia process (or subprocess) is even working. Our ruleset (forwardall.rs)
is set to pass. We have stopped and restarted the nvserverd process several
times.
The tecint.conf file reads as
follows:
ServerLocation=@EventServer
TecRuleName=forwardall.rs
ServerPort=0
DefaultEventClass=TEC_ITS_BASE
Type=LCF
BufferEvents=YES
UseStateCorrelation=YES
StateCorrelationConfigURL=file:///usr/OV/conf/nvsbcrule.xml
## The following four lines are for debugging the state correlation engine
LogLevel=ALL
TraceLevel=ALL
LogFileName=/usr/OV/log/adptlog.out
TraceFileName=/usr/OV/log/adpttrc.out
## The following three lines alter nvserverd default behavior
NvserverdTraceTecEvents=YES
NvserverdPrimeTecEvents=NO
NvserverdSendSeverityTecEvents=YES
LCFINSTANCE=1
The two logfiles are not being
created.
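
When the log files named in a conf file never appear, a quick sanity check is to read the settings back and look at the target paths. The following is a minimal sketch that assumes the simple key=value layout shown above, with lines starting with "#" treated as comments. The function names are hypothetical, and the writability test reflects the user running the script, not the user the daemon runs as.

```python
import os


def parse_conf(text):
    """Parse key=value lines, skipping blanks and '#' comment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_log_targets(settings, keys=("LogFileName", "TraceFileName")):
    """Report, per key, whether the file exists or could be created."""
    report = {}
    for key in keys:
        path = settings.get(key)
        if path is None:
            report[key] = "not set"
            continue
        directory = os.path.dirname(path) or "."
        if os.path.exists(path):
            report[key] = "file exists"
        elif os.path.isdir(directory) and os.access(directory, os.W_OK):
            report[key] = "file missing, directory writable"
        else:
            report[key] = "file missing, directory missing or not writable"
    return report


# Example call:
#   with open("tecint.conf") as fh:
#       print(check_log_targets(parse_conf(fh.read())))
```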
ummmm HELP?!
JT
-----Original Message-----
From: owner-nv-l AT lists.us.ibm DOT com [mailto:owner-nv-l AT lists.us.ibm DOT com] On
Behalf Of James Shanks
Sent: Tuesday, September 14, 2004 10:11 AM
To: nv-l AT lists.us.ibm DOT com
Subject: Re: [nv-l] nvtecia still hanging or falling behind processing
TEC_ITS.rs
I'm not aware of anyone else reporting a similar problem. Historically,
however, the adapter has always been load sensitive.
But let's clarify the issue a bit, shall we? Are you saying that
the adapter slows down or that it hangs? Does the heartbeat
event get there eventually? How slow is it? Do things ever
recover without your taking everything down or not? How long does
that take? How big is this trap surge you are talking about?
There is no simple way to diagnose this issue because there is the ZCE
engine in the middle, as well as the fact that nvserverd has no idea what's
going on after he does tec_put_event. As far as NetView is concerned,
once that occurs, the event has been sent. Whether it gets to the
server or not is the responsibility of the code in the TEC EEIF library.
You can use the conf file entry NvserverdTraceTecEvents=YES, or the corresponding
environment variable, to get an nvserverd.log, to see whether nvserverd
has given the event to the adapter in a timely fashion. Then you
would have to check the adapter's cache file, by default /etc/Tivoli/tec/cache,
and see whether it is caching events. It will do that if communications
with the server hiccup. But it should recover from that automatically.
When communication is lost, it tries again on every subsequent event.
If the cache isn't growing, and nvserverd has logged the event,
then the problem is internal to the TEC code. To go deeper, you'd
have to get the TEC folks involved.
They might want you to get the java adapter traces mentioned in the conf
file, or they might want a trace of the internals of the adapter library.
For that you'd have to obtain a special diagnosis file from
them, called ".ed_diag_conf", and hook it in with a special
entry in the conf file. But then they'd have to read the traces.
And all that would require that you open a call to Support.
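
The order of checks described in this message can be written down as a small decision function. This is only a sketch of the reasoning, with hypothetical names; the first branch (nothing logged by nvserverd) is an inference from the message rather than something it states outright.

```python
def triage(nvserverd_logged_event, cache_growing):
    """Map the two observations onto the next place to look."""
    if not nvserverd_logged_event:
        # Inferred: if nvserverd.log shows no hand-off, the delay is
        # upstream of the adapter.
        return "look upstream: nvserverd has not handed the event to the adapter"
    if cache_growing:
        # The adapter caches events when communication with the server fails.
        return "adapter is caching: communication with the TEC server was lost"
    # Event logged by nvserverd and cache not growing.
    return "problem is internal to the TEC code: involve TEC support"
```

For example, `triage(True, False)` corresponds to the case where the event was logged and the cache is static, which is the point at which the TEC folks need to get involved.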
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
"Van Order, Drew (US - Hermitage)" <dvanorder AT deloitte DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com
09/14/2004 10:22 AM
To: <nv-l AT lists.us.ibm DOT com>
Subject: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
Hi all,
After patching 7.1.4 FP01 with the latest efix to fix nvcorrd/nvtecia hanging
or stalling, we find it's still happening. It mainly starts when we get
a surge of Cisco syslog traps from devices. The only piece not keeping
up is the NV to TEC integration; demandpolls are fine and events are moving
in the Event Browser. TEC_ITS only passes traps on; we do no other processing
in the ruleset. TEC events from sources outside NV are not impacted. We
send an hourly Interface Down trap via cron to serve as a heartbeat. When
the second one in a row goes missing (as seen at TEC), we cycle NV and it's
OK again. MLM is not an option for our environment. Is anyone else struggling
with this?
Thanks--Drew
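
The "two missed heartbeats in a row" rule above can be sketched as follows. This is an illustration only: the function names and the five-minute grace period are assumptions, and the time the last heartbeat was seen would have to come from the TEC side.

```python
from datetime import datetime, timedelta


def missed_heartbeats(last_seen, now,
                      interval=timedelta(hours=1),
                      grace=timedelta(minutes=5)):
    """Count whole heartbeat intervals that have passed since last_seen.

    `grace` (an assumed allowance) gives a late heartbeat a little time
    to arrive before it counts as missed.
    """
    overdue = now - last_seen - grace
    return max(0, overdue // interval)


def should_cycle(last_seen, now, threshold=2, **kwargs):
    """Return True once `threshold` heartbeats in a row have been missed."""
    return missed_heartbeats(last_seen, now, **kwargs) >= threshold
```

With an hourly heartbeat last seen at 10:00, this reports one miss shortly after 11:05 and recommends a cycle shortly after 12:05.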