RE: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
2004-09-16 14:42:47
Drew,
You are asking the NetView guys about
TEC libraries, and the short answer is, we don't know what the source
of your problem is, or we'd tell you, and we'd fix it. In order to
get to the bottom of TEC library issues we have to get TEC people involved,
their Level 3 and development, because they haven't documented any cases
where this doesn't work. At least they have not told us about them.
The reason we know about errno 827 issues, for example, is because
they have been found before, both internally and externally, and we got
logs to see what the problem was so we could fix it. Ditto for the
memory leak in the TEC EEIF library that was originally shipped with NetView
7.1.4/ TEC 3.9. Somebody had to see the problem, document it, and
present that documentation to the folks who work on that code, in order
for a fix to be made.
But so far I don't know about any problems
or restrictions associated with running both an adapter and a TEC server
on the same physical box. That doesn't mean there aren't any. It
just means that none have been documented so far.
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
"Van Order, Drew \(US
- Hermitage\)" <dvanorder AT deloitte DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com
09/16/2004 11:46 AM
To: <nv-l AT lists.us.ibm DOT com>
cc:
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
I was wondering the same thing
JT--our NV and TEC coexist too. We flipped on nvserverd logging 2 days
ago but haven't had any failures yet. It's just a matter of time. Is there
a pattern to when your event flow stops? Mike and James, would the library
problem mentioned here cause the intermittent behavior we are seeing?
I figured it would show as not getting events at all.
Nice to know we're not alone!
Thanks--Drew
-----Original Message-----
From: owner-nv-l AT lists.us.ibm DOT com [mailto:owner-nv-l AT lists.us.ibm DOT com]
On Behalf Of Edwards, JT - ESM
Sent: Wednesday, September 15, 2004 4:00 PM
To: 'nv-l AT lists.us.ibm DOT com'
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
One other small thing.
The TEC server and Netview server
are co-located (on the same servers). Could that be our problem?
JT
-----Original Message-----
From: owner-nv-l AT lists.us.ibm DOT com [mailto:owner-nv-l AT lists.us.ibm DOT com]
On Behalf Of Mike Pearson
Sent: Wednesday, September 15, 2004 1:49 PM
To: nv-l AT lists.us.ibm DOT com
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
JT:
I think that is a problem with the way your
NetView is started. Try this: run ovstop, then ovstop nvsecd, and
then run /etc/netnmrc. Your problem is with libraries that are not
available to the daemons, and starting through /etc/netnmrc should pick them
up.
Regards,
Michael Pearson
Tivoli NetView for UNIX and NT Support
Building 660, Office CC105B;
HWY. 54 & 600 PARK OFFICES DR
Research Triangle Park, N.C. 27709
(919) 254-2270
pearsom AT us.ibm DOT com
******************************************************************
******************************************************************
Need help with Tivoli Software Products?
Ask Tivoli!
http://www.tivoli.com/asktivoli
"Edwards, JT - ESM"
<JEdwards3 AT wm DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com
09/15/2004 02:35 PM
To: "'nv-l AT lists.us.ibm DOT com'" <nv-l AT lists.us.ibm DOT com>
cc:
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
James and Jane. Found it:
************************************ NetView *******************************@#%
Timestamp : Wed Sep 15 2004 13:34:20.493872
Process ID : 46230
Subsystem : OVEXTERNAL
User ID ( UID ) : 0
Log Class : ERROR
Device ID : -1
Path ID : -1
Connection ID : -1
Log Instance : 0
Software : /usr/OV/bin/nvserverd
Hostname : ausu066a.wm.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Call to tec_create_handle failed, tec_errno = 827
Now what do I do?
-----Original Message-----
From: owner-nv-l AT lists.us.ibm DOT com [mailto:owner-nv-l AT lists.us.ibm DOT com]
On Behalf Of James Shanks
Sent: Wednesday, September 15, 2004 12:07 PM
To: nv-l AT lists.us.ibm DOT com
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
To figure out what is wrong you have to answer the question, "How
far do we get?"
Is nvserverd running? Check with ovstatus nvserverd. Are
events going to your cache file? The default location is /etc/Tivoli/tec/cache.
If it's growing with new events, then the adapter has lost
contact with the server.
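A minimal sketch of the cache check described above, assuming the cache is a single file and that comparing its size at two points in time is a fair proxy for "growing with new events". The function name, the sampling interval, and the injectable sleep are illustrative only and are not part of NetView or TEC.

```python
import os
import time


def cache_is_growing(path, interval_seconds=60.0, sleep=time.sleep):
    """Return True if the file at `path` is larger after waiting.

    Assumption: a missing cache file is treated as size 0.
    """
    def size():
        try:
            return os.path.getsize(path)
        except OSError:
            return 0

    before = size()
    sleep(interval_seconds)  # wait for new events to arrive
    after = size()
    return after > before
```

On a live system this could be called as cache_is_growing("/etc/Tivoli/tec/cache") while traps are known to be arriving. A True result points at lost contact with the server rather than at nvserverd.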
You aren't getting an nvserverd.log file? Never seen that
before if you are running the executable which came with IY60528.
But you could look for TEC adapter errors in nettl. You have to
format it first.
To do that you would have to use "netfmt -f nettl.LOG00 >
formatted.nettl.LOG00" and then do the same for LOG01, and go
looking for nvserverd entries. Some of them will be cryptic,
but the one you would want would say something about a tec_create_handle
failure. Prior to 7.1.4, that was the only place you could
find adapter errors.
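Once the logs are formatted, the search for nvserverd entries can be scripted. The sketch below is a hedged example, not a NetView tool: it assumes each formatted entry starts with a banner line of asterisks containing the word NetView, and that the failure text reads "tec_create_handle failed, tec_errno = N", as in the entry quoted elsewhere in this thread. Other nettl layouts may need a different split.

```python
import re

# Assumption: every formatted entry begins with a banner such as
# "******** NetView ********@#%".
BANNER = re.compile(r"^\*{5,}.*NetView")
FAILURE = re.compile(r"tec_create_handle failed, tec_errno = (\d+)")


def find_tec_handle_failures(formatted_text):
    """Return (tec_errno, entry_text) for each nvserverd failure entry."""
    entries, current = [], []
    for line in formatted_text.splitlines():
        if BANNER.match(line):
            if current:
                entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        entries.append("\n".join(current))

    failures = []
    for entry in entries:
        match = FAILURE.search(entry)
        # Keep only entries logged by nvserverd.
        if match and "nvserverd" in entry:
            failures.append((int(match.group(1)), entry))
    return failures
```

Feeding it the contents of formatted.nettl.LOG00 and then the LOG01 equivalent gives a short list of errno values to quote when opening a call to Support.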
Another thing you should do is try running the nvcorrd trace and
see whether he has a forwardall.rs ruleset registered for nvserverd.
Issue "nvcdebug -n" and then "nvcdebug -d all" and
go look at the nvcorrd logs. You should see the current list
of rulesets being run (nvcdebug -n) and then incoming events being
processed for forwardall.rs. When he processes them
he writes a message to the log which says he is forwarding the notification
to appl <pid>. Check the <pid>. It should be the
process id (pid) for nvserverd.
Finally, you might try using the non-TME adapter just as a test and
see whether that works. But remember, they use different executables.
So for that you'd have to go back through serversetup and reconfigure
the adapter so that the right daemon gets registered in ovsuf, and
then you'd have to stop it and modify the tecint.conf file to enable
the tracing again, because the reconfigure will wipe it out.
HTH
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
"Edwards, JT - ESM" <JEdwards3 AT wm DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com
09/15/2004 12:04 PM
To: "'nv-l AT lists.us.ibm DOT com'" <nv-l AT lists.us.ibm DOT com>
cc:
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
Well, we here at Waste Management are still having issues getting
events to flow to TEC.
We are at 7.1.4 FP 01 with IY60528 patch installed.
I have no tracing and no signs that the nvtecia process (or subprocess)
is even working. Our ruleset (forwardall.rs) is set to pass. We have stopped
and restarted the nvserverd process several times.
The tecint.conf file reads as follows
ServerLocation=@EventServer
TecRuleName=forwardall.rs
ServerPort=0
DefaultEventClass=TEC_ITS_BASE
Type=LCF
BufferEvents=YES
UseStateCorrelation=YES
StateCorrelationConfigURL=file:///usr/OV/conf/nvsbcrule.xml
## The following four lines are for debugging the state correlation engine
LogLevel=ALL
TraceLevel=ALL
LogFileName=/usr/OV/log/adptlog.out
TraceFileName=/usr/OV/log/adpttrc.out
## The following three lines alter nvserverd default behavior
NvserverdTraceTecEvents=YES
NvserverdPrimeTecEvents=NO
NvserverdSendSeverityTecEvents=YES
LCFINSTANCE=1
The two logfiles are not being created.
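A quick way to confirm which of the configured log files are missing is to read the key=value pairs and test each path. This is a rough sketch under stated assumptions: lines starting with "#" are comments, lines without "=" are ignored, and the helper names are hypothetical. It is not the adapter's own parser.

```python
import os


def parse_conf(text):
    """Parse key=value lines, skipping blanks, comments and stray text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_log_files(settings, keys=("LogFileName", "TraceFileName")):
    """Return configured log paths that do not exist on disk."""
    return [
        settings[key]
        for key in keys
        if key in settings and not os.path.exists(settings[key])
    ]
```

Run against the file above, missing_log_files(parse_conf(open(conf_path).read())) would list both /usr/OV/log/adptlog.out and /usr/OV/log/adpttrc.out if neither has been created. Here conf_path stands for wherever tecint.conf lives on the box.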
ummmm HELP?!
JT
-----Original Message-----
From: owner-nv-l AT lists.us.ibm DOT com [mailto:owner-nv-l AT lists.us.ibm DOT com]
On Behalf Of James Shanks
Sent: Tuesday, September 14, 2004 10:11 AM
To: nv-l AT lists.us.ibm DOT com
Subject: Re: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
I'm not aware of anyone else reporting a similar problem.
Historically, however, the adapter has always been load sensitive.
But let's clarify the issue a bit, shall we? Are you saying
that the adapter slows down or that it hangs? Does the
heartbeat event get there eventually? How slow is it? Do
things ever recover without your taking everything down or not? How
long does that take? How big is this trap surge you are talking
about?
There is no simple way to diagnose this issue because there is the
ZCE engine in the middle, as well as the fact that nvserverd has
no idea what's going on after he does tec_put_event. As far
as NetView is concerned, once that occurs, the event has been sent.
Whether it gets to the server or not is the responsibility of the code
in the TEC EEIF library. You can use the conf file entry NvserverdTraceTecEvents=YES,
or the corresponding environment variable, to get an nvserverd.log,
to see whether nvserverd has given the event to the adapter in a
timely fashion. Then you would have to check the adapter's
cache file, by default /etc/Tivoli/tec/cache, and see whether it
is caching events. It will do that if communications with the
server hiccup. But it should recover from that automatically.
When communication is lost, it tries again on every subsequent
event. If the cache isn't growing, and nvserverd has logged
the event, then the problem is internal to the TEC code. To
go deeper, you'd have to get the TEC folks involved.
They might want you to get the java adapter traces mentioned in the conf
file, or they might want a trace of the internals of the adapter library.
For that you'd have to obtain a special diagnosis file from
them, called ".ed_diag_conf" to hook that in by a
special entry in the conf file. But then they'd have
to read the traces. And all that would require that you
open a call to Support.
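The reasoning above boils down to two observations: did nvserverd log the event, and is the cache growing? A small sketch of that decision follows. The function and the labels it returns are illustrative, and the first branch (event never logged by nvserverd) is an inference from the text rather than something stated outright.

```python
def triage(nvserverd_logged_event, cache_growing):
    """Suggest where to look next, given the two observations."""
    if not nvserverd_logged_event:
        # Inference: the event never reached the adapter, so look at
        # the NetView side (nvcorrd, the ruleset, nvserverd itself).
        return "netview-side"
    if cache_growing:
        # The adapter is caching, so communication with the server
        # has been interrupted.
        return "server-communication"
    # Logged by nvserverd and not cached: internal to the TEC code.
    return "tec-internal"
```

Only the last outcome calls for the TEC traces and a call to Support. The other two can be chased on the NetView box first.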
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
"Van Order, Drew \(US - Hermitage\)"
<dvanorder AT deloitte DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com
09/14/2004 10:22 AM
To: <nv-l AT lists.us.ibm DOT com>
cc:
Subject: [nv-l] nvtecia still hanging or falling behind processing TEC_ITS.rs
Hi all,
After patching 7.1.4 FP01 with the latest efix to fix nvcorrd/nvtecia
hanging or stalling, we find it's still happening. It mainly starts
when we get a surge of Cisco syslog traps from devices. The only piece
not keeping up is the NV to TEC integration; demandpolls are fine
and events are moving in the Event Browser. TEC_ITS only passes traps
on, we do no other processing in the ruleset. TEC events from sources
outside NV are not impacted. We send an hourly Interface Down trap
via cron to serve as a heartbeat. When it misses the second one in
a row (as seen at TEC), we cycle NV and it's OK again. MLM is not
an option for our environment. Is anyone else struggling with this?
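The heartbeat rule described above (hourly trap, cycle NV after the second miss in a row) could be checked along these lines. This is only a sketch: the five-minute grace period, the function names, and the idea of tracking the time the last heartbeat was seen at TEC are assumptions for illustration.

```python
from datetime import datetime, timedelta


def missed_heartbeats(last_seen, now,
                      interval=timedelta(hours=1),
                      grace=timedelta(minutes=5)):
    """Count consecutive expected heartbeats not seen since `last_seen`.

    Assumption: a heartbeat counts as missed once it is more than
    `grace` late.
    """
    overdue = now - last_seen - grace
    return max(0, overdue // interval)


def should_cycle(last_seen, now):
    """True once two heartbeats in a row have been missed."""
    return missed_heartbeats(last_seen, now) >= 2
```

With a heartbeat last seen at 10:00, should_cycle stays False at 11:06 (one miss) and turns True at 12:06 (two misses).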
Thanks--Drew
*Disclaimer:*
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected
by law. If you are not the intended recipient, you should delete this
message. Any disclosure, copying, or distribution of this message, or the
taking of any action based on it, is strictly prohibited.