Re: [nv-l] netfmt
2004-05-10 17:09:00
Mahesh,
All that's in your nettl log are messages from other processes: ipmap,
ovw, and ovspmd. There is nothing from nettl itself, and nothing to
indicate that the nettl process had a problem.
See where it says "Software:"? That's how you can
tell what process wrote the message.
So the nettl log itself doesn't look
promising, but you should let someone else from Support look for you.
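As a rough illustration of reading the "Software" field, here is a minimal sketch that tallies formatted log records by the program that wrote them. It assumes the log has already been run through netfmt into readable text, and that each record carries a "Software : <path>" line as in the records quoted further down. The function name count_by_software and the file name in the usage comment are made up for this example.

```shell
# Tally formatted nettl log records by the program named in "Software :".
# Reads netfmt-formatted text on stdin; prints "count program" per line,
# most frequent first.
count_by_software() {
  awk -F ':' '/^Software[ \t]*:/ {
      prog = $2
      gsub(/^[ \t]+|[ \t]+$/, "", prog)   # trim padding around the path
      count[prog]++
    }
    END { for (p in count) print count[p], p }' | sort -rn
}

# Usage (formatted.LOG00 is a hypothetical netfmt output file):
#   count_by_software < formatted.LOG00
```

If nettl or ntl_reader never shows up in that tally, the log holds only messages from other processes.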
The ps output may tell us more.
This kind of output is normal. It is what you should see:
root 2471 1 0 15:49 ? 00:00:00 /usr/OV/bin/ntl_reader 0 1 1 1 1
root 2472 2471 0 15:49 ? 00:00:00 netfmt -CF
root 8018 9132 0 15:53 pts/0 00:00:00 grep 2471
Notice how the parent process (2471) of the netfmt -CF is the
ntl_reader process?
In the earlier cases, the parent process is 1, which means that the
nettl process, the ntl_reader, which spawned them has itself gone away,
and each netfmt then inherits the init process (1) as its parent, since
it has no parent left in the system. These are all orphans.
root 28067 1 0 15:45 ? 00:00:00 netfmt -CF
root 23113 1 0 15:48 ? 00:00:00 netfmt -CF
root 23748 1 0 15:48 ? 00:00:00 netfmt -CF
root 24536 1 0 15:48 ? 00:00:00 netfmt -CF
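To pick the orphans out of a long ps listing, something like the following sketch may help. It assumes the Linux "ps -ef" column layout shown above (UID, PID, PPID, C, STIME, TTY, TIME, then the command) and that the command appears exactly as "netfmt -CF". The function name orphaned_netfmt_pids is invented for this example.

```shell
# Print the PID of every "netfmt -CF" whose parent PID is 1 (init).
# Reads "ps -ef" style output on stdin:
#   UID PID PPID C STIME TTY TIME CMD [ARGS...]
orphaned_netfmt_pids() {
  awk '$3 == 1 && $8 == "netfmt" && $9 == "-CF" { print $2 }'
}

# Usage:
#   ps -ef | orphaned_netfmt_pids
```

A netfmt whose parent is still a live ntl_reader is not printed, so the healthy one is left out of the list.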
This situation might indicate that the ntl_reader process is coring on
your box. Can you find any core files in the root (/) directory? Or in
/usr/OV?
I don't believe that ntl_reader is set up to use /usr/OV/PD/cores.
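One way to check those two directories is sketched below. It assumes GNU find (for -maxdepth) and that core files are named either "core" or "core.<something>", which depends on how the kernel is configured. The function name find_cores is made up for this example.

```shell
# List regular files named "core" or "core.*" directly inside the given
# directories, without descending into subdirectories.
find_cores() {
  find "$@" -maxdepth 1 -type f \( -name core -o -name 'core.*' \) 2>/dev/null
}

# Usage:
#   find_cores / /usr/OV
```

Running "file" on anything it turns up should show which program dumped it.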
In any case, open a problem with Support, and let them help you gather
some data. I have no idea what else to tell you.
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
Mahesh Tailor <mahesh.tailor AT network.carilion DOT com>
Sent by: owner-nv-l AT lists.us.ibm DOT com
05/10/2004 04:13 PM
To: NetView User List <nv-l AT lists.us.ibm DOT com>
cc:
Subject: Re: [nv-l] netfmt
Hi, James!
Here's the output of my ps -ef:
root@netview [/usr/OV/log] # ps -ef | grep netfmt
root 28067 1 0 15:45 ? 00:00:00 netfmt -CF
root 23113 1 0 15:48 ? 00:00:00 netfmt -CF
root 23748 1 0 15:48 ? 00:00:00 netfmt -CF
root 24536 1 0 15:48 ? 00:00:00 netfmt -CF
root 2472 2471 0 15:49 ? 00:00:00 netfmt -CF
root 8020 9132 0 15:53 pts/0 00:00:00 grep netfmt
root@netview [/usr/OV/log] # ps -ef | grep 2471
root 2471 1 0 15:49 ? 00:00:00 /usr/OV/bin/ntl_reader 0 1 1 1 1
root 2472 2471 0 15:49 ? 00:00:00 netfmt -CF
root 8018 9132 0 15:53 pts/0 00:00:00 grep 2471
And these have all appeared since I had to restart my machine 50 minutes ago.
I performed a nettl -stop and still had the netfmt processes belonging
to PID 1 running, so I killed them and restarted nettl.
Here're some of the nettl log messages . . .
************************************ NetView *******************************@#%
Timestamp : Mon May 10 2004 10:06:07.308834
Process ID : 9774
Subsystem : SECURITY
User ID ( UID ) : 0
Log Class : ERROR
Device ID : -1
Path ID : -1
Connection ID : -1
Log Instance : 0
Software : /usr/OV/bin/ovw
Hostname : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OVwUserSecurity() error 4 on waitpid
************************************ NetView *******************************@#%
Timestamp : Mon May 10 2004 15:08:45.118009
Process ID : 1609
Subsystem : OVW
User ID ( UID ) : 0
Log Class : ERROR
Device ID : -1
Path ID : -1
Connection ID : -1
Log Instance : 0
Software : /usr/OV/bin/ipmap
Hostname : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IPMap error in symbolMgr::flushSymbols - OVwCreateSymbols - (OVwError = 80): Object not found.
************************************ NetView *******************************@#%
Timestamp : Mon May 10 2004 15:08:45.118101
Process ID : 1609
Subsystem : OVW
User ID ( UID ) : 0
Log Class : ERROR
Device ID : -1
Path ID : -1
Connection ID : -1
Log Instance : 0
Software : /usr/OV/bin/ipmap
Hostname : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Failed to create symbol: 172.23.6.25. OVwError =80: Object not found.
************************************ NetView *******************************@#%
Timestamp : Mon May 10 2004 15:08:45.118763
Process ID : 1609
Subsystem : OVW
User ID ( UID ) : 0
Log Class : ERROR
Device ID : -1
Path ID : -1
Connection ID : -1
Log Instance : 0
Software : /usr/OV/bin/ipmap
Hostname : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IPMap error in symbolMgr::flushSymbols - OVwCreateSymbols - (OVwError = 80): Object not found.
************************************ NetView *******************************@#%
Timestamp : Mon May 10 2004 15:08:45.118822
Process ID : 1609
Subsystem : OVW
User ID ( UID ) : 0
Log Class : ERROR
Device ID : -1
Path ID : -1
Connection ID : -1
Log Instance : 0
Software : /usr/OV/bin/ipmap
Hostname : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Failed to create symbol: 10.10.10.10. OVwError =80: Object not found.
************************************ NetView *******************************@#%
Timestamp : Mon May 10 2004 15:17:38.349803
Process ID : 1394
Subsystem : OVS
User ID ( UID ) : 0
Log Class : ERROR
Device ID : -1
Path ID : -1
Connection ID : -1
Log Instance : 0
Software : /usr/OV/bin/ovspmd
Hostname : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Object manager kronos.carilion.com is not registered. See ovaddobj(1m).
Kronos.carilion.com is 10.10.10.10, which is a Win2K cluster address and
is excluded as !kronos.carilion.com in netmon.seed.
If you see something obvious, can you please drop me a reply? If not, I
will submit a PMR.
Thanks.
Mahesh
On Mon, 2004-05-10 at 15:41, James Shanks wrote:
> Well, I don't have a clue what is wrong, but on Linux, it is the nettl
> process itself which spawns the netfmt -CF. But only one of those is
> spawned on my system and it stays active only so long as nettl is
> active. When I do a "/usr/OV/bin/nettl -stop" both nettl and the
> netfmt go away.
>
> You should be able to chase ownership of the process via ps -ef. Who
> is (or are) the parents of these rogue netfmts? Your current nettl or
> some other long gone? What happens when or if you do nettl -stop?
> Once the main nettl goes away, you should be able to kill those netfmt
> processes with impunity, though that will not tell you why they are
> being created. But you can stop and restart nettl any time you wish.
> Normally it is just started once and keeps running until stopped. If
> you stop nettl and kill all the remaining netfmts, if any, and then
> restart nettl with nettl -start, try looking with "ps -ef |grep
> netfmt". How many do you see? Should be just one. Try looking again
> every few minutes.
>
> Offhand I see nothing in your status that looks out of line. Where
> would you look for a source of the problem? Well, I'm not sure, since
> I've never seen anything like this before, but here's what I'd do:
> (1) /usr/OV/bin/nettl -stop
> (2) ps -ef | grep netfmt. Kill any you find.
> (3) cd /usr/OV/log
> (4) ls nettl* and see how many you have, just nettl.LOG00 or also
>     nettl.LOG01
> (5) for each nettl.LOG0n you have, issue
>     /usr/OV/bin/netfmt -f nettl.LOG0n > formatted.LOG0n
>     This creates ASCII files you can read.
> (6) Look in the formatted logs for interesting error messages
> (7) Call Support with what you find.
>
> James Shanks
> Level 3 Support for Tivoli NetView for UNIX and Windows
> Tivoli Software / IBM Software Group
>
>
> Mahesh Tailor
> <mahesh.tailor AT network.carilion DOT com>
> Sent by:
> owner-nv-l AT lists.us.ibm DOT com
>
> 05/10/2004 03:01 PM
> Please respond to
> nv-l
> To
> NetView User List
> <nv-l AT lists.us.ibm DOT com>
> cc
>
> Subject
> [nv-l] netfmt
>
>
>
>
> Hi!
>
> Running NetView 7.1.3 fp 2 on RedHat Linux AS 2.1.
>
> I am having a problem with hundreds of netfmt -CF processes running and
> eventually disabling the system because of too many open files [system
> default open files has been set to 32K files]. How can I figure out
> what is causing all these processes to start? Here's my nettl status
> output:
>
> Logging Information:
> Log Filename: /usr/OV/log/nettl.LOG0x
> User's ID: 0            Buffer Size: 8192
> Messages Dropped: 0     Messages Queued: 0
>
> Subsystem Name:   Log Class:
> NON_IP            ERROR DISASTER
> DISTMAN           WARNING ERROR DISASTER
> SECURITY          WARNING ERROR DISASTER
> COLLECTION        WARNING ERROR DISASTER
> SNMP              ERROR DISASTER
> CMOT              ERROR DISASTER
> OVE               ERROR DISASTER
> OVC               ERROR DISASTER
> OVW               ERROR DISASTER
> OVD               ERROR DISASTER
> OVS               INFORMATIVE ERROR DISASTER
> OVCAPI            ERROR DISASTER
> OVEXTERNAL        ERROR DISASTER
> OVWAPI            ERROR DISASTER
> TEST_ID_1         DISASTER
> TEST_ID_2         DISASTER
> FORMATTER         DISASTER
>
>
> Tracing Information:
>
> Trace Filename:
> No Subsystems Active
>
>
> In addition to NetView the server also has the following running:
>
> - MySQL DB
> - Apache w/PHP and Perl.
> - Some ksh scripts that run /usr/OV/bin/nvUtil on various smartsets
>   once every 30 minutes.
>
> That is essentially it.
>
> Also, what does the netfmt -C option do? It is not in the man page.
>
> Thanks.
>
> Mahesh
--
Mahesh Tailor
WAN/TSM/NetView Administrator
Carilion Health System
Information Services
37 Reserve Avenue
Roanoke, VA 24016
Phone: 540.224.3929
Fax: 540.224.3954