Re: [nv-l] ovstop hanging during nv6000_smit clear_topology_db_all

2003-06-03 10:06:54
Subject: Re: [nv-l] ovstop hanging during nv6000_smit clear_topology_db_all
From: JamesHorwath AT glic DOT com
To: jeff.ctr.vandenbussche AT faa DOT gov
Date: Tue, 3 Jun 2003 09:37:03 -0400

Jeff,

If you need to investigate further, I would use lsof to see which resources the hung process is holding.  lsof can list the files and sockets that a given process has open.  With a bit of tracing you should get a good idea of what has been corrupted and, more importantly, how to fix it.  I hope this helps.

Regards,
Jim

Jim Horwath
Guardian
IT Unix Services
610-807-8795


To:        nv-l AT lists.tivoli DOT com
cc:        
bcc:        

Subject:        Re: [nv-l] ovstop hanging during nv6000_smit clear_topology_db_all


James,

I had already done the ps while the ovstop was hung and the only daemons
left were nvsecd and ovspmd.  Clearing the topology was the first time I
noticed it.
I then restored the box from tape, and tried ovstop manually (w/o doing the
clear).  The first time I tried, it worked.  I then did an ovstart and then
ovstop, and this time it hung.  I will try this again to see which daemons
are running (I believe I checked and only nvsecd and ovspmd were running).

Thanks,

Jeff




                                                                                                         
James Shanks <jshanks AT us DOT ibm.com>
06/02/03 05:51 PM

       To:     nv-l AT lists.tivoli DOT com
       cc:
       Subject:        Re: [nv-l] ovstop hanging during nv6000_smit clear_topology_db_all




Let's start over.  Basically I don't understand the whole deal, and I
think I've misled you.

First, clearing topology is something most people do once in a blue moon,
usually only while they are setting up the box, getting the seed file and
other options right, so why would you need to worry about this very often?
You should be well past that stage with NetView for AIX Version 4.1.   If
the issue is really ovstop, then why muddy the water with clearing
topology?  Or is that the only time you see it?  I'm confused.

I took a quick look at the nv6000_smit script under the clear option.  Did you
trace it?  It doesn't do "ovstop nvsecd", which would be required to stop
both nvsecd and ovspmd.  It just does "ovstop", which means both of those
daemons are supposed to be up during the process.  You do know that, don't
you?  "ovstop <daemon>" takes down that daemon, as well as any who depend
on him (as defined in the ovsuf file).  Just plain "ovstop" without an
option will leave nvsecd and ovspmd both up.  "ovstop nvsecd" takes them
both down.  So the script expects all daemons except those two to go
away.  Because you killed ovspmd, the script is no longer waiting for the
ovstop to complete, but that doesn't mean it was ovspmd that was at fault.
He can't end the ovstop until all the other daemons (except nvsecd of
course) are down.  So you need to try some debug to see who is still
active.  After you killed ovspmd, what others are still there?  "ps -ef |
grep /usr/OV" should show you.  Only nvsecd is expected.  The others
should be gone.
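
As a minimal sketch of the check described above, the function below filters
"ps -ef" output down to processes running from /usr/OV and skips the grep
itself.  The function name remaining_ov_daemons is made up for illustration,
and it assumes the usual ps -ef layout with the PID in the second column.

```shell
# Read `ps -ef` output on stdin and print "PID COMMAND" for every
# process whose command path is under /usr/OV.
# remaining_ov_daemons is a hypothetical helper name, not a NetView command.
remaining_ov_daemons() {
    awk '/\/usr\/OV/ && !/grep/ {
        # Print the PID (field 2) and the first field that holds the /usr/OV path.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\/usr\/OV/) { print $2, $i; break }
        }
    }'
}

# Usage: ps -ef | remaining_ov_daemons
```

After a plain "ovstop", anything this prints other than nvsecd and ovspmd is a
candidate for the daemon that is holding things up.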

My procedure when ovstop hangs is to cancel it, see who is still active
with ps -ef, and then try to ovstop them.  There really is no magic here.
If they won't go down with ovstop individually, then you have to use kill
-9 on the PID.  Then you might want to look in their logs or format the
nettl log to see if there are telltale messages about a problem.  As a
debug mechanism, rather than trying to ovstop everyone at once, you could
divide and conquer, eliminating suspects as you go.  For instance, you
could try moving the ovstop lower in the chain.  "ovstop trapd" will
take down most of the well-behaved daemons.  Then "ovstop ovwdb".  Then
"ovstop pmd".  That should leave only nvsecd, ovspmd, and the
non-well-behaved ones.  These latter should come down rapidly when you do
"ovstop nvsecd".  Do they?  Basically, you have to try to pin down the
one that is holding up the works.
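
The divide-and-conquer order above could be scripted roughly as follows.  This
is only a sketch under assumptions: stop_in_stages, OVSTOP and STOP_TIMEOUT are
names invented here, the 60-second limit is arbitrary, and it assumes a POSIX
shell.  It runs "ovstop <daemon>" for each name in turn and reports the first
one that does not finish in time.

```shell
# Hypothetical knobs for this sketch: the command used to stop a daemon
# and how many seconds to wait for it before calling it hung.
OVSTOP=${OVSTOP:-ovstop}
STOP_TIMEOUT=${STOP_TIMEOUT:-60}

# Stop the named daemons one at a time, in the order given.
# Prints "stopped: NAME" or "hung: NAME"; returns 1 at the first hang.
stop_in_stages() {
    for daemon in "$@"; do
        "$OVSTOP" "$daemon" &
        pid=$!
        waited=0
        # Poll until the stop command exits or the timeout expires.
        while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$STOP_TIMEOUT" ]; do
            sleep 1
            waited=$((waited + 1))
        done
        if kill -0 "$pid" 2>/dev/null; then
            # Cancel the stuck stop command so the caller can investigate.
            { kill "$pid"; wait "$pid"; } 2>/dev/null || true
            echo "hung: $daemon"
            return 1
        fi
        echo "stopped: $daemon"
    done
}

# Usage: stop_in_stages trapd ovwdb pmd nvsecd
```

The daemon reported as hung is where to look next with ps -ef and the logs.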


James Shanks
Level 3 Support  for Tivoli NetView for UNIX and NT
Tivoli Software / IBM Software Group




jeff.ctr.vandenbussche AT faa DOT gov
06/02/2003 04:24 PM


       To:     nv-l AT lists.tivoli DOT com
       cc:
       Subject:        Re: [nv-l] ovstop hanging during nv6000_smit clear_topology_db_all




James,

ovstop is hanging at times from the command line.  nvsecd is still running,
so it looks like that is the daemon it is hanging on.  If I reboot the box,
I can then ovstart/ovstop once or twice ok, but then ovstop hangs.

Any ideas/suggestions?

Thanks,

Jeff




James Shanks <jshanks AT us DOT ibm.com>
06/02/03 01:00 PM

       To:     nv-l AT lists.tivoli DOT com
       cc:
       Subject:        Re: [nv-l] ovstop hanging during nv6000_smit clear_topology_db_all







Does it also hang from the command line, outside of your script?
I don't know why ovspmd should hang, unless he is still waiting for some
other daemon to stop (nvsecd has to come down first), or the message he's
waiting for got lost when you called this script inside your own script.
In any case, NetView 4.1 is too old even for me to find a good maintenance
history on.
But nv6000_smit clear_topology_db_all is itself a script, and you can
trace it yourself if you haven't already.
You don't even have to edit the script if you export
NV_TRACE=nv6000_smit first.
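
One way to capture that trace for later reading is sketched below.  The
NV_TRACE value comes from the message above; run_traced and TRACE_LOG are
hypothetical names, and where the trace actually appears (stdout or stderr) is
an assumption, so the sketch saves both.

```shell
# Hypothetical log location for this sketch.
TRACE_LOG=${TRACE_LOG:-/tmp/nv6000_smit.trace}

# Run the given command with NV_TRACE set for that command only,
# saving both stdout and stderr to TRACE_LOG.
run_traced() {
    NV_TRACE=nv6000_smit "$@" > "$TRACE_LOG" 2>&1
}

# Usage: run_traced nv6000_smit clear_topology_db_all
```

Setting the variable on the command line keeps it out of the rest of the
session, which matters if other scripts also look at NV_TRACE.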

James Shanks
Level 3 Support  for Tivoli NetView for UNIX and NT
Tivoli Software / IBM Software Group




jeff.ctr.vandenbussche AT faa DOT gov
06/02/2003 12:31 PM


       To:     nv-l AT lists.tivoli DOT com
       cc:
       Subject:        [nv-l] ovstop hanging during nv6000_smit clear_topology_db_all



Has anyone ever had a problem running nv6000_smit clear_topology_db_all?
I am trying to run this from within a script, and it hangs during the first
ovstop.  It looks like it is a problem with ovspmd.  If I kill -9 ovspmd,
then the clear continues; otherwise the ovstop just sits there.

NV 4.1
AIX 4.1.5

I know these are dinosaurs, but it's what I have in the field at the moment.

Any suggestions?

Thanks,

Jeff



---------------------------------------------------------------------
To unsubscribe, e-mail: nv-l-unsubscribe AT lists.tivoli DOT com
For additional commands, e-mail: nv-l-help AT lists.tivoli DOT com

*NOTE*
This is not an Official Tivoli Support forum. If you need immediate
assistance from Tivoli please call the IBM Tivoli Software Group
help line at 1-800-TIVOLI8(848-6548)




