Veritas-bu

Re: [Veritas-bu] NBU 5.1 MP5 all jobs hanging

2008-01-03 17:58:49
Subject: Re: [Veritas-bu] NBU 5.1 MP5 all jobs hanging
From: "Staub, Doug" <rstaub AT amgen DOT com>
To: Matthew Agle <rascal1981 AT gmail DOT com>, "Hudson, Steve" <Steve.Hudson AT ironmountain DOT com>
Date: Thu, 3 Jan 2008 14:43:20 -0800
Steve,
 
We have seen this issue extensively on Solaris 9/NBU 5.1 MPx. The resolution for us was a complex one. We made sure the /etc/system settings specify enough shared memory (we have ours at half of the physical RAM) and large enough message queues (we have ours set to "set msgsys:msginfo_msgmnb=524288", the highest Symantec recommends), and we monitor the 'ipcs' output every five minutes.  We found that we were effectively killing the scheduler at certain times during the week due to the sheer number of jobs set to start at the same time.

We have also encountered behavior where the scheduler completely missed (error code 196) a vast number of jobs, and this pointed us to a bp.conf setting on our master. CLIENT_CONNECT_TIMEOUT was set to 3600, which, in combination with enough clients defined in backup policies that do not exist, causes the scheduler to hang/choke/freeze in such a manner that the number of 196s is shocking. Once we put this back down to a setting of "300", all was right with our world.
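
For anyone who wants to sanity-check their own master against the settings above, here is a rough Python sketch, not anything Symantec supplies. It assumes "shared memory" refers to the shmsys:shminfo_shmmax tunable (the post does not name the tunable), that bp.conf uses the usual "KEY = value" layout, and the function names and the 16 GB figure are purely hypothetical.

```python
import re

# Value quoted in the post as the highest Symantec recommends.
MSGMNB_MAX = 524288


def etc_system_lines(physical_ram_bytes):
    """Build /etc/system lines: shared memory at half of RAM, msgmnb as quoted."""
    shmmax = physical_ram_bytes // 2
    return [
        # Assumption: shminfo_shmmax is the shared memory tunable that was meant.
        f"set shmsys:shminfo_shmmax={shmmax}",
        f"set msgsys:msginfo_msgmnb={MSGMNB_MAX}",
    ]


def connect_timeout(bp_conf_text):
    """Return CLIENT_CONNECT_TIMEOUT from bp.conf text, or None if it is not set."""
    match = re.search(
        r"^\s*CLIENT_CONNECT_TIMEOUT\s*=\s*(\d+)\s*$",
        bp_conf_text,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical master server with 16 GB of physical RAM.
    for line in etc_system_lines(16 * 1024**3):
        print(line)

    # Hypothetical bp.conf contents; the server name is made up.
    sample = "SERVER = master1\nCLIENT_CONNECT_TIMEOUT = 3600\n"
    value = connect_timeout(sample)
    if value is not None and value > 300:
        print(f"CLIENT_CONNECT_TIMEOUT is {value}; 300 is what worked for us")
```

The generated lines are only suggestions to review before touching /etc/system by hand; changes there need a reboot to take effect.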
 
A fix for a memory leak?  I wish it was that simple for us. We keep getting the "upgrade to 6.5" answer to fix our problem, and after the exhaustive experience with the scheduler in 5.1, I am convinced it is better in 6.5, just not sure what else might be worse....
 
-Doug

From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Matthew Agle
Sent: Sunday, December 30, 2007 12:06 PM
To: Hudson, Steve
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] NBU 5.1 MP5 all jobs hanging

I've seen that behavior with Solaris 10/NBU 5.1 MP5.  We ended up finding a memory leak in the bpsched module, sent a memory dump to Veritas, and they provided an update to us.  I suggest that the next time it happens you get a memory dump and send it in.

Matthew


On Dec 19, 2007 11:35 AM, Hudson, Steve <Steve.Hudson AT ironmountain DOT com> wrote:

At least 4 times in the last week we have seen all jobs hang, and it looks like bpsched goes away. We must then use the kill -9 command on the Solaris 8 host to kill everything, as the bp.kill_all and netbackup stop commands are ineffective. Has anyone else seen this behavior in 5.1 MP5?

 

Steven R. Hudson
Sysadmin - Enterprise Storage
Iron Mountain
745 Atlantic Avenue
Boston MA 02111
Phone: (617) 535-2849
steve.hudson AT ironmountain DOT com

 




_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu




--
Matthew C. Agle, MCSA
rascal1981 AT gmail DOT com

Define Trouble:  One hundred users standing up in their cubes asking "what happened" after you change a setting....