Steve,
We have seen this issue extensively on Solaris 9/NBU 5.1 MPx. The resolution
for us was a complex one. It involved making sure the /etc/system settings
specify enough shared memory (we have ours at half of the physical RAM) and
large enough message queues (we have ours set to "set
msgsys:msginfo_msgmnb=524288", the highest Symantec recommends), and
monitoring the 'ipcs' output every five minutes. We found that we were
effectively killing the scheduler at certain times during the week due to the
sheer number of jobs set to start at any given time. We have also encountered
behavior where the scheduler completely missed (error code 196) a vast number
of jobs, and this pointed us to a bp.conf setting on our master:
CLIENT_CONNECT_TIMEOUT was set to 3600, which, in combination with enough
clients defined in backup policies that do not exist, causes the scheduler to
hang/choke/freeze in such a manner that the number of 196s is shocking. Once
we put this back down to a setting of "300", all was right with our world.
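
For anyone who wants to sanity-check the two settings mentioned above, here is
a minimal Python sketch, not a supported tool. It takes the contents of
/etc/system and bp.conf as text and flags a message queue size below 524288 or
a CLIENT_CONNECT_TIMEOUT above 300. Those thresholds are simply the values
that worked at our site, and the function names (parse_etc_system,
parse_bp_conf, check) are made up for illustration. It assumes /etc/system
lines of the form "set module:variable=value" with "*" comments, and bp.conf
lines of the form "KEY = value".

```python
import re


def parse_etc_system(text):
    """Return {'module:variable': int} from 'set' lines; '*' starts a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue
        match = re.match(r"set\s+([\w:]+)\s*=\s*(\w+)$", line)
        if not match:
            continue
        try:
            # Base 0 accepts plain decimal and 0x-prefixed hex values.
            settings[match.group(1)] = int(match.group(2), 0)
        except ValueError:
            pass  # Skip values this sketch does not understand.
    return settings


def parse_bp_conf(text):
    """Return {KEY: [values]}; keys such as SERVER may appear more than once."""
    entries = {}
    for line in text.splitlines():
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("=")
        entries.setdefault(key.strip(), []).append(value.strip())
    return entries


def check(etc_system_text, bp_conf_text, min_msgmnb=524288, max_timeout=300):
    """Return a list of warning strings (empty if nothing looks off).

    The default thresholds are one site's values, not general recommendations.
    """
    warnings = []
    msgmnb = parse_etc_system(etc_system_text).get("msgsys:msginfo_msgmnb")
    if msgmnb is None:
        warnings.append("msgsys:msginfo_msgmnb is not set in /etc/system")
    elif msgmnb < min_msgmnb:
        warnings.append(f"msgsys:msginfo_msgmnb={msgmnb} is below {min_msgmnb}")
    for raw in parse_bp_conf(bp_conf_text).get("CLIENT_CONNECT_TIMEOUT", []):
        if raw.isdigit() and int(raw) > max_timeout:
            warnings.append(f"CLIENT_CONNECT_TIMEOUT={raw} is above {max_timeout}")
    return warnings


# Example with inline sample text instead of the real files.
sample_system = "* IPC tuning\nset msgsys:msginfo_msgmnb=65536\n"
sample_bp_conf = "SERVER = master1\nCLIENT_CONNECT_TIMEOUT = 3600\n"
for warning in check(sample_system, sample_bp_conf):
    print(warning)
```

On a real master you would pass in the text of /etc/system and
/usr/openv/netbackup/bp.conf instead of the sample strings. Note that this
only checks the configured values; it does not replace watching 'ipcs' on a
live system.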
A fix for a memory leak? I wish it were that simple for us. We keep getting
the "upgrade to 6.5" answer to fix our problem, and after the exhaustive
experience with the scheduler in 5.1, I am convinced it is better in 6.5. I am
just not sure what else might be worse....
-Doug
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Matthew Agle
Sent: Sunday, December 30, 2007 12:06 PM
To: Hudson, Steve
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] NBU 5.1 MP5 all jobs hanging
I've seen that behavior with Solaris 10/NBU 5.1 MP5. We ended up finding a
memory leak in the bpsched module, sent a memory dump to Veritas, and they
provided an update to us. I suggest that the next time it happens you get a
memory dump and send it in.
Matthew
On Dec 19, 2007 11:35 AM, Hudson, Steve <Steve.Hudson AT ironmountain DOT com>
wrote:
At least 4 times in the last week we have seen all jobs hang, and it looks
like bpsched goes away. We must then use the kill -9 command on the Solaris 8
host to kill everything, as the bp.kill_all and netbackup stop commands are
ineffective. Has anyone else seen this behavior in 5.1 MP5?
Steven R. Hudson
Sysadmin - Enterprise Storage
Iron Mountain
745 Atlantic Avenue
Boston MA 02111
Phone: (617) 535-2849
steve.hudson AT ironmountain DOT com
_______________________________________________
Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
--
Matthew C. Agle, MCSA
rascal1981 AT gmail DOT com
Define Trouble: One hundred users standing up in their cubes asking "what
happened" after you change a setting....