Subject: Re: [Veritas-bu] Status Code 150's
From: "Staub, Doug" <rstaub AT amgen DOT com>
To: rascal <rascal1981 AT gmail DOT com>, "mikemclain AT northwesternmutual DOT com" <mikemclain AT northwesternmutual DOT com>
Date: Thu, 3 Apr 2008 10:23:27 -0700

Ah, this was my hell. I lived with this for several months, with weekly (and sometimes more frequent) restarts, but we have not yet upgraded, as we found ways to optimize our backups around the scheduler. FWIW, we are on Solaris 9 64-bit (8 GB RAM) at 5.1 MP5; we were also instructed to upgrade to 5.1 MP6, and the issue did not resolve itself. This is an inherent design flaw of the scheduler in 5.1, which is fixed in 6.0; however, there are things you can do to reduce or eliminate the frequency of the restarts:

 

- Increase the shared memory (in your case, you are at the maximum HP allows)

- Combine as many streams as possible (do you back up a lot of CIFS/NFS streams that could be consolidated into NDMP?). This helped us greatly, as it decreased the number of streams/jobs being started, so there were fewer for the scheduler to keep track of.

- Make sure all buffer settings are at the maximum supported by Symantec (it took three or four support calls before I dragged this information out of them). Since we are on Solaris 9 64-bit with 8 GB of RAM, here are the settings Symantec advised us to use for the number of expected concurrent job starts (on Solaris, monitor the ipcs -a output; I am not sure of the HP-UX equivalent); a short note on applying and checking these follows the listing:

400 JOBS:
set msgsys:msginfo_msgmnb=131072
set shmsys:shminfo_shmmax=33554432
set msgsys:msginfo_msgmni=512
set msgsys:msginfo_msgtql=1000
600 JOBS:
set msgsys:msginfo_msgmnb=262144
set shmsys:shminfo_shmmax=67108864
set msgsys:msginfo_msgmni=768
set msgsys:msginfo_msgtql=1500
800 JOBS:
set msgsys:msginfo_msgmnb=262144
set shmsys:shminfo_shmmax=67108864
set msgsys:msginfo_msgmni=1024
set msgsys:msginfo_msgtql=2000
set semsys:seminfo_semmni=2056
set semsys:seminfo_semmns=2056
set semsys:seminfo_semmnu=2056
set semsys:seminfo_semmsl=600
1600 JOBS (CURRENT SETTINGS):
* Message queues
set msgsys:msginfo_msgmax=8192
set msgsys:msginfo_msgmnb=524288 
set msgsys:msginfo_msgmni=2048 
set msgsys:msginfo_msgtql=2000 
* Semaphores
set semsys:seminfo_semmni=4096 
set semsys:seminfo_semmns=4096 
set semsys:seminfo_semmnu=4096 
set semsys:seminfo_semmsl=600 
set semsys:seminfo_semopm=64 
set semsys:seminfo_semume=128 
* Shared memory
set shmsys:shminfo_shmmax=4294967296
set shmsys:shminfo_shmmni=230
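
For anyone setting these for the first time, a minimal sketch of where they go and how we keep an eye on them on Solaris (the file location is the standard /etc/system; treat the monitoring below as a rough illustration rather than a procedure):

# The "set ..." tunables above go in /etc/system on the master (lines
# starting with * are comments in that file) and only take effect after
# a reboot.
#
# After the reboot, watch IPC usage while the scheduler is starting jobs:
ipcs -a
# For the message queues, the CBYTES/QNUM columns show how full each queue
# is relative to the msgmnb/msgtql limits; if they keep climbing toward
# those limits during the backup window, the limits (or the number of
# concurrent streams) still need attention.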

 

Our issue was compounded by a CLIENT_CONNECT_TIMEOUT value of 3600 in the bp.conf file. When you have clients defined in a backup policy that are suddenly down, retired, or should have been removed from backup policies, this setting greatly impacts the scheduler's ability to process jobs; it's hell when you have a few dozen wreaking havoc.
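
The value is in seconds, so 3600 lets a single dead client hold the scheduler for up to an hour. As a rough sketch (the 300 below is only an illustrative value, not something Symantec gave us), the entry lives in the master's bp.conf:

# /usr/openv/netbackup/bp.conf on the master server
# CLIENT_CONNECT_TIMEOUT is in seconds; a smaller value, for example:
CLIENT_CONNECT_TIMEOUT = 300
# ...limits how long one unreachable client can stall job processing, but
# the real fix is removing retired clients from the policies.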

 

Anyway, with the above adjustments, we have to restart maybe once per month, but do so anyway for other reasons.  We are still hoping to upgrade to 6.5 soon.

 

Regards,

Doug

 


From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of rascal
Sent: Thursday, April 03, 2008 9:13 AM
To: mikemclain AT northwesternmutual DOT com
Cc: randy.k.zimmer AT monsanto DOT com; veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Status Code 150's

 

We had a similar issue with NBU 5.1 MP5. We rolled to 6.0 in an attempt to fix the problem, and it only made it worse. We ended up rolling back to 5.1 and setting up a call with Symantec. At the end of the day, it was a memory leak, which they issued a fix for. I would suggest trending the memory, recording the results, and getting a call opened with Symantec. As an FYI, we have not experienced this issue since we got the fix from Symantec.
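
If it helps, something as simple as a cron job that dumps the scheduler's memory footprint to a file is enough evidence for the support call. A rough sketch (hypothetical log path; the exact ps keywords vary by platform, Solaris accepts these and HP-UX may need UNIX95 set):

#!/bin/sh
# Append a timestamped snapshot of bpsched memory usage (VSZ/RSS in KB)
# so growth over days is easy to show Symantec.
LOG=/var/tmp/bpsched_mem.log
date '+%Y-%m-%d %H:%M:%S' >> $LOG
ps -eo pid,vsz,rss,args | grep '[b]psched' >> $LOG
# Run from cron every 15-30 minutes and watch the RSS column trend upward.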

On 4/3/08, mikemclain AT northwesternmutual DOT com <mikemclain AT northwesternmutual DOT com> wrote:

Randy,

 

When running on NBU 5.1 MP6, we would experience this error about every 10 days. We had 16 GB of memory on the master, but on HP-UX 11.11 you are limited to 1.75 GB of shared memory for 32-bit applications. This technote describes the issue/memory leak (http://seer.entsupport.symantec.com/docs/294251.htm), but our only recourse was to recycle NBU once a week.
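
For what it's worth, the recycle itself was just the standard UNIX stop/start script run during a quiet window (a sketch of the idea; check that no jobs are active before stopping, and paths may differ slightly by release):

# Stop all NetBackup daemons on the master, then start them again.
/usr/openv/netbackup/bin/goodies/netbackup stop
# (use bpps -a to confirm everything is really down before restarting)
/usr/openv/netbackup/bin/goodies/netbackup start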

 

We upgraded to NBU 6.0 last fall, and this issue no longer occurs due to the elimination of bpsched.

 

 

Mike

 

 


From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of ZIMMER, RANDY K [AG/1000]
Sent: Thursday, April 03, 2008 10:10 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] Status Code 150's

 

All,

I have a master server, an RP2470 with 1024 MB of memory, and we process about 1500 backups per day through it. In the past two weeks I have experienced Code 150's, but the backups were not cancelled by an administrator but by the system. Here is the error we receive when it occurs:

3688660: 05:05:29.648 [10196] <16> start_backup_job: fork error: Not enough space (12)

3688661: 05:05:29.648 [10196] <16> run_any_ret_level: failure starting backup job, PID=-1

When this happens nothing else will schedule until we either restart all the NetBackup processes or reboot the server, and I have done both. We logged a call on this, and there is no fix for it as of yet, but there is one planned in 5.1 MP7, which is due out sometime this month. The recommendation was to increase the memory on the server (I realize 1 GB is extremely low), and we should be receiving it shortly. I have load balanced the schedule as much as I can. Has anyone else experienced this issue, and if so, do you have any information that would be helpful? The first two times this happened I rebooted the server; for the subsequent outages all I did was recycle the application. I'm looking for any and all opinions on this topic.
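
For reference, the "Not enough space (12)" above is errno 12 (ENOMEM): the scheduler could not fork because the system had no memory/swap left for the new process. A rough idea of what to look at on the master when it starts happening (illustrative HP-UX commands; exact flags vary by release):

swapinfo -tm            # memory/swap headroom
ipcs -a                 # IPC usage (message queues, semaphores, shared memory)
ps -ef | grep bpsched   # which scheduler processes are running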

Thanks,

Randy K. Zimmer
Sr. Unix System Administrator
Office: 314-694-3109
Cell:  314-960-0500
rkzimm AT monsanto DOT com


 






--
Matthew MCP, MCSA, MCTS, OCA
rascal1981 AT gmail DOT com

Define Trouble:  
Why did you apply THAT patch??....

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu