Veritas-bu

[Veritas-bu] Issues Upgrading 4.5 FP7 to NBU 5.1 for Large Environments

2005-01-29 21:53:47
Subject: [Veritas-bu] Issues Upgrading 4.5 FP7 to NBU 5.1 for Large Environments
From: briandiven AT northwesternmutual DOT com (briandiven AT northwesternmutual DOT com)
Date: Sat, 29 Jan 2005 20:53:47 -0600
Hey Len, thanks for the response.

Please see http://support.veritas.com/docs/274544

We have 1 HP-UX Master/Media Server and 5 other Media Servers running NBU 5.1 
MP2 and IBM 3494 Tape Libraries cross campus using In Line Tape Copy.  We also 
use NetApp Filers for D2D disk backups and use Vault to dupe these backups to 
tape.  We use IBM 3590 and 3592 tape drives along with NDMP backups to the 
NetApps - a total of around 40 tape drives.  We send primary backups cross 
campus for immediate vaulting and the secondary tape gets vaulted greater than 
90 miles away.  This gives us the local data, a cross campus tape backup, and a 
regionally vaulted tape.

Under 4.5 FP7 we would have anywhere from 1,000-3,000 jobs either queued or 
active and staggered throughout the weekend, and we could go to sleep on 
Friday, wake up on Sunday, and do a few reruns of failed backups.  In-Line Tape 
Copy creates 3 jobs: a parent and the 2 tape jobs going to each campus.  We 
have been testing our max jobs under 5.1, and the limit appears to be around 
400 total queued and active jobs.  At that point either everything hangs to the 
point of needing a reboot, or there is a 1-5 hour gap between when a job 
finishes writing and when it actually posts as complete and releases its 
resources.  Daily backups get back-logged when NBU finishes a backup but 
doesn't post the job as complete and hangs onto its resources.
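
To put rough numbers on that ceiling, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (3 jobs per In-Line Tape Copy backup, roughly 400 queued plus active jobs) and assumes, purely for illustration, that every backup uses In-Line Tape Copy. The function names are hypothetical.

```python
# Back-of-the-envelope estimate; the two constants are the figures quoted in
# the message, and the assumption that EVERY backup uses In-Line Tape Copy is
# a simplification made only for this illustration.
JOBS_PER_ILTC_BACKUP = 3      # 1 parent job + 2 tape-copy jobs
OBSERVED_JOB_CEILING = 400    # approximate queued + active limit seen under 5.1


def backups_within_ceiling(ceiling: int, jobs_per_backup: int) -> int:
    """Return how many whole backups fit under the job ceiling."""
    return ceiling // jobs_per_backup


def jobs_for_backups(backups: int, jobs_per_backup: int) -> int:
    """Return the total number of jobs a given number of backups creates."""
    return backups * jobs_per_backup


if __name__ == "__main__":
    fit = backups_within_ceiling(OBSERVED_JOB_CEILING, JOBS_PER_ILTC_BACKUP)
    print(fit)                                          # 133
    print(jobs_for_backups(fit, JOBS_PER_ILTC_BACKUP))  # 399
```

Under those assumptions, a ceiling of about 400 jobs corresponds to only about 133 concurrent In-Line Tape Copy backups, queued and active combined.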

This past week we have had to bounce NBU because 1,000 jobs are queued and 400 
are active; of those 400 the majority are actually done, but no new jobs can 
start.  Our backup window for night time backups closes at 6 AM, and there are 
daytime backups that are then supposed to start.  The only way to get them 
started is to crash all 6 NBU instances, let the 1,000 jobs fail with a status 
50, and wait about 20-60 minutes for BPSCHED to get its head on straight and 
get going again.

It is a vicious cycle, because once the daytime backups are going, we try to 
resubmit 1,000 of these failed backups and can't get this done each day.  The 
schedules then have a 12 hour delay, so if I resubmit them late in the day, 
they won't run again for another 12 hours even though the window is open, and 
it is eternal damnation.

Compound that with the fact that we have 
/opt/openv/netbackup/bin/admincmd/bpconfig -tries 2 set, so as soon as NBU is 
recycled, it resubmits thousands of jobs and buries BPSCHED again, requiring 
another recycle.

We do get to a frustration level where we set bpconfig -tries 0 and then 
manually submit jobs all night long and all weekend long.  Thus, go back to my 
link where Veritas suggests baby-sitting backups and not submitting too many at 
a time as an Enterprise Level solution.
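
The manual "not too many at a time" approach amounts to throttled submission. A minimal Python sketch of that idea follows. It is not a NetBackup interface: `submit` and `count_outstanding` are hypothetical callables that a site would have to supply itself, and the default ceiling of 400 and 3 jobs per backup are just the figures from this message.

```python
import time
from collections import deque
from typing import Callable, Iterable, List


def submit_throttled(
    policies: Iterable[str],
    submit: Callable[[str], None],          # hypothetical: starts one backup
    count_outstanding: Callable[[], int],   # hypothetical: queued + active jobs
    max_outstanding: int = 400,             # ceiling observed in this thread
    jobs_per_backup: int = 3,               # parent + 2 In-Line Tape Copy jobs
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Submit backups in batches, never exceeding max_outstanding jobs."""
    pending = deque(policies)
    submitted: List[str] = []
    while pending:
        # How many whole backups fit in the remaining head-room right now?
        room = (max_outstanding - count_outstanding()) // jobs_per_backup
        if room <= 0:
            sleep(poll_seconds)  # wait for running jobs to release resources
            continue
        for _ in range(min(room, len(pending))):
            policy = pending.popleft()
            submit(policy)
            submitted.append(policy)
    return submitted
```

The sketch assumes `count_outstanding` reflects newly submitted jobs by the next poll. If the job count lags behind, the head-room calculation would need to track recent submissions as well.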

I hope this answers your questions, and I hope my frustration doesn't deter 
anyone from asking me more questions or providing suggestions.  I really do 
need your input and ideas.  I would much prefer your critical and scrutinizing 
questions vs. having to tell my wife why the phone rings all night long.

Thanks to all !!!

Brian

-----Original Message-----
From: Len Boyle [mailto:Len.Boyle AT sas DOT com]
Sent: Saturday, January 29, 2005 7:46 PM
To: DIVEN, BRIAN; veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Issues Upgrading 4.5 FP7 to NBU 5.1 for Large
Environments


Hello Brian 
 
Two questions: what is the ballpark range for submitting many backups? I do not 
believe we have seen your problem with 5.1, but then maybe we do not meet the 
magic number. Or it may depend on the servers used to support the backup 
server....
 
Also I searched on support.veritas.com and I could not find anything using the 
search pattern of 274544. Is there a typo, or did Veritas remove the technote?
 
len

________________________________

From: veritas-bu-admin AT mailman.eng.auburn DOT edu on behalf of briandiven AT 
northwesternmutual DOT com
Sent: Sat 1/29/2005 7:20 PM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] Issues Upgrading 4.5 FP7 to NBU 5.1 for Large Environments



TechNote 274544 provides ideas to reduce the burden on the NBU 5.1 software in 
large environments.  We are a large 24x7 environment, and since our upgrade 9 
weeks ago we have been bouncing NBU almost daily due to hung backups.  We have 
seen limited improvement after 9 weeks of an open case with Veritas, and then 
this TechNote appeared.  I find it ironic that 5.1 can be re-branded as 
Enterprise Server while a technote says not to stress BPSCHED in Enterprise 
Server environments, so I'd like to see if I'm alone here.

We have had several issues upgrading to NBU 5.1 MP1 and now MP2 where we are 
unable to submit many backups (queued or active) at once.  The recent TechNote 
274544 fits our account perfectly and I am wondering if any other large NBU 
shops are experiencing similar issues.  I have a hard time believing this 
TechNote was generated just because of us.  Veritas shows no desire to address 
this other than to wait until release 6.x and I could use some friends that 
will either state that they have an issue or help me push a fix through.

Veritas backline also stated that they won't support us backing off of 5.1 MP1 
to 4.5 FP7, where we had a stable environment.  They test MP2 to MP1 
uninstalls, but they don't test going back from release 5 to release 4, and 
there are some inherent undocumented catalog changes that could mess us up and 
leave us unable to restore a 5.1 backup under 4.5.  They only want to fix and 
go forward.  We had many MP2 binaries prior to their release and then moved to 
MP2, and we still can't get through a night if we submit all of our backups.

We have exercised every recommendation in this technote and remain unsuccessful.

I need some of my friends to contact me with similar issues to get this fixed 
if we are to fix and go forward.  We need to push Veritas on this issue as a 
group of large Enterprise Server companies.

Brian

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu



