Hey Len, thanks for the response.
Please see http://support.veritas.com/docs/274544
We have 1 HP-UX Master/Media Server and 5 other Media Servers running NBU 5.1
MP2 and IBM 3494 Tape Libraries cross campus using In Line Tape Copy. We also
use NetApp Filers for D2D disk backups and use Vault to dupe these backups to
tape. We use IBM 3590 and 3592 tape drives along with NDMP backups to the
NetApps - a total of around 40 tape drives. We send primary backups cross
campus for immediate vaulting and the secondary tape gets vaulted greater than
90 miles away. This gives us the local data, a cross campus tape backup, and a
regionally vaulted tape.
Under 4.5 FP7 we would have anywhere from 1,000-3,000 jobs either queued or
active and staggered throughout the weekend and could go to sleep on Friday
and wake up on Sunday and do a few reruns of failed backups. In-Line Tape Copy
creates 3 jobs, a parent and the 2 tape jobs, one going to each campus. We have
been testing our max jobs under 5.1 and the limit appears to be around 400 total
queued and active jobs. Beyond that, either everything hangs to the point of a
reboot, or there is a 1-5 hour gap between when a job finishes writing and when
it actually posts as complete and releases its resources. When NBU finishes the
backup but doesn't post the job as complete and hangs onto resources, that is
when daily backups get back-logged.
This past week we have had to bounce NBU because 1,000 jobs are queued and 400
are active; of those 400 the majority are actually done, but no new jobs can start.
Our backup window for night time backups closes at 6 AM. There are daytime
backups that are then supposed to start. The only way to do this is to crash
all 6 NBU instances, let the 1,000 jobs fail with a status 50, and wait about
20-60 minutes for BPSCHED to get its head on straight and get going again.
It is a vicious cycle, because once the daytime backups are going, we try to
resubmit 1,000 of these failed backups and can't get this done each day. The
schedules then have a 12 hour delay, so if I resubmit them late in the day,
they won't run again for another 12 hours even though the window is open, and
it is eternal damnation.
Compound that with the fact that we have
/opt/openv/netbackup/bin/admincmd/bpconfig -tries 2 set, so as soon as NBU is
recycled, it resubmits thousands of jobs and buries BPSCHED again, requiring
another recycle.
We do get to a frustration level where we set bpconfig -tries 0 and then
manually submit jobs all night long and all weekend long. Thus, go back to my
link where Veritas suggests baby-sitting backups and not submitting too many at
a time as an Enterprise Level solution.
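To make the "not submitting too many at a time" approach concrete, here is a minimal Python sketch of one throttling pass. It assumes the ceiling of roughly 400 queued plus active jobs and the 3 jobs per In-Line Tape Copy backup described above. The function name `submit_under_cap` and the `active_count` and `submit` callbacks are hypothetical placeholders, not NetBackup commands; how jobs are counted and how a backup is started is left to your own tooling.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def submit_under_cap(
    pending: Iterable[str],
    active_count: Callable[[], int],
    submit: Callable[[str], None],
    max_in_flight: int = 400,   # observed ceiling from the tests above (assumption)
    jobs_per_backup: int = 3,   # parent + 2 tape jobs for In-Line Tape Copy
) -> List[str]:
    """Run one pass: start backups while there is headroom, return the rest.

    `active_count` and `submit` are hypothetical callbacks supplied by the
    caller; this sketch does not call any NetBackup command itself.
    """
    queue: Deque[str] = deque(pending)
    # Headroom is how many more jobs can exist before hitting the ceiling.
    headroom = max_in_flight - active_count()
    while queue and headroom >= jobs_per_backup:
        submit(queue.popleft())
        headroom -= jobs_per_backup
    # Whatever is left should be retried on a later pass.
    return list(queue)
```

With 385 jobs already in flight there is room for 15 more, so a pass like this would start 5 In-Line Tape Copy backups and hand back the rest for the next pass, for example from a scheduler that runs every few minutes.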
I hope this answers your questions, and I hope my frustration doesn't deter anyone
from asking me more questions or providing suggestions. I really do need your
input and ideas. I would much prefer your critical and scrutinizing questions
vs. having to tell my wife why the phone rings all night long.
Thanks to all !!!
Brian
-----Original Message-----
From: Len Boyle [mailto:Len.Boyle AT sas DOT com]
Sent: Saturday, January 29, 2005 7:46 PM
To: DIVEN, BRIAN; veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Issues Upgrading 4.5 FP7 to NBU 5.1 for Large
Environments
Hello Brian
Two questions. What is the ballpark range for submitting many backups? I do not
believe we have seen your problem with 5.1, but then maybe we do not meet the
magic number. Or it may depend on the servers used to support the backup
server....
Also I searched on support.veritas.com and I could not find anything using the
search pattern of 274544. Is there a typo, or did Veritas remove the technote?
len
________________________________
From: veritas-bu-admin AT mailman.eng.auburn DOT edu on behalf of briandiven AT
northwesternmutual DOT com
Sent: Sat 1/29/2005 7:20 PM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] Issues Upgrading 4.5 FP7 to NBU 5.1 for Large Environments
TechNote 274544 provides ideas to reduce the burden on the NBU 5.1 software in
large environments. We are a large 24x7 environment, and since our upgrade 9
weeks ago we have been bouncing NBU almost daily due to hung backups. We have
seen limited improvement after 9 weeks of an open case with Veritas, and then
we saw this TechNote. I find it ironic that 5.1 can be re-branded as
Enterprise Server while a technote says not to stress BPSCHED in Enterprise
Server environments, so I'd like to see if I'm alone here.
We have had several issues upgrading to NBU 5.1 MP1 and now MP2 where we are
unable to submit many backups (queued or active) at once. The recent TechNote
274544 fits our account perfectly and I am wondering if any other large NBU
shops are experiencing similar issues. I have a hard time believing this
TechNote was generated just because of us. Veritas shows no desire to address
this other than to wait until release 6.x and I could use some friends that
will either state that they have an issue or help me push a fix through.
Veritas backline also stated that they won't support us backing off of 5.1 MP1
to 4.5 FP7 where we had a stable environment. They test MP2 to MP1 uninstalls,
but when you upgrade from release 4 to release 5, they don't test this and
there are some inherent undocumented catalog changes that could mess us up and
leave us unable to recover a 5.1 backup with a 4.5 restore. They only want to fix
and go forward. We had many MP2 binaries prior to them being released and then
moved to MP2 and we still can't get through a night if we submit all of our
backups.
We have exercised every recommendation in this technote and remain unsuccessful.
I need some of my friends to contact me with similar issues to get this fixed
if we are to fix and go forward. We need to push Veritas on this issue as a
group of large Enterprise Server companies.
Brian
_______________________________________________
Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu