Veritas-bu

[Veritas-bu] Resolution to Jobs Hanging or taking a long time to disengage since upgrade to 5.x

2005-03-18 19:32:33
Subject: [Veritas-bu] Resolution to Jobs Hanging or taking a long time to disengage since upgrade to 5.x
From: briandiven AT northwesternmutual DOT com (briandiven AT northwesternmutual DOT com)
Date: Fri, 18 Mar 2005 18:32:33 -0600
I was reminded about the following email and want to thank people for their 
help and explain the resolution.  The quoted text below is a reminder of the 
issue; the fun stuff is after that.

> > Date: Sat, 29 Jan 2005 18:20:09 -0600
> > 
> > TechNote 274544 provides ideas to reduce the burden on the NBU 5.1 software 
> > in large environments.  Since our upgrade 9 weeks ago, we have been 
> > bouncing NBU almost daily due to hung backups and we are a large 24x7 
> > environment and have seen limited improvements after 9 weeks of an open 
> > case with Veritas - and then seeing this TechNote.  I find it ironic that 
> > the re-branding of 5.1 to Enterprise Server and a technote that says not to 
> > stress BPSCHED in Enterprise Server environments can occur, so I'd like to 
> > see if I'm alone here.
> > 
> > We have had several issues upgrading to NBU 5.1 MP1 and now MP2 where we 
> > are unable to submit many backups (queued or active) at once.  The recent 
> > TechNote 274544 fits our account perfectly and I am wondering if any other 
> > large NBU shops are experiencing similar issues.  I have a hard time 
> > believing this TechNote was generated just because of us.  Veritas shows no 
> > desire to address this other than to wait until release 6.x and I could use 
> > some friends that will either state that they have an issue or help me push 
> > a fix through.

OK - New Date: Today

We have a site-specific BPSCHED binary that will be made available in 5.1 MP3 
coming to a theatre near you soon.  This should resolve many problems in the 
accounts that I have been in contact with.  Although BPSCHED hasn't changed 
much from 4.5 to 5.1, it changed enough.  Here is my understanding of the 
changes that were related to our issues.

NBU 4.5 did Version Checking for the purposes of In-Line Tape Copy (ITC).  
Because 4.5 was backwards compatible, they checked to see if there was a 3.4 
client which didn't support ITC.  It also appears that 5.1 handled the 
directives for what was to be backed up differently.  So, when you used a 
directive of "All Local Drives", NBU worked out what this meant by 
interrogating each server sequentially before the job would even appear as a 
queued job in the Activity Monitor.  In other words, there were a lot of 
background security checks utilizing resources that you can't see.  If you 
have any clients with network connectivity issues, BPSCHED will wait for the 
timeout value before querying the next server.  Meanwhile, you may hit the 
next backup window and continue to backlog BPSCHED with resolution issues that 
are invisible to you.
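
To put rough numbers on the sequential timeout behavior described above, here 
is a minimal Python sketch.  It is not NetBackup code: the function name, the 
client counts, the 0.5 second probe time and the 300 second timeout are all 
assumptions picked for illustration, not values taken from BPSCHED.

```python
def sequential_resolution_delay(clients, probe_seconds, timeout_seconds):
    """Seconds spent when clients are queried one at a time.

    clients: list of (name, reachable) pairs (hypothetical data).
    A reachable client costs probe_seconds; an unreachable one costs the
    full timeout_seconds before the next client is tried.
    """
    total = 0.0
    for _name, reachable in clients:
        total += probe_seconds if reachable else timeout_seconds
    return total


# Hypothetical policy: 500 clients, every 50th one has connectivity problems.
clients = [(f"client{i:03d}", i % 50 != 0) for i in range(500)]

# Assumed timings, for illustration only.
delay = sequential_resolution_delay(clients, probe_seconds=0.5, timeout_seconds=300)
print(f"{delay / 60:.1f} minutes before the job would show up as queued")
```

With these made-up numbers, the 10 unreachable clients account for 3,000 of 
the 3,245 seconds, so a handful of bad clients dominates the wait even though 
the other 490 answer quickly.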

The fix was that NBU 5.1 doesn't need to do this version checking from 5.1 back 
to 4.5 for ITC.  Since they took this out of BPSCHED, we have been able to push 
the system with over 2,000 concurrent backup jobs and nothing is getting 
delayed.  Windows servers seem to be especially susceptible to this problem.

Because of this, we are going to go back to "All Local Drives" this weekend 
along with ITC.  I have run just over a week without issues with ITC turned 
back on.  I am anxious to see how the new directive of "All Local Drives" 
works.  Per my reference to the technote, we have also tried to "not stress" 
NBU and broke our policies into multiple policies, each with a convenient 
schedule.  If this new directive works, my next step will be to get back to 
life as normal: submit everything at once and let NBU determine when 
resources are available and run my backups.  It will really be good to wake up 
and see my backups are done.

So far, this has been a wonderful fix that 2 Veritas back-end Engineers have 
been involved in on daily calls for 15 weeks.  I must say that Veritas really 
gave us the resources to resolve this, but we will have a post-mortem as to 
what took so long to get their attention.  Coming together on this site and 
finding friends helped a lot ... this is a useful tool for communication and 
resolution.  So, I send my thanks to many people.  

We do have a special binary for BPSCHED and they want us to upgrade to MP3 
soon, but that does include other improvements.  If it ain't broke, don't fix 
it.  I am very happy where I am.  I am gun-shy to even apply MP3; 15 weeks was 
a long burden for many of us support people.

I wrote this email in hopes that it will help some people that are now 7 weeks 
without a life, my friends, and people considering upgrading.  My heart feels 
that 5.1 MP3 will be solid.

I hope that everyone finds that release that keeps them solid and working 
and with family.

Brian
