[Veritas-bu] hanging or never starting jobs

2006-12-20 12:02:10
Subject: [Veritas-bu] hanging or never starting jobs
From: steve.fogarty at gmail.com (Steve Fogarty)
Date: Wed, 20 Dec 2006 13:02:10 -0400
Yeah the job is active in the Activity Monitor.  We are backing up to disk
(Disk Staging).  We have actually tracked the problem down, and sent it to
the Veritas developers.
 
bpdm is talking to bpbrm, trying to terminate the backup
(mpx_terminate_backup).
bpbrm is still in the middle of the backup, doing a select on all sockets
(waiting for input).
It doesn't appear to see the message from bpdm (problem 1), and the timeout
on the select is an invalid pointer, so it blocks, waiting forever
(problem 2).
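
To illustrate why the select timeout matters, here is a minimal sketch. It is
not NetBackup code: the names wait_for_message and demo are hypothetical, and
Python's select.select stands in for the C select call that bpbrm makes. With
a finite timeout the waiter gets control back even when no message arrives;
passing None as the timeout instead would block until a socket becomes
readable, which is roughly the "waiting forever" behaviour described above.

```python
import select
import socket


def wait_for_message(sock, timeout_seconds):
    """Return bytes read from sock, or None if nothing arrived in time.

    timeout_seconds=None would make select block indefinitely.
    """
    readable, _, _ = select.select([sock], [], [], timeout_seconds)
    if not readable:
        return None  # timed out; the caller can decide what to do next
    return sock.recv(4096)


def demo():
    # A connected pair of sockets stands in for the two processes
    # (hypothetical roles: "worker" waits, "manager" sends the message).
    worker, manager = socket.socketpair()
    try:
        first = wait_for_message(worker, 0.2)   # nothing has been sent yet
        manager.sendall(b"terminate")
        second = wait_for_message(worker, 0.2)  # message is now pending
        return first, second
    finally:
        worker.close()
        manager.close()
```

Calling demo() returns a pair: the result of the wait before anything was
sent, and the result of the wait after the message was sent.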
 
So the stream is basically done, but NetBackup will not let it finish.
 
Steve
 

  _____  

From: Roger Dombrowski [mailto:RDombrowski at dcvast.com] 
Sent: Wednesday, December 20, 2006 12:54 PM
To: Steve Fogarty; Bobby R Windle
Cc: veritas-bu at mailman.eng.auburn.edu; support at datalink.com
Subject: RE: [Veritas-bu] hanging or never starting jobs



In our case the problem got progressively worse over time. A month or two
ago just my laptop backups (that happen at lunchtime) were "hanging" up.
Over time it spread to our weekend fulls, and ultimately we couldn't go more
than a day without the problem. We could stop/start NetBackup and get things
to work for a little while. I spent a great deal of time chasing what I saw
to be "socket errors".

 

What is even more interesting (I think): our environment was pretty rock
solid until Solaris patches were applied a few months back. I never could
track down what might have been the culprit. I'm just relieved that this
"workaround" seems to work for us.

 

Glad it's working for you, Bobby.

 

Steve, looks like something else is going on in your environment. When the
problem happens for you, does the Activity Monitor show the job as active
while you just don't see any activity to the tape drive?

 

  _____  

From: Steve Fogarty [mailto:steve.fogarty at gmail.com] 
Sent: Wednesday, December 20, 2006 8:38 AM
To: 'Bobby R Windle'
Cc: Roger Dombrowski; veritas-bu at mailman.eng.auburn.edu;
support at datalink.com
Subject: RE: [Veritas-bu] hanging or never starting jobs

 

Were you having the problem daily?  My backups would go for as many as 5
days before the problem reappeared.

 

Steve

 

  _____  

From: Bobby R Windle [mailto:bwindle at wlgore.com] 
Sent: Wednesday, December 20, 2006 9:50 AM
To: Steve Fogarty
Cc: 'Roger Dombrowski'; veritas-bu at mailman.eng.auburn.edu;
support at datalink.com
Subject: RE: [Veritas-bu] hanging or never starting jobs


Yes... So far, after 34 hours, I have not seen one hung job. It looks like it
fixed my problem. On a side note: the tech notes show certain error
messages in the bpsched log and /var/adm/messages. My systems did not
report any such errors; however, all the symptoms were
definitely there.

Thank you everyone for your help.. 


Bobby Windle ( Data backup & Recovery )
W.L. Gore & associates, Inc.
bwindle at wlgore.com
cell : (302) 588-7374 (preferred)
office: (302) 292-4026 


From: "Steve Fogarty" <steve.fogarty at gmail.com>
Sent: 12/19/2006 11:40 AM
To: "'Roger Dombrowski'" <RDombrowski at dcvast.com>, "'Bobby R Windle'"
<bwindle at wlgore.com>, <veritas-bu at mailman.eng.auburn.edu>
Subject: RE: [Veritas-bu] hanging or never starting jobs

We had the same issue with our NB5.1MP5 after upgrading from Sol 9 to Sol
10.  I have turned tcp_fusion off and on, and I get the same results either
way.
  
Bobby...let me know if this works for you.  I have an open ticket with
Veritas for this problem, and I would be interested in telling them if the
tcp_fusion change does not work for you too.
  
Thanks 
  
Steve 

  _____  

From: veritas-bu-bounces at mailman.eng.auburn.edu
[mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Roger
Dombrowski
Sent: Tuesday, December 19, 2006 11:18 AM
To: Bobby R Windle; veritas-bu at mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] hanging or never starting jobs

Bobby, 
  
We fought a weird issue like this for a couple of months. Some of our
backups would run, and sometimes they would just hang in the Activity
Monitor. We found a doc on SunSolve that described making the following
changes.

  
------------------------------------------------------------------- 
1) 
  
# echo 'do_tcp_fusion/W 0' | mdb -kw 
  
The NetBackup processes will need to be restarted. 
  
----------------------------------------------------- 
2) To make the workaround persistent across system boots: 
  
Add the following line to the /etc/system file. 
  
set ip:do_tcp_fusion = 0 
  
One needs to reboot the system before the workaround will take effect. 
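
One way to confirm the persistent setting is actually present is to scan the
contents of /etc/system for that line. The sketch below is a hypothetical
helper (tcp_fusion_disabled is not part of Solaris or NetBackup). It assumes
that comment lines start with '*' or '#', and that if the variable is set
more than once the last "set" line is the one that counts; both are
assumptions of this sketch rather than anything stated in the SunSolve doc.

```python
def tcp_fusion_disabled(system_file_text):
    """Return True if the text contains an active 'set ip:do_tcp_fusion = 0'.

    Assumptions (not from the original thread): lines starting with '*'
    or '#' are comments, and the last matching 'set' line wins.
    """
    disabled = False
    for raw_line in system_file_text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "*#":
            continue  # skip blank lines and assumed comment lines
        parts = line.split(None, 1)
        if len(parts) < 2 or parts[0] != "set":
            continue  # not a 'set' directive
        name, sep, value = parts[1].partition("=")
        if not sep or name.strip() != "ip:do_tcp_fusion":
            continue  # some other tunable
        try:
            # base 0 accepts both decimal and 0x-prefixed values
            disabled = int(value.strip(), 0) == 0
        except ValueError:
            continue  # unparseable value; ignore this line
    return disabled
```

On the server itself this could be called as
tcp_fusion_disabled(open("/etc/system").read()). It only checks the file, not
the value in the running kernel.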
  
  
We were running a fresh install of Solaris 10 with NBU 5.1 MP5. If you have
a SunSolve account, search for do_tcp_fusion. Hope this helps. 
  
  

 

  _____  


From: veritas-bu-bounces at mailman.eng.auburn.edu
[mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Bobby R
Windle
Sent: Tuesday, December 19, 2006 8:18 AM
To: veritas-bu at mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] hanging or never starting jobs 
  

I just recently upgraded our media & master servers from Solaris 9 to
Solaris 10. After doing the upgrade and reloading all our backup binaries
etc., I'm having some problems with jobs running. It shows the jobs starting
and actually running, but I never see any data transferring. In some of the
activity details it shows the tape mounting and positioning, but it never
starts to write. In some of the others there is nothing in the job details.
I can ping the clients, but it seems no data transfers. 

This only happens to 10 - 20 clients. It is not always the same ones, and it
happens on different platforms. Not sure what to go after here. 

I'm running NetBackup 5.1 MP5 on Solaris 10 media & master servers. My
environment is around 400 servers, with approximately 35 Oracle DBs and
20 SQL DBs. Clients are everything from Linux, Solaris, and NetWare to, of
course, Windows. 

Any ideas would be appreciated.. 

Thanks 

Bobby Windle ( Data backup & Recovery )
W.L. Gore & associates, Inc.
bwindle at wlgore.com
cell : (302) 588-7374 (preferred)
office: (302) 292-4026 
