[Veritas-bu] hanging or never starting jobs

2006-12-20 11:53:51
Subject: [Veritas-bu] hanging or never starting jobs
From: RDombrowski at dcvast.com (Roger Dombrowski)
Date: Wed, 20 Dec 2006 10:53:51 -0600
In our case the problem got progressively worse over time. A month or
two ago just my laptop backups (which happen at lunchtime) were
"hanging". Then our weekend fulls were affected, and ultimately we
couldn't go more than a day without the problem. We could stop/start
NetBackup and get things to work for a little while. I spent a great
deal of time chasing what I saw to be "socket errors".

 

What is even more interesting (I think) is that our environment was
pretty rock solid until Solaris patches were applied a few months back.
I never could track down what might have been the culprit. I'm just
relieved that this workaround seems to work for us.

 

Glad it's working for you, Bobby.

 

Steve, it looks like something else is going on in your environment.
When the problem happens for you, does the activity monitor show the job
as active while you see no activity on the tape drive?

 

________________________________

From: Steve Fogarty [mailto:steve.fogarty at gmail.com] 
Sent: Wednesday, December 20, 2006 8:38 AM
To: 'Bobby R Windle'
Cc: Roger Dombrowski; veritas-bu at mailman.eng.auburn.edu;
support at datalink.com
Subject: RE: [Veritas-bu] hanging or never starting jobs

 

Were you having the problem daily? My backups would go for as many as 5
days before the problem reappeared.

 

Steve

 

________________________________

From: Bobby R Windle [mailto:bwindle at wlgore.com] 
Sent: Wednesday, December 20, 2006 9:50 AM
To: Steve Fogarty
Cc: 'Roger Dombrowski'; veritas-bu at mailman.eng.auburn.edu;
support at datalink.com
Subject: RE: [Veritas-bu] hanging or never starting jobs


Yes. So far, after 34 hours, I have not seen one hung job. It looks like
it fixed my problem. On a side note: the tech notes show certain
error-type messages in the bpsched log and /var/adm/messages. My systems
did not report any such errors; however, all the symptoms were
definitely there.

Thank you, everyone, for your help.


Bobby Windle ( Data backup & Recovery )
W.L. Gore & associates, Inc.
bwindle at wlgore.com
cell : (302) 588-7374 (preferred)
office: (302) 292-4026 



From: "Steve Fogarty" <steve.fogarty at gmail.com>
Sent: 12/19/2006 11:40 AM
To: "'Roger Dombrowski'" <RDombrowski at dcvast.com>, "'Bobby R Windle'"
<bwindle at wlgore.com>, <veritas-bu at mailman.eng.auburn.edu>
Subject: RE: [Veritas-bu] hanging or never starting jobs

We had the same issue with our NB 5.1 MP5 after upgrading from Sol 9 to
Sol 10. I have tried tcp_fusion both off and on, and I get the same
results either way.

Bobby...let me know if this works for you. I have an open ticket with
Veritas for this problem, and I would be interested in telling them if
the tcp_fusion change does not work for you too.
  
Thanks 
  
Steve 

________________________________

From: veritas-bu-bounces at mailman.eng.auburn.edu
[mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Roger
Dombrowski
Sent: Tuesday, December 19, 2006 11:18 AM
To: Bobby R Windle; veritas-bu at mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] hanging or never starting jobs

Bobby, 
  
We fought a weird issue like this for a couple of months. Some of our
backups would run and sometimes they would just hang in the activity
monitor. We found a doc on Sunsolve that described making the
following changes...
  
------------------------------------------------------------------- 
1) 
  
# echo 'do_tcp_fusion/W 0' | mdb -kw 
  
The NetBackup processes will need to be restarted. 
  
----------------------------------------------------- 
2) To make the workaround persistent across system reboots:

Add the following line to the /etc/system file.
  
set ip:do_tcp_fusion = 0 
  
One needs to reboot the system before the workaround will take effect. 
  
  
We were running a fresh install of Solaris 10 with NetBackup 5.1 MP5.
If you have a Sunsolve account, search for do_tcp_fusion. Hope this
helps...
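
For anyone scripting this, here is a minimal shell sketch of the two steps above. The helper names fusion_value and persist_fusion_off are made up for illustration and are not from the Sunsolve doc. The parsing assumes mdb prints the variable on a final line of the form "do_tcp_fusion:  1", so check that against your own host before relying on it.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Solaris or NetBackup.

# Read mdb output on stdin and print the last field of the last
# non-empty line (assumed to look like "do_tcp_fusion:  1").
fusion_value() {
    awk 'NF { v = $NF } END { if (v != "") print v }'
}

# Append the /etc/system setting unless an uncommented entry exists.
# $1 = file to edit (pass /etc/system on a real host, after a backup).
persist_fusion_off() {
    file=$1
    # Lines starting with '*' or '#' are treated as comments and ignored.
    if grep 'ip:do_tcp_fusion' "$file" 2>/dev/null | grep -v '^[*#]' >/dev/null; then
        echo "already set in $file"
    else
        echo 'set ip:do_tcp_fusion = 0' >> "$file"
        echo "added to $file"
    fi
}
```

On a live Solaris 10 host, something like `echo 'do_tcp_fusion/D' | mdb -k | fusion_value` should print the current setting, and `persist_fusion_off /etc/system` (as root) adds the line only if it is not already there. A reboot is still needed for the /etc/system entry to take effect.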
  
  

 

________________________________


From: veritas-bu-bounces at mailman.eng.auburn.edu
[mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Bobby R
Windle
Sent: Tuesday, December 19, 2006 8:18 AM
To: veritas-bu at mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] hanging or never starting jobs 
  

I recently upgraded our media & master servers from Solaris 9 to
Solaris 10. After doing the upgrade and reloading all our backup
binaries, etc., I'm having some problems with jobs running. The jobs
show as starting and actually running, but I never see any data
transferring. In some of the activity details it shows the tape
mounting and positioning, but it never starts to write. In some of the
others there is nothing in the job details. I can ping the clients, but
it seems no data transfers.

This only happens to 10 - 20 clients. It is not always the same ones,
and it happens on different platforms. I'm not sure what to go after
here.

I'm running NetBackup 5.1 MP5 on Solaris 10 media & master servers. My
environment is around 400 servers, with approximately 35 Oracle DBs and
20 SQL DBs. Clients are everything from Linux, Solaris, and NetWare to,
of course, Windows.

Any ideas would be appreciated.

Thanks 

Bobby Windle ( Data backup & Recovery )
W.L. Gore & associates, Inc.
bwindle at wlgore.com
cell : (302) 588-7374 (preferred)
office: (302) 292-4026 
