Re: [Networker] Server lost connection problem

2012-06-19 14:28:22
Subject: Re: [Networker] Server lost connection problem
From: Chester Martin <cmartin AT SPP DOT ORG>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 19 Jun 2012 13:22:26 -0500
If it happens multiple times during the night and the communication between 
NetWorker and the DD is not going down, that would point to a client timeout 
issue. But if this started after upgrading DDOS, I would think NetWorker has a 
problem talking to the new DDOS.  How's the health of the NetWorker server?  
Backing up that many clients and handling their indexes puts a fair load on it 
if it's not a beefy box. 
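
To check whether the drops cluster around a particular time, something like 
this rough Python sketch could help. The log layout here is made up for 
illustration (a "YYYY-MM-DD HH:MM:SS" timestamp followed by a client name and 
the message), and the sample lines are not real output, so it would need 
adjusting to whatever your actual log looks like:

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample lines. The timestamp layout, client names, and message
# text are assumptions for illustration, not the real NetWorker log format.
SAMPLE_LOG = """\
2012-06-18 18:02:11 client-a connection dropped
2012-06-18 18:05:40 client-b connection dropped
2012-06-18 23:14:03 client-c connection dropped
2012-06-19 02:47:55 client-a connection dropped
2012-06-19 18:01:09 client-d connection dropped
"""


def drops_by_hour(log_text, needle="connection dropped"):
    """Count lines containing `needle`, grouped by hour of day (0-23)."""
    counts = Counter()
    for line in log_text.splitlines():
        if needle not in line:
            continue
        # Assumes the first 19 characters are the timestamp.
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        counts[stamp.hour] += 1
    return counts


if __name__ == "__main__":
    for hour, n in sorted(drops_by_hour(SAMPLE_LOG).items()):
        print(f"{hour:02d}:00  {n}")
```

If most of the drops land in the same hour or two, I would look at what 
kicks off around then. If they are spread evenly, a timeout setting seems 
the more likely suspect.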

I also need to clear up something from my last post.  When I said 
"Increasing the client parallelism will cause you to have more streams going to 
the dd box, which may slow your backup down", that is not entirely true the way 
I worded it. I didn't mean that having more streams going to the DD box will 
slow your backup down, but that having more streams coming out of the client 
will.  My fingers can't type what my mind is telling them.. :)

-----Original Message-----
From: Stanley R. Horwitz [mailto:stan AT temple DOT edu] 
Sent: Tuesday, June 19, 2012 12:58 PM
To: EMC NetWorker discussion; Chester Martin
Subject: Re: [Networker] Server lost connection problem

Hi Chester,

This seems to occur at different times of the day and night. I agree that 
increasing client parallelism doesn't make much sense, but perhaps it is one of 
those counterintuitive situations. The savegroup parallelism is set to 10 for 
each savegroup. Auto media management is not enabled on my DD Boost devices, 
but nothing in the logs on the NetWorker server suggests a problem in that 
regard. I am going to ask my SAN manager to look at the DD system to try to 
ascertain if it is in good health, but the daily health report emails I get 
from it do not indicate any sort of a problem.

On Jun 19, 2012, at 1:39 PM, Chester Martin wrote:

> Hello,
> At first glance you would think it'd be a timeout issue that adjusting the 
> keep alive values would fix.  Being that you just updated to a new DDOS you 
> may want to see if there are any errors being reported on the data domain 
> side.  Also, see if there is a certain time when these errors happen.  
> Meaning, are there clients that kick off at 5pm, the error happens at 6pm, 
> and all the client backups that were running at that time cancel with the 
> "connection dropped" error?  You didn't mention anything about the data 
> domain devices going offline, but if there was an issue with the networker 
> server not talking to data domain your devices should go offline.  But if you 
> have "auto media management" enabled on the dd devices networker will attempt 
> to bring them back online.
> 
> I would think that increasing the client parallelism would add to the problem 
> instead of help it.  Increasing the client parallelism will cause you to have 
> more streams going to the dd box, which may slow your backup down.  If any 
> parallelism needs to be adjusted I would do it from the group level and not 
> the client level.  Is it possible you have your group parallelism set to 0 
> and some of the clients could be waiting on resources and timing out? 
