If it happens multiple times during the night and communication between
NetWorker and the DD is not going down, that would suggest a client timeout
issue. But if this started after upgrading DDOS, I would think NetWorker has a
problem talking with the new DDOS. How's the health of the NetWorker server?
Backing up that many clients and handling indexes puts a bit of a load on it
if it's not a beefy box.
I also noticed something in my last post that I need to clear up. When I said
"Increasing the client parallelism will cause you to have more streams going to
the dd box, which may slow your backup down", that was not entirely accurate the
way I worded it. I didn't mean that having more streams going to the dd box will
slow your backup down, but that having more streams coming out of the client
will slow your backup down. My fingers can't type what my mind is telling them.. :)
-----Original Message-----
From: Stanley R. Horwitz [mailto:stan AT temple DOT edu]
Sent: Tuesday, June 19, 2012 12:58 PM
To: EMC NetWorker discussion; Chester Martin
Subject: Re: [Networker] Server lost connection problem
Hi Chester,
This seems to occur at different times of the day and night. I agree that
increasing client parallelism doesn't make much sense, but perhaps it is one of
those counterintuitive situations. The savegroup parallelism is set to 10 for
each savegroup. Auto media management is not enabled on my DD Boost devices,
but nothing in the logs on the NetWorker server suggests a problem in that
regard. I am going to ask my SAN manager to look at the DD system to try to
ascertain if it is in good health, but the daily health report emails I get
from it do not indicate any sort of a problem.
On Jun 19, 2012, at 1:39 PM, Chester Martin wrote:
> Hello,
> At first glance you would think it'd be a timeout issue that adjusting the
> keep alive values would fix. Being that you just updated to a new DDOS you
> may want to see if there are any errors being reported on the data domain
> side. Also, see if there is a certain time when these errors happen.
> Meaning, are there clients that kick off at 5pm and the error happens at 6pm
> and all the client backups that were running at that time cancel with the
> "connection dropped" error? You didn't mention anything about the data
> domain devices going offline, but if there was an issue with the networker
> server not talking to data domain your devices should go offline. But if you
> have "auto media management" enabled on the dd devices networker will attempt
> to bring them back online.
>
> I would think that increasing the client parallelism would add to the problem
> instead of help it. Increasing the client parallelism will cause you to have
> more streams going to the dd box, which may slow your backup down. If any
> parallelism needs to be adjusted I would do it from the group level and not
> the client level. Is it possible you have your group parallelism set to 0
> and some of the clients could be waiting on resources and timing out?