Hello,
At first glance you would think this is a timeout issue that adjusting the
keepalive values would fix. Since you just updated to a new DD OS, you may want
to check whether any errors are being reported on the Data Domain side. Also,
see whether the errors happen at a particular time. For example, are there
clients that kick off at 5pm, the error happens at 6pm, and all the client
backups that were running at that time cancel with the "connection dropped"
error? You didn't mention anything about the Data Domain devices going
offline, but if the NetWorker server had trouble talking to the Data Domain,
your devices should go offline. However, if you have "auto media management"
enabled on the DD devices, NetWorker will attempt to bring them back online.
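One rough way to check for a time pattern is to count the failures per hour of day. The sketch below assumes you have copied the client names and failure times out of the savegroup reports by hand; the sample entries and the function name are made up for illustration and are not NetWorker output.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (client, failure time) pairs copied by hand from savegroup
# completion reports. These are invented sample values, not real data.
failures = [
    ("client-a", "2012-06-18 18:02"),
    ("client-b", "2012-06-18 18:05"),
    ("client-c", "2012-06-18 18:07"),
    ("client-d", "2012-06-19 02:41"),
]


def failures_by_hour(entries):
    """Count "connection dropped" failures per hour of day (0-23)."""
    counts = Counter()
    for _client, stamp in entries:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        counts[hour] += 1
    return counts


by_hour = failures_by_hour(failures)
for hour, count in sorted(by_hour.items()):
    print(f"{hour:02d}:00  {count} failure(s)")
```

If most of the failures land in the same hour, that points at something that happens at that time (a group start, a cloning job, a DD cleaning cycle) rather than at individual clients.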
I would think that increasing the client parallelism would add to the problem
rather than help. Increasing the client parallelism means more streams going
to the DD box, which may slow your backups down. If any parallelism needs to be
adjusted, I would do it at the group level and not the client level. Is it
possible that you have your group parallelism set to 0 and some of the clients
are waiting on resources and timing out?
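To put rough numbers on that, here is a back-of-the-envelope sketch using the figures from your message (11 Boost devices, max sessions of 10 each). It assumes, as a simplification, that each client save stream takes one device session; the function name is just for illustration.

```python
# Figures from the original message: 11 Boost devices, max sessions 10 each.
BOOST_DEVICES = 11
MAX_SESSIONS_PER_DEVICE = 10


def max_busy_clients(client_parallelism):
    """Clients that can run at full parallelism before sessions run out.

    Assumes one device session per save stream (a simplification).
    """
    total_sessions = BOOST_DEVICES * MAX_SESSIONS_PER_DEVICE
    return total_sessions // client_parallelism


for parallelism in (4, 12):
    clients = max_busy_clients(parallelism)
    print(f"client parallelism {parallelism}: about {clients} clients at once")
```

Under that assumption, the 110 available sessions cover about 27 clients at a parallelism of 4 but only about 9 at a parallelism of 12, so more of the 336 clients would end up waiting for a session.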
-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of Stanley R. Horwitz
Sent: Tuesday, June 19, 2012 10:11 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] Server lost connection problem
About two weeks ago, I upgraded my NetWorker server from 7.6.1 to NetWorker
7.6.3.4.Build.879. This server backs up 336 clients (mostly Windows and Linux).
All of the clients back up to a Data Domain system using Boost and a few are
cloned nightly to LTO-5 tape. I have 11 Boost devices configured for direct use
on the server and each Boost device has its max sessions set to a value of 10.
No storage nodes are involved in this data zone.
After we upgraded our DD system to the latest OS, the backups of larger servers
improved in their throughput, but for the past few days, I have been noticing an
unusual number of backup failures for several groups of clients, both Linux and
Windows, including some that also have NetWorker 7.6.3 on them. The error in
the savegroup report is always the same: "connection dropped."
There do not appear to be any problems with network connectivity, and in most
cases these clients do not back up through a firewall. I
do not see any errors on the clients or the NetWorker server when I use
"netstat -i." Incremental backups of the same clients also work without issue.
I reviewed the NetWorker tuning guide on PowerLink, but I haven't done any of
the tests they recommended yet with uasm, although it did contain a
recommendation to increase client parallelism to 12, which I changed a few
minutes ago. Most of the clients had the default setting of 4 for their
parallelism.
If anyone has any ideas on how to investigate this problem, please let me know.
I am skeptical that doing tests with uasm will bring forth any enlightenment on
this issue.