Networker

Re: [Networker] 7.[3|4] savegroup hangs [was: Re: 7.4.1 on solaris with zfs

2007-12-21 11:50:03
From: Ron Benton <ron.benton AT EMERSONPROCESS DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 21 Dec 2007 11:29:21 -0500
Sorry, I assumed everyone would have access to Powerlink, so I didn't 
include the settings. I'm rarely able to find what I'm looking for on 
Powerlink, but I thought that with the solution ID it would be easy to look up.

As tkimbal mentioned, the suggested settings should look like this:
# ndd /dev/tcp tcp_conn_req_max_q
1024
# ndd /dev/tcp tcp_conn_req_max_q0
4096
# ndd /dev/tcp tcp_time_wait_interval
60000
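For anyone who wants to apply these values rather than just check them, a minimal sketch follows. It is not taken from the EMC solution: it assumes a root shell on a Solaris NetWorker server, the helper name `apply_tcp_param` and the `DRY_RUN` switch are hypothetical, and values set with `ndd -set` do not survive a reboot, so they would need to be re-applied from a boot-time script.

```shell
#!/bin/sh
# Minimal sketch: apply the TCP tunables listed above with Solaris ndd.
# Assumptions: run as root on Solaris; apply_tcp_param and DRY_RUN are
# hypothetical names. "ndd -set" changes are lost at reboot.
# DRY_RUN defaults to 1, which only prints the commands; use DRY_RUN=0 to apply.

apply_tcp_param() {
  # $1 = tcp parameter name, $2 = desired value
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ndd -set /dev/tcp $1 $2"
  else
    ndd -set /dev/tcp "$1" "$2"
    # Read the value back to confirm the change took effect.
    echo "$1 = $(ndd /dev/tcp "$1")"
  fi
}

apply_tcp_param tcp_conn_req_max_q 1024
apply_tcp_param tcp_conn_req_max_q0 4096
apply_tcp_param tcp_time_wait_interval 60000   # milliseconds (60 seconds)
```

Running it as-is only prints the three `ndd -set` commands, so you can review them before re-running with `DRY_RUN=0`.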

Referring to other questions in this thread, index backups run at the end 
of most of our groups. When a group is hung, it always shows 100% complete 
in the NMC GUI. I usually stop the group at that point; it then typically 
waits for the 30-minute timeout period to expire before it stops. If you 
click Stop twice, the group stops immediately. The Group Details window 
usually shows that all clients succeeded, but when I restart the group, it 
usually runs several index backups and frequently a few clients and save 
sets as well; these usually succeed within a few minutes.

Early on in our troubleshooting, EMC had us make several parallelism 
changes, to both the server parallelism and the server's client 
parallelism. None of them seemed to help. I also reduced the size of the 
groups that were hanging, which I expected would reduce the likelihood of 
hangs, but it didn't seem to make much difference. We are running the 
Network edition of NetWorker, so we are apparently limited by our license 
to 30 concurrent sessions. I'm not clear on that, though; it may be 30 
concurrent clients or something similar. In any case, I reduced each group 
to fewer than 30 clients. The failures still occurred randomly in eight 
main groups, both Windows and Sun.

I also see a different, less frequent hang on an HP Oracle group: that one 
group hangs at 100% but, unlike all the others, has no corresponding 
savegrp process still running. With no process, there is nothing to stop 
or kill, and the only way to clear it is to restart the NetWorker 
services. However, it doesn't actually need to be stopped. Because there is 
no savegrp process, NetWorker doesn't consider the group to still be 
running (as it does for the other hung groups, which always have a savegrp 
process), so the group simply runs again the next time it is scheduled. It 
does make you wonder whether it really backed everything up.

Hope this helps some of you. We are still hang-free after one week.
