I have been experiencing frequent savegrp hangs and clone job failures
since upgrading from a stable 7.1.3 on Solaris 9 to 7.3.2 in July 2007.
After much escalation with our support vendor and EMC, we are now at
version 7.3.3 build 510 and have patched versions of the savegrp and
nsrjobd binaries installed.

Just before I installed these latest patches, I was searching Powerlink
for other information and came across solution ID esg91832, dated 11/26/07
and titled "Process hang after upgrading to 7.3.x or higher on Solaris 9 &
10". It describes three TCP settings, two of which had larger suggested
values than those set on our server at the time. Since I applied the
suggested settings and installed the patches on 12/14, we have not had any
hung groups or failed clones. I am cautiously optimistic that this will
either resolve or greatly reduce our issues. I don't know whether these
patches are included in the recently available 7.3.4 or 7.4.1 versions.
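
For anyone who wants to compare their own Solaris server against a set of suggested TCP values before changing anything, here is a rough sketch in Python. It is only an illustration: I have not listed the three settings from esg91832 in this post, so the tunable names and numbers below are placeholders (they are real Solaris TCP tunables, but not necessarily the ones in the solution) and should be replaced with what the solution actually says. It assumes Python 3.7 or later and that `ndd -get /dev/tcp <name>` is readable on your system, which may require root.

```python
import subprocess


def read_tcp_tunable(name):
    """Return the current integer value of a Solaris TCP tunable via ndd."""
    result = subprocess.run(
        ["ndd", "-get", "/dev/tcp", name],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def settings_to_raise(current, suggested):
    """Return {name: (current, suggested)} for tunables set below the suggestion."""
    return {
        name: (current[name], target)
        for name, target in suggested.items()
        if name in current and current[name] < target
    }


if __name__ == "__main__":
    # ASSUMPTION: placeholder names and values, not the contents of esg91832.
    suggested = {
        "tcp_conn_req_max_q": 1024,
        "tcp_conn_req_max_q0": 4096,
        "tcp_keepalive_interval": 7200000,
    }
    current = {name: read_tcp_tunable(name) for name in suggested}
    for name, (have, want) in settings_to_raise(current, suggested).items():
        # Only report; print the ndd command rather than running it.
        print(f"{name}: current {have}, suggested {want}")
        print(f"  ndd -set /dev/tcp {name} {want}")
```

The script only reports which values are lower than the suggestion and prints the matching `ndd -set` command for review; it does not change anything itself. Values set with ndd do not survive a reboot, so a change that helps would also need to go into a startup script.
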

Since we have a lot of EMC attention on our issues right now, I don't know
why they had not told me about the TCP settings. I don't think making the
suggested changes could possibly hurt, though. They have told me that
other customers have similar problems, but many report different details.
I am sure other platforms and configurations could show different
symptoms, but I suggest looking at the equivalent settings on those
platforms. I have always thought our issues were related to architectural
changes EMC made to the interprocess communication NetWorker uses after
7.1.x.