Subject: Re: [Networker] Is NetWorker really compatible with RHEL 6.2?
From: Tim Mooney <Tim.Mooney AT NDSU DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 21 Feb 2012 13:53:06 -0600
In regard to: Re: [Networker] Is NetWorker really compatible with RHEL...:

On 2/21/12 1:44 PM, Tim Mooney wrote:
Both of our NetWorker servers are running RHEL 6.2.  Both are 7.6.2.5,
which has been pretty stable for us.  We're currently running the same
kernel you are.

I haven't seen the issue you're seeing.  I don't have abrt installed on
either server, though.  However, there have been no cores from nsrd saved
in /nsr/cores.  The only cores I've seen from any of the daemons are from
nsrlcpd.

Is this a 64 bit system and are those 64 bit binaries?
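
In case it helps, both are easy to confirm from a shell; the output shown
below is just what a 64-bit build would look like, not copied from either
of our servers:

---8<---
  $ uname -m
  x86_64
  $ file /usr/sbin/nsrd
  /usr/sbin/nsrd: ELF 64-bit LSB executable, x86-64 ...
---8<---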

I am getting one to three core files out of nsrd, and frequently they are
accompanied by messages like the following:

*** glibc detected *** /usr/sbin/nsrd: munmap_chunk(): invalid pointer: 0x0000000001b5d1d0 ***
*** glibc detected *** /usr/sbin/nsrd: free(): invalid pointer: 0x0000000001b5d1d0 ***

in /nsr/logs/daemon.raw

There isn't a single instance of that in daemon.raw on either of our
NetWorker servers.
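
For anyone who wants to check their own servers, something like this
should do it; nsr_render_log ships with NetWorker, and since glibc writes
that message straight to stderr a plain grep on the .raw file should also
catch it:

---8<---
  $ nsr_render_log /nsr/logs/daemon.raw | grep 'glibc detected'
  $ grep 'glibc detected' /nsr/logs/daemon.raw
---8<---

It may also be worth confirming that abrt isn't intercepting the nsrd
cores before they ever reach /nsr/cores.  If the kernel core pattern is a
pipe to abrt-hook-ccpp, the dumps are being routed to abrt's spool
directory (typically /var/spool/abrt on RHEL 6) instead:

---8<---
  $ cat /proc/sys/kernel/core_pattern
---8<---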

We did a very minimal install of RHEL 6.1 (now updated to 6.2).  Other
than package differences between our servers and yours, the first thing I
can think of to look at would be the tuning settings that EMC requires in
their "Technical Guidance" document.

Our puppet rules for the sysctl settings currently contain:

---8<---

  # These first two settings, for TCP backlog, come from both the
  # NetWorker Installation Guide (for 7.6.2) and the "Technical
  # Guidance for Upgrades to EMC NetWorker 7.4 Service Pack 2",
  # page 14.
  sysctl::set{'net.ipv4.tcp_max_syn_backlog' : value => '8192' }
  sysctl::set{'net.core.netdev_max_backlog'  : value => '8192' }

  #
  # The rest of these tuning suggestions come from the "Technical
  # Guidance" document.
  #
  #
  sysctl::set{'net.core.rmem_max'   : value => '16777216' }
  sysctl::set{'net.core.wmem_max'   : value => '16777216' }

  #
  # The 7.6 Technical Guidance doc has tcp_wmem lowered back to what
  # was suggested in the 7.4 guide.
  #
  sysctl::set{'net.ipv4.tcp_rmem'   : value => '4096 87380 16777216' }
  sysctl::set{'net.ipv4.tcp_wmem'   : value => '4096 65536 16777216' }

  #
  # These are from the "Technical Guidance" doc
  #
  sysctl::set{'net.ipv4.tcp_keepalive_intvl'  :   value => '30' }

  # The 7.6 Technical Guidance doc now recommends 10 probes, rather than
  # 8.
  sysctl::set{'net.ipv4.tcp_keepalive_probes' :   value => '10' }

  # The 7.6 Technical Guidance doc lowered keepalive time from 7200 to
  # 3420
  sysctl::set{'net.ipv4.tcp_keepalive_time' :   value => '3420' }

  # timeout after improper close, as recommended by the "Technical
  # Guidance" doc
  sysctl::set{'net.ipv4.tcp_fin_timeout'    :   value => '60' }

  #
  # FIXME: the Technical Guidance doc recommends DISabling TCP offloading
  # on the NIC(s), which I'm skeptical about.  Do some research and
  # see how advisable this truly is.
  #

---8<---
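
On that last FIXME: I haven't actually made that change anywhere, so take
this purely as a sketch.  The knobs involved look roughly like this, with
eth0 standing in for whatever interface the server really uses:

---8<---
  # show the current offload settings for the interface
  ethtool -k eth0

  # roughly what the doc asks for: TSO is the obvious one, with GSO
  # and GRO being the related generic settings
  ethtool -K eth0 tso off
  ethtool -K eth0 gso off
  ethtool -K eth0 gro off
---8<---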

Beyond those settings, there's also a tweak to max open files that's
required by 7.6.
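
That one is the file descriptor limit (ulimit -n).  A quick way to see
what the running daemon actually ended up with:

---8<---
  # what open-files limit did nsrd actually inherit?
  grep 'open files' /proc/$(pidof nsrd)/limits
---8<---

One place to raise it is /etc/security/limits.conf, though the values
below are placeholders rather than the numbers from the EMC doc, and a
daemon started from an init script may need the limit raised in the
startup script instead, since limits.conf is only applied through PAM at
login:

---8<---
  # /etc/security/limits.conf (placeholder values; check the install
  # guide for the numbers EMC actually wants)
  root    soft    nofile    8192
  root    hard    nofile    8192
---8<---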

One would think that not having these set (or having them set too low)
would produce more obvious error messages about the problem, but with a
"mature" codebase like NetWorker, it's certainly possible that it would
surface as memory-handling issues instead.

There are lots of other potential things to look at to try to narrow down
why you would be seeing this while we haven't, but I would start there.

I wish I could convince EMC to actually look at the plethora of core
files I have provided to them on my Sev 1 issue, but it's now been 2.5
days since I opened the issue and EMC is still arguing with me about
whether the dumps are worth looking at or not -- I'm not even sure that
I'm going to have a backup system left when this is done.

I can relate.  I've had a couple very good support experiences with EMC
in the past year (kudos to Wallace Lee!), but I also had one support
experience that was the worst vendor support experience I've had in my
entire career.  I'm always willing to jump through the hoops and the "rule
outs" that their support script forces one to go through, I just wish that
they were willing to follow suit and try the things I suggest.

Please let the list know what the issue turns out to be.  It will very
likely be useful to others.

Good luck!

Tim
--
Tim Mooney                                             Tim.Mooney AT ndsu DOT edu
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, IACC Building                             701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164

To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER