Re: [Networker] glibc on RHEL4 x86-64 and nsrexecd core dump

2008-02-11 19:00:59
Subject: Re: [Networker] glibc on RHEL4 x86-64 and nsrexecd core dump
From: Preston de Guise <enterprise.backup AT GMAIL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 12 Feb 2008 10:56:47 +1100
On 12/02/2008, at 10:28 AM, Preston de Guise wrote:

Hi Patti,

I've looked through the archives and found a couple of references,
mostly to the double free message below.  Suggestions have included
disabling nsrauth for a client and modifying environment variable
MALLOC_CHECK_ to prevent nsrexecd from being killed immediately (both
from 2 years ago).  I'm having an issue where I am receiving one of
these messages at different nsrexecd core dumps.

*** glibc detected *** double free or corruption (fasttop): 0x0000002a987ffc30 ***
*** glibc detected *** corrupted double-linked list: 0x00000037e6c316b8 ***

I am running v7.3.3 Networker - 64-bit on RHEL4 ES Update 6 x86_64. If I restart Networker and the group in question everything is fine for a
while which can be a day, a few days, a week, ... and then it happens
again. I have a separate smaller system running the 32-bit OS and
Networker and it does not have this issue.

Are you running staging/disk backup units?

I had a customer with this problem, and it turned out their NetWorker server was occasionally writing backups to the read-only "portion" of the adv_file devices. When nsrclone/nsrstage/recover/etc. went to read those savesets, nsrmmd would crash and respawn, which would cause the error you're citing above. If MALLOC_CHECK_ was set to just warn rather than crash, nsrexecd would eventually consume too much shared memory and NetWorker would need to be restarted on the server or, worst case, the server rebooted.


I forgot to add - it's relatively easy to check for this; just do a directory listing of any disk backup units, and if you have any long-ssid-named files appearing under the _AF_readonly subdirectory, then you've got the bug^H^H^Hfeature that requires the fix I outlined in my previous email.
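
If you have a lot of disk backup units, the listing can be scripted. The following is only a rough sketch in Python, not a NetWorker tool: the function name find_readonly_files and the device paths passed on the command line are placeholders of mine, and it assumes _AF_readonly is a real directory (not a symlink) somewhere under each device path. It prints every regular file it finds beneath _AF_readonly and leaves it to you to judge whether the names look like long ssids.

```python
import os
import sys


def find_readonly_files(device_root):
    """Return sorted paths of files found beneath any _AF_readonly directory.

    device_root is the top-level directory of one disk backup unit
    (a hypothetical path supplied by the caller).
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(device_root):
        # Keep only directories whose path passes through _AF_readonly.
        rel_parts = os.path.relpath(dirpath, device_root).split(os.sep)
        if "_AF_readonly" in rel_parts:
            for name in filenames:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Usage: python check_af_readonly.py <device path> [<device path> ...]
    for root in sys.argv[1:]:
        for path in find_readonly_files(root):
            print(path)
```

No output means nothing was found under _AF_readonly for the paths you supplied.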

Cheers,

Preston.

--
Preston de Guise


"Enterprise Systems Backup and Recovery: A Corporate Insurance Policy", due out August 1 2008:

http://www.crcpress.com/shopping_cart/products/product_detail.asp?sku=AU6396&isbn=9781420076394&parent_id=&pc=

