Subject: Re: [Networker] NSR 75SP3 : Stable for prod ?
From: "STANLEY R. HORWITZ" <stan AT TEMPLE DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 30 Aug 2010 09:09:02 -0400
Frank,

Considering the many NetWorker server configurations out in the field, 
expecting EMC's software testing to uncover every possible issue on every 
platform, with every OS version and every hardware driver, is unrealistic. I 
suspect that what EMC does is test the most common configurations in its 
software certification efforts, which leaves room for errors to surface on 
less common platforms. 

I too am one of the people having trouble with 7.5SP3 (on 32-bit Red Hat 
Linux AS) running on a Dell 2950 with a Qualstar tape library that has four 
LTO-3 tape drives in it. I am running build 533 and the new savegrp binary, 
but I still find one core dump file per day in /nsr/cores/savegrp, and my 
daily cron job that runs "savegrp -O" continues to generate output like this …

4690:savegrp: puss-index-backup waiting for 113 job(s) to complete
4690:savegrp: puss-index-backup waiting for 81 job(s) to complete
4690:savegrp: puss-index-backup waiting for 74 job(s) to complete
4690:savegrp: puss-index-backup waiting for 1 job(s) to complete
*** glibc detected *** corrupted double-linked list: 0x0984ff60 ***
/bin/sh: line 1: 13098 Aborted                 /usr/sbin/savegrp -O -l full -G 
puss-index-backup
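
If you want to check whether your own server is hitting the same thing, 
something along these lines should show it (the core file names will vary 
from dump to dump; /nsr/cores/savegrp is simply where my server puts them):

# list the dumps newest-first, so you can see when each one appeared
ls -lt /nsr/cores/savegrp
# "file" reports which binary produced each dump
file /nsr/cores/savegrp/core.*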

I tried the suggestion of running a trace on savegrp, but I had no clue what 
I was looking at in the output. I am wondering if anyone else with 7.5SP3 is 
getting savegrp core dumps. In case you don't know, NetWorker stashes core 
dump files in /nsr/cores. If your NetWorker server is generating savegrp core 
files and you haven't opened a case with EMC about it, please consider doing 
so. The more core dump files EMC has, the better its chances of uncovering 
the cause of the problem and fixing it. What is also very confusing to me is 
that there seems to be no pattern to what time of day these core files are 
generated on my server.
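
If you do open a case, a stack trace from one of the cores is probably worth 
attaching as well. I am no debugger expert, but something like this should 
pull one out with gdb (assuming gdb is installed; the core file name below is 
just an example -- substitute one of your own):

gdb /usr/sbin/savegrp /nsr/cores/savegrp/core.13098
(gdb) bt      # print the stack trace, then paste it into the case notes
(gdb) quit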


On Aug 30, 2010, at 7:05 AM, Francis Swasey wrote:

> My goodness! All that time in development, testing by QA, and I assume a 
> beta program as well... and still it takes three fixes to get it to operate 
> as it goes out the door? That is depressing news.
> 
> Perhaps EMC needs to look to the customers who are reporting these problems 
> and offer them incentives to take part in a beta program, since it appears 
> EMC's own QA group is not quite up to the task.
> 
> Frank
> 
> On 8/30/10 5:38 AM, Jóhannes Karl Karlsson wrote:
>> We had some problems with 7.5.3 to begin with, when we installed build 514: 
>> groups of Oracle clients were not finishing properly (hanging).
>> 
>> EMC then released build 531 and, a few days later, build 533. We installed 
>> NetWorker 7.5.3.1 build 533 and our problems got even worse.
>> 
>> EMC then released a patched version of savegrp.exe build 533. After 
>> installing that patched savegrp.exe binary we have not had any problems.
>> 
>> NetWorker 7.5.3.1 build 533 with patched savegrp.exe binary seems to be 
>> stable and good.
>> 
>> Johannes
>> 
>> 
>> 
>> -----Original Message-----
>> From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] 
>> On Behalf Of Len Philpot
>> Sent: 17 August 2010 15:26
>> To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
>> Subject: Re: [Networker] NSR 75SP3 : Stable for prod ?
>> 
>>> STANLEY R. HORWITZ 
>>> 
>>> What ulimit settings are you using and how many clients are you backing up?
>>> 
>>> Here's what I have …
>>> 
>>> [root@puss nsr_scripts]# ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 1024
>>> max locked memory       (kbytes, -l) 32
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 1024
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> stack size              (kbytes, -s) 10240
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 143360
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>> 
>> Yours looks like Solaris 10, but this is on 9 (SPARC):
>> 
>> # ulimit -a
>> core file size (blocks)     unlimited
>> data seg size (kbytes)      unlimited
>> file size (blocks)          unlimited
>> open files                  unlimited
>> pipe size (512 bytes)       10
>> stack size (kbytes)         8192
>> cpu time (seconds)          unlimited
>> max user processes          29995
>> virtual memory (kbytes)     unlimited
>> 
>> The two groups that were abending had 41 and 25 clients each (not huge), 
>> and we have a little over 100 clients total. However, the old ulimit 
>> settings (which I don't recall) were from the original Solaris 8 
>> installation back in 2003 (NetWorker 6.1), so they weren't exactly big.
>> 
> 
> -- 
> Frank Swasey                    | http://www.uvm.edu/~fcs
> Sr Systems Administrator        | Always remember: You are UNIQUE,
> University of Vermont           |    just like everyone else.
>  "I am not young enough to know everything." - Oscar Wilde (1854-1900)
> 

To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the body of the email. Please write to 
networker-request AT listserv.temple DOT edu if you have any problems with this 
list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER