Re: [Networker] Problem with Legato 6.1.3

2003-07-03 08:23:21
Subject: Re: [Networker] Problem with Legato 6.1.3
From: John Gowing <johng AT SOURCECONSULTING.CO DOT ZA>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 3 Jul 2003 14:33:02 +0200
----- Original Message -----
From: "Stan Horwitz" <stan AT temple DOT edu>
To: "John Gowing" <johng AT SOURCECONSULTING.CO DOT ZA>
Cc: <NETWORKER AT LISTMAIL.TEMPLE DOT EDU>
Sent: Thursday, July 03, 2003 12:51 PM
Subject: Re: [Networker] Problem with Legato 6.1.3


>
> On Thu, 3 Jul 2003, John Gowing wrote:
>
> >Good Day Claudio.
> >
> >I don't have much to add by way of solution at this point, but can say
> >that I am having the same/similar problem. I have an IBM 3494 Silo with
> >FC attach 3590 drives, also running on 6.1.3 on AIX 5.1. We experience
> >the problem both with backups and with a cloning operation. The cloning
> >happens completely on the NW Server, and thus eliminates the network as a
> >potential cause. When the job slows or hangs we see very high utilisation
> >on the associated NSRMMD task. Stopping the Job, Killing the NSRMMD
> >concerned, allowing the server to restart the NSRMMD after the
> >appropriate timeouts etc, and then restarting the job clears the problem
> >and the job proceeds. However, on one occasion this procedure hung
> >NetWorker and we had to restart the whole NetWorker server. I have logged
> >a case with Legato on this, will keep you informed.
>
> Just out of curiosity, are you seeing any nsrmmd errors in your daemon.log
> too? I keep getting the error "nsrd: media info:  restart of nsrmmd #24 on
> bootz cancelled". I am having a problem where the NSR server grinds to a
> near halt nightly. Restarting all the nsr daemons results in much faster
> performance, but only for a few hours. Rebooting the server does not help.
> Legato's tech support has been feeding me suggestion after suggestion on
> how to tune the Solaris 9 kernel and adjust the nsr configuration, but
> nothing seems to help and the issue remains unresolved. I have also been
> running nsrmmd in debug mode for the past week or so and supplying NSR's
> tech support with the debug data daily.
>
> I am using NSR 6.1.3 too, on Solaris 9 as the server and ndmp with
> SnapImage on our NetWorker server. I wonder if there is a problem with
> SnapImage and/or ndmp. The server is a Sun E450 with a single CPU. This
> is getting very frustrating because the system ran fine prior to our
> migration from Tru64 Unix to Solaris 9 a month ago and simultaneously
> installing SnapImage to back up a pair of new Mirapoint message stores.
> The only way I have been able to avoid this problem is to run the ndmp
> backups for our two Mirapoint servers during the day and not run them at
> night when the bulk of our backups are done.
>
> Legato's tech support people are still working on this issue, but it is
> very frustrating having to wait while this problem is being worked out.
> I am starting to wonder if it would be prudent to downgrade to an earlier
> NSR version on our Legato server, or perhaps upgrade to NetWorker 7.


From John Gowing:

We get the following nsrmmd errors in our logs:

06/21/03 19:51:44 nsrmmd #1: 13680 cannot accept any more connections  - Software caused connection abort
06/21/03 20:13:53 nsrmmd #2: 13520 cannot accept any more connections  - Software caused connection abort

However, we have not been able to find any more information on these errors
in the documentation or the knowledge base.
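
For anyone who wants to track how often these messages turn up, here is a
rough Python sketch that counts them per nsrmmd number. The line pattern is
only inferred from the two sample lines quoted in this message; it is not
taken from any Legato documentation. The names count_nsrmmd_errors, LINE_RE
and sample are made up for illustration. The sketch reads from a list of
lines, so it makes no assumption about where daemon.log lives.

```python
import re
from collections import Counter

# Assumed layout, inferred from the two sample lines in this message:
#   MM/DD/YY HH:MM:SS nsrmmd #<n>: <message>
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"nsrmmd #(?P<num>\d+): (?P<msg>.*)$"
)


def count_nsrmmd_errors(lines, needle="cannot accept any more connections"):
    """Return a Counter mapping nsrmmd number -> count of lines containing needle."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        # Lines that do not fit the assumed layout are silently skipped.
        if match and needle in match.group("msg"):
            counts[int(match.group("num"))] += 1
    return counts


# The two log lines quoted in this message, used as sample input.
sample = [
    "06/21/03 19:51:44 nsrmmd #1: 13680 cannot accept any more connections  - "
    "Software caused connection abort",
    "06/21/03 20:13:53 nsrmmd #2: 13520 cannot accept any more connections  - "
    "Software caused connection abort",
]

if __name__ == "__main__":
    for num, total in sorted(count_nsrmmd_errors(sample).items()):
        print(f"nsrmmd #{num}: {total}")
```

To run it against a real log, pass an open file object instead of the sample
list. If the log wraps the message onto a second line, the match still works,
because the searched text sits on the first line.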
