Networker

Re: [Networker] Problem with Legato 6.1.3

2003-07-08 08:43:59
From: "David E. Nelson" <david.nelson AT NI DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Tue, 8 Jul 2003 07:43:52 -0500
Hi All,

I haven't been following this closely so if I'm off here, I apologize.

We saw behavior a couple of years ago where backups would start fast and then
slow to a crawl.  I never did see nsrd slow down, however.  The problem: a bug
in Cisco switch hardware that was undocumented at the time.  If you want, I can
dig up the details.  Also, Legato received a full analysis describing the
problem, the hardware involved, and how to identify it.

Regards,
        /\/elson

On Tue, 8 Jul 2003, Stan Horwitz wrote:

> On Tue, 8 Jul 2003, John Herlihy wrote:
>
> >Hi Stan,
> >
> >I'm having this problem in a similar environment to yours, and was
> >curious how changing these tcp settings on your Networker server has
> >helped.
>
> The kernel changes we made on Friday morning have helped, but only to a
> minor degree. The problem persists where nsrd will slow down to a crawl.
> nsrmmd processes fail to restart. I might be wrong, but this problem seems
> to happen only when we run both of our Mirapoint SnapImage NDMP backups
> concurrently.
>
> Requests for tape mounts and unmounts take hours to complete when this
> slowdown condition starts. The nsrwatch and nsradmin utilities also
> become very slow.  We have a conference call scheduled with people at Legato
> later today to discuss our problems. We also spent a good deal of
> yesterday working with our hardware service tech looking at our box.  Our
> hardware service tech (we contract hardware maintenance out) is returning
> this morning with a Solaris and Sun hardware expert to further check our
> system and swap out a SCSI card to see if that helps the situation. There
> is definitely a SCSI hardware issue on our server, but I do not see how
> it can cause the performance decline in NSR that I see quite often.
>
> Further, I have our two NDMP devices on the same SCSI bus: /dev/rmt/0cbn
> and /dev/rmt/1cbn (Solaris 9 with a single CPU Sun Enterprise 450), and we
> back up about 220GB of data (full backup) on each of our two NDMP devices
> nightly (or at least we try). Both NDMP devices are attached to our backup
> server, not the NDMP client, so we use SnapImage. SnapImage is also
> installed on the same server as the Legato server software. During periods
> when this slowdown occurs, the "top" utility shows low resource
> utilization.
>
> In fact, last night at about 8:30, I kicked off a backup of only ONE of
> our Mirapoint message stores. Just as I am typing this, I received
> notification that that backup completed successfully. No tape
> mount requests are pending. Tape mounts are snappy now too. The nsrwatch
> utility was slow to start, but not inordinately so when I started it up a
> few minutes ago. Throughput of the NDMP backup was a little faster than
> previous backups. This is the first time I tried backing up only one NDMP
> client at a time.
>
> I suspect that if I move the one NDMP device from /dev/rmt/1cbn to
> /dev/rmt/3cbn (which is on a separate SCSI bus) it will help alleviate this
> problem and allow us to back up both our Mirapoint message stores via NDMP
> and SnapImage simultaneously without causing this slowdown condition.
> Dumping nearly 450GB of data through one SCSI card all at once is probably
> not a sensible thing to do if an option to spread the NDMP data across
> multiple SCSI cards exists, as it does in my case.
>
> So that's where the situation stands.
>
> --
> Note: To sign off this list, send a "signoff networker" command via email
> to listserv AT listmail.temple DOT edu or visit the list's Web site at
> http://listmail.temple.edu/archives/networker.html where you can
> also view and post messages to the list.
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
>

--
~~ ** ~~  If you didn't learn anything when you broke it the 1st ~~ ** ~~
                        time, then break it again.