[Networker] NetWorker under Solaris 9

2003-06-13 08:07:13
Subject: [Networker] NetWorker under Solaris 9
From: Stan Horwitz <stan AT TEMPLE DOT EDU>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 13 Jun 2003 08:06:28 -0400
After a month of running NetWorker 6.1.3, I am less than enthusiastic
about this software. My understanding is that Legato uses Solaris as its
development platform so one would expect the fewest problems by running
NetWorker under Solaris, as opposed to other operating systems, or is that
unrealistic? I have had nothing but problems since migrating our backup
server from NetWorker 6.1.1 under Tru64 Unix 4.0f to Solaris 9 with
NetWorker 6.1.3.

After installing a new set of hotfix binaries on Wednesday, the server ran
fine through the night. Today, I see that a tape that was used only for
one NDMP SnapImage client's backup is still marked as writing, even though
that client's backup finished.

The hotfix files that I applied on June 11 are from the LGTpa53663 hotfix.
I am curious if anyone else on this list is trying this hotfix.  Legato's
tech support engineer who's been working with me on this case (number
3055074) did tell me that Legato's developers are still verifying this
hotfix (which involves updates to ansrd, nsrd, and nsrmmd), but I figured
since our server seems to malfunction anyway every other day, I had
nothing to lose by trying these new binaries.

What was happening is that the NSR server software would slow to a
crawl. The daemon.log would show tons of RPC errors and failures to cancel
nsrmmd processes that look like:

06/12/03 16:02:36 nsrd: media info: restart of nsrmmd #10 on bootz cancelled
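
To gauge how often this happens on a given day, the matching lines can be
counted. The following is only a minimal sketch: the pattern is modelled on
the single daemon.log line quoted above, other NetWorker releases may word
the message differently, and the function name count_cancelled_restarts is
made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one daemon.log line quoted in this message
# (an assumption; the wording may differ between NetWorker releases).
CANCELLED = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"nsrd: media info: restart of nsrmmd #(?P<num>\d+) "
    r"on (?P<host>\S+) cancelled"
)


def count_cancelled_restarts(lines):
    """Count cancelled nsrmmd restarts per (date, host, nsrmmd number)."""
    counts = Counter()
    for line in lines:
        m = CANCELLED.match(line.strip())
        if m:
            counts[(m["date"], m["host"], int(m["num"]))] += 1
    return counts
```

Passing an open daemon.log file object to count_cancelled_restarts() gives a
per-day tally for each nsrmmd, which makes it easier to see whether the
cancellations cluster around the backup window.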

As a result, it would be impossible to complete our nightly scheduled
backups within my backup window. I suspect that if I do not restart the
NSR software today, the same problem will occur again tonight. I might be
wrong, but I do not want to take the chance. However, for last night I see
no errors or cancellation failures in NSR's daemon.log, but I did get a
couple of SCSI warnings from our Big Brother monitoring software that
look like:

Jun 13 02:22:09 bootz scsi: [ID 107833 kern.warning] WARNING:
/pci@4,4000/ANTR,2u3wl@4 (glm26):
Jun 13 02:22:09 bootz glm2: [ID 160360 kern.warning] WARNING:
ID[SUNWpd.glm.cmd_timeout.6016]
Jun 13 02:22:09 bootz scsi: [ID 107833 kern.warning] WARNING:
/pci@4,4000/ANTR,2u3wl@4/st@1,0 (st85):
Jun 13 02:25:22 bootz scsi: [ID 107833 kern.warning] WARNING:
/pci@4,4000/ANTR,2u3wl@4/st@1,0 (st85):
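
To see which device the warnings concentrate on, they can be tallied by the
driver instance shown in parentheses. This is a rough sketch that assumes
each warning's device line ends with "(name):" as in the excerpt above; the
function name count_scsi_warnings is made up for illustration.

```python
import re
from collections import Counter

# Matches the driver instance in parentheses that ends a device line,
# e.g. "(st85):". This layout is an assumption based on the excerpt above.
INSTANCE = re.compile(r"\((?P<inst>[a-z]+\d+)\):\s*$")


def count_scsi_warnings(lines):
    """Count warning device lines per driver instance name."""
    counts = Counter()
    for line in lines:
        m = INSTANCE.search(line)
        if m:
            counts[m["inst"]] += 1
    return counts
```

A count that keeps growing for one tape drive instance and not for the
others would point at that drive, its cabling, or its bus rather than at
the backup software.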

If anyone has any comments, please let me know. I reported all of
this to Legato earlier today.
