[Networker] Some preliminary comments on 7.4

2007-09-19 10:56:13
From: Stan Horwitz <stan AT TEMPLE DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 19 Sep 2007 10:45:43 -0400
Well, it's been a few hours short of a week since I upgraded our NetWorker server from 7.2.1 to 7.4. This is on a Solaris 9 box with a mixture of Solaris, Linux, Windows (various flavors), and Mac OS X clients, about 300 clients in all. My data zone also handles MS SQL, NDMP, and Microsoft Cluster Exchange backups.

My experience with 7.4 has been bittersweet thus far. In a lot of ways, I am really impressed with 7.4, but I do have a couple of serious problems that give me reason for considerable concern.

First, I discovered that NDMP backups work a little differently in a three-way environment. The password for the system that hosts the tape library robot must now be entered into the storage node resource, which is new to me.

Second, and of great concern, there is definitely a bug in how the media database is managed. This problem has been escalated to an EMC NetWorker PSE and I have it at severity 1, although I initiated the case as severity 2. At first, the problem appeared to be that NetWorker doesn't handle automated tape cleaning properly. Specifically, with all 14 of my Sony PetaSite's S-AIT drives set not to use the CDI interface and with each device set for a daily cleaning interval, NetWorker keeps attempting to clean devices that are in use (i.e., reading or writing) and floods me with emails saying those devices were successfully cleaned, even though that's impossible. Fortunately, NSR is not decrementing the number of cleaning uses on the 11 cleaning tapes I keep in the library, and it is also cleaning the drives that do need to be cleaned.

Enabling the CDI feature on each device causes each drive that needs cleaning to be cleaned twice, which will subject the drives to unnecessary wear and tear. Unfortunately, letting the PetaSite handle auto-cleaning with NSR's auto-cleaning turned off shows that NSR and the PetaSite don't play well together in that configuration. This is why I use NSR's auto-cleaning feature.

I noticed that problem last Thursday and opened a case with EMC right away, but this weekend I also discovered another problem which I am sure is connected. The NetWorker Management Console and "nsrjb" do not agree on which tapes are recyclable. At this time, the NWMC's media window shows four recyclable tapes, while "nsrjb -C | grep yes" shows no recyclable tapes. When NetWorker attempts to label one of those tapes, it gets into a loop and keeps trying to label it until I manually label another tape.

I think this media database discrepancy was brought about when I tried to mark a tape as recyclable using the NetWorker Management Console's GUI and the marking process never finished. I had to reboot my workstation in order to free up NWMC. Now, whenever I attempt to mark a tape as recyclable or remove a tape from the media database, I get an error that says:

"39078:nsrmm: RAP error: Mark volume operation already in progress"

I get this error both when I use "nsrmm" at the command line and when I try it via the NWMC GUI. The only way I have found to clear it is to restart NetWorker's daemons, which I suspect clears out its jobs database. I also noticed that if I try to load a cleaning tape manually by issuing an "nsrjb -l " command, the nsrmmgd process core dumps, then restarts a minute or two later. Resetting the tape library and inventorying it doesn't help.

I spent several hours with an EMC engineer on the phone and on WebEx yesterday, so I am hopeful that, between the information gleaned from that session and all the support files I sent, a solution will be forthcoming soon ... I hope!

Third, a few minutes ago I discovered a tiny bug of minuscule consequence. In the sessions section of the monitoring window, the start time for at least one of my save streams is reported as 5:15 AM, while the actual start time, as reported in the groups section, is 5:15 PM. I only just noticed this issue, so I have not yet reported it to EMC.

I am also having that problem with truncated savegroup reports; however, for me it's a minor issue, because the NWMC GUI offers enough information for me to see what went on with each savegroup's backups. I do intend to apply the fix for that, but it's not among my top priorities. I also had some reporting scripts that no longer work; however, in my case they have all been made obsolete by the NWMC's GUI, and I knew they would fail before I upgraded to 7.4.

Believe it or not, I like NetWorker 7.4, and I feel that EMC is being responsive to my requests for assistance. I started to think about ditching 7.4 and going with the latest 7.3.3, but I am going to give EMC a chance to resolve the issues I cited in this message. I have experienced problems with earlier versions that were much more difficult to troubleshoot than this media management problem. I also really like the new NWMC GUI; I have demonstrated it to several colleagues and they are all impressed with it. After I migrate to a new Sun T2000 with more disk capacity, I intend to enable all the report tracking features. Right now, my Sun Fire V480 doesn't have enough disk capacity to support the reporting I want to do, but that problem will be solved in another month or so when I upgrade hardware.

--
Stan Horwitz
Temple University
Enterprise Systems Group
stan AT temple DOT edu

CONFIDENTIALITY STATEMENT: The information contained in this e-mail, including attachments, is the confidential information of, and/or is the property of, Temple University. The information is intended for use solely by the individual or entity named in the e-mail. If you are not an intended recipient or you received this in error, then any review, printing, copying, or distribution of any such information is prohibited. Please notify the sender immediately by reply e-mail and then delete this e-mail from your system.

To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
