[Networker] SUMMARY: strange 7.0 NW problems (prematurely marking tapes full)

About a month or so ago I posted about my problems with NDMP backups after
upgrading to 7.0.  After finding the time to research the problem by looking
thoroughly through logs, cranking up NDMP debugging to 70 on the NetApp and
examining exactly what was on a "bad" tape, I opened a case with Legato.

First, let me say that I got a return call from an engineer within 20 minutes!
He listened to my observations, asked questions and was knowledgeable about the
product when I asked him questions.  He had me make a couple of configuration
changes and the NDMP errors went away.  I would still argue with him that
there is a bug in Networker, but since I have eliminated the source of the
NDMP errors, this should no longer be a problem.

To make a long story short, it boiled down to failures with NDMP backups
causing Networker to incorrectly record what was on the tape.  The symptom
was that NW would continue to write to the tape while the current backup
was running.  But when the next backup ran and NW had to mount and position
the tape, it had recorded what was on the tape incorrectly and could not
position the tape to the proper end of tape (mminfo -V and scanner
verified this).  When this happens, NW just marks the tape as full rather
than risk overwriting any data.

I had this problem with 6.X but it was infrequent enough that I never figured
out the source of the problem.  When I upgraded to 7.X, it happened daily and
I noticed in the Indexes window that there were savesets with 0 data and 0
files.

The key is to look in /nsr/logs/daemon.log for errors related to NDMP.  For my
version of NetApp OS (6.x),

grep 'E R R' daemon.log

found the errors.
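
To see how often the errors occur and where they sit in the log, the same
grep can be wrapped in a small helper.  This is only a sketch: the function
name ndmp_errors is made up for illustration, and the 'E R R' pattern is
simply the one that matched on my filer version, so it may differ on yours.

```shell
# Hypothetical helper (not part of Networker): summarize daemon.log lines
# containing the 'E R R' pattern used above.
ndmp_errors() {
    # Default to the log path mentioned above; pass another path to override.
    log="${1:-/nsr/logs/daemon.log}"
    # grep -c prints the number of matching lines.
    echo "matching lines: $(grep -c 'E R R' "$log")"
    # grep -n prefixes each match with its line number in the log;
    # tail keeps only the 20 most recent matches.
    grep -n 'E R R' "$log" | tail -n 20
}
```

Running ndmp_errors with no argument reads /nsr/logs/daemon.log; running it
before and after the configuration change gives a quick check on whether the
errors have stopped.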

So, the configuration changes I made were as follows:

For the NDMP Group(s) make the following configuration change:

View -> Details
        Savegrp parallelism: change 0 to # drives in NDMP pool
        Inactivity timeout:  change 30 to 90

The problem (as I understand it) is that when the parallelism is set to 0, NW
attempts to open lots of NDMP sessions with the filer, but it fails because
the filer only has two tape drives and I had about 20 qtrees in my save sets.
This causes the NDMP failures.  When you set the parallelism to 2, this no
longer happens.  Maybe the fact that I have 4 NetApp clients makes a
difference.  I don't really understand why the parallelism setting matters,
as I believe I had the target sessions for each NDMP drive set to 1.

Anyway, I hope this helps someone else.

Shelley


--
    Shelley L. Shostak, PhD                     sls AT qstech DOT com
    Voice: (408) 574-3389
    Lead Unix Administrator                     Quicksilver Technology
