Re: [Networker] savepnpc sporadically not coming to an end

On Mon, 3 Mar 2003 12:25:51 +0100, Andre Beck <networker AT IBH DOT DE> wrote:

>after Legato Support instantly replied to our savepnpc problems with
>6.2 on W2k with a patch (which meanwhile became an official one) to
>preclntsave.exe we thought it was all fixed. From first tests, savepnpc
>did what was expected from it, not only on clients with just one saveset,
>but as well on those with more. But we are now facing another problem,
>which only hits sporadically and is thus much harder to debug.
>
>The problem is, every now and then, a savepnpc session doesn't come to
>an end. When watching the save interactively, it is obvious that all
>savesets of the client are written to the server, but when the last one
>of them has dumped enough data so one would expect it to be completed,
>neither the client nor the server notice that completion. The client's
>save session will stay active until an eventual savepnpc timeout will
>kill it hours later, and even after that, the server will still not
>notice the savegroup to be done. The only way to stop the group is to
>abort it manually.
>
>Logs show afterwards:
>
>----------------------------------------------------------------------
>NetWorker Savegroup: (alert) GROUP aborted, 1 client(s) (client06 Failed)
>Start time:   Mon Feb 03 21:00:01 2003
>End time:     Tue Feb 04 16:53:01 2003
>
>--- Never Started Save Sets ---
>
>savegrp: client06:index index was never started
>
>--- Unsuccessful Save Sets ---
>
>* client06:C:\ aborted
>* client06:D:\ aborted
>* client06:F:\ aborted
>
>--- Successful Save Sets ---
>
>* client06:E:\ <precmd output>
>* client06:E:\ <precmd output>
>  client06: E:\                       level=full,     16 GB 01:03:04   1057 files
>---------------------------------------------------------------------
>
>It appears that the client never noticed the savesets of C:, D: and F:
>to be completed, while it did notice that for E:. This is a strange
>coincidence with the original problem (before fixing preclntsave.exe)
>where savepnpc would only work correctly if the client specified exactly
>one saveset and broke for 2 or more. Furthermore, the above never happened
>in our tests when the client was configured with only one saveset.
>
>But as already said, the problem actually is that the above is not
>*always* happening. Sometimes the savepnpc succeeds, sometimes it will
>fail in the way described (all savesets are actually written, but there
>is no notice of completion).
>
>More debugging results:
>
>a) We replaced the Client software on client06, uninstalling 6.2 and
>   installing 6.1.3 instead. This did *not* change anything.
>b) Another savepnpc client produces a weird savegroup completion log
>   line:
>
>---------------------------------------------------------------------------
>h*þh*þ 0 KB 00:00:01      0 files
>  server99.dom: index:client01         level=9,        26 MB 00:00:04     24 files
>---------------------------------------------------------------------------
>
>c) When eliminating savepnpc completely by running the post- and pre-
>   commands at-controlled and using the standard savegroup, a save
>   sometimes gets weird errors:
>
>---------------------------------------------------------------------------
>Feb 19 21:15:01 server99.dom: NetWorker Savegroup: (info) starting GROUP (with 1 client(s))
>Feb 19 23:18:52 server99.dom: NetWorker Savegroup: (notice) GROUP completed, 1 client(s) (All Succeeded)
>Feb 19 23:18:52 server99.dom: Start time:   Wed Feb 19 21:15:01 2003
>Feb 19 23:18:52 server99.dom: End time:     Wed Feb 19 23:18:52 2003
>Feb 19 23:18:52 server99.dom: --- Successful Save Sets ---
>Feb 19 23:18:52 server99.dom:   client06: SYSTEM FILES:\ level=full,    237 MB 00:01:32   1952 files
>Feb 19 23:18:52 server99.dom: * client06:SYSTEM DB:\ Removable Storage Database - rsmow: Exported the RSM database.
>Feb 19 23:18:52 server99.dom:   client06: SYSTEM DB:\ level=full,    922 KB 00:00:18     13 files
>Feb 19 23:18:52 server99.dom:   client06: SYSTEM STATE:\ level=full,     13 MB 00:00:16     17 files
>Feb 19 23:18:52 server99.dom:   client06: C:\ level=full,   1272 MB 00:04:46  19500 files
>Feb 19 23:18:52 server99.dom:   client06: D:\ level=full,   6468 MB 00:28:49  32322 files
>Feb 19 23:18:52 server99.dom:   client06: E:\ level=full,     17 GB 00:59:03   1104 files
>Feb 19 23:18:52 server99.dom: * client06:E:\ 02/19/03 22:50:06 nsrexec: Attempting a kill on remote save
>Feb 19 23:18:52 server99.dom: * client06:E:\ aborted due to inactivity
>Feb 19 23:18:52 server99.dom:   client06: F:\ level=full,     57 GB 01:57:37    217 files
>Feb 19 23:18:52 server99.dom:   server99.dom: index:client06 level=full,     87 MB 00:00:14    211 files
>-------------------------------------------------------------------------
>
>   I'm not sure whether it is related, but at least it does as well happen
>   sporadically (but less often than the hanging savepnpc) and deals with a
>   saveset that didn't come to an end somehow.


I think your tests show that the problem exists with or without savepnpc.
save and savepnpc are in fact the same binary, so the save part works
exactly the same way in both. I don't think your problem is actually
related to savepnpc; these look like fairly standard save problems, which
can nonetheless be hard to debug. It seems that the connection between the
client and the server is being lost. This could be due to network problems
or possibly name resolution issues. Continue debugging with save rather
than savepnpc. Perhaps try save -v as your backup command.
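
To rule out the name resolution part, something like the following could
be run on both the client and the server. It is only a minimal sketch in
Python, not a NetWorker tool: check_name_resolution is a name made up for
this example, and all it does is check that every address a hostname
resolves to maps back to that same name (as a short name or an FQDN).

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname_ex,
                          reverse=socket.gethostbyaddr):
    """Return a list of problems; an empty list means forward and
    reverse lookups agree for every address of `hostname`.

    `forward` and `reverse` default to the system resolver and can be
    replaced with stand-ins for testing.
    """
    try:
        _canonical, _aliases, addresses = forward(hostname)
    except OSError as exc:  # socket.gaierror/herror are OSError subclasses
        return [f"forward lookup of {hostname} failed: {exc}"]

    wanted = hostname.lower()
    problems = []
    for address in addresses:
        try:
            name, aliases, _addrs = reverse(address)
        except OSError as exc:
            problems.append(f"reverse lookup of {address} failed: {exc}")
            continue
        # Accept a match on either the full name or the short host name.
        known = {n.lower() for n in [name, *aliases]}
        short = {n.split(".")[0] for n in known}
        if wanted not in known and wanted.split(".")[0] not in short:
            problems.append(
                f"{address} resolves back to {name}, not {hostname}")
    return problems
```

For example, call check_name_resolution("client06") on the server and
check_name_resolution("server99.dom") on the client. An empty list means
the forward and reverse lookups agree; anything else is worth fixing
before looking further.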

>
>d) We reduced the pre and post commands to simple scripts just setting
>   some environment variables, outputting the environment to some log
>   files, and calling "ipconfig/all" for some more output. Nothing really
>   complex at all. The behavior remained sporadic, i.e. a script that
>   just ran around a successfully completed savepnpc failed in the next
>   run. That gives the impression that the problem is completely unrelated
>   to the script contents.
>
>If I've grokked the FAQ correctly, savepnpc completion is a client/server
>process, meaning the client is constantly querying the server whether all
>scheduled saves are completed. So this *might* be a serverside problem
>only, but I'm entirely not sure.

I doubt that the problem is the server; it is most likely network related
in some way.
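
If you want some evidence for or against an intermittent network fault
before you start sniffing, a periodic TCP connect test between server and
client during the backup window is cheap. The sketch below is an
illustration in Python and not part of NetWorker: probe is a hypothetical
helper, and the port is left as a parameter because which port to test
depends on your installation.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect to host:port.

    Returns (ok, seconds), where ok is True if the connection was
    established within `timeout` and seconds is how long the attempt took.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:  # refused, unreachable, timed out, or name lookup failed
        return False, time.monotonic() - start
```

Calling probe once a minute and logging failures with a timestamp would
show whether connectivity drops line up with the hanging saves. Note that
it only tests reachability; it will not notice an established connection
being dropped while idle.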

>
>Now I'm completely lost - should I really start sniffing to track this
>down? Or has anyone already seen such things, with savepnpc or without,
>and can attribute them to some weird config issue or knows the bug?
>I'm almost considering to "sidegrade" the server to 6.1.3 and drop 6.2
>on that side (I expect a 6.1.3 server to still be able to deal with the
>6.2 clients required for saving XP clients) - but that's really invasive
>and I don't want to do that just to see the same thing still happening
>later.

Yes, a 6.1.3 server should have no problem with 6.2 clients; it's one of
those acceptable exceptions to the rule. However, despite my previous
advice, I think that by moving to 6.2 you may have burnt your bridges.
6.2 contains the change (planned for 7.0) that splits nsr.res and
nsrjb.res into many files. You probably can't go back without completely
reconfiguring your server, so it is probably best to stick with 6.2 now
that you have it.
