[Networker] savepnpc sporadically not coming to an end

2003-03-03 06:25:59
Subject: [Networker] savepnpc sporadically not coming to an end
From: Andre Beck <networker AT IBH DOT DE>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Mon, 3 Mar 2003 12:25:51 +0100
Hi,

After Legato Support instantly replied to our savepnpc problems with
6.2 on W2k with a patch to preclntsave.exe (which has meanwhile become
an official one), we thought it was all fixed. In our first tests,
savepnpc did what was expected of it, not only on clients with just one
saveset but also on those with more. But we are now facing another
problem, which only hits sporadically and is thus much harder to debug.

The problem is that every now and then a savepnpc session doesn't come
to an end. When watching the save interactively, it is obvious that all
savesets of the client are written to the server, but when the last one
of them has dumped enough data that one would expect it to be complete,
neither the client nor the server notices that completion. The client's
save session stays active until a savepnpc timeout eventually kills it
hours later, and even after that, the server still does not notice that
the savegroup is done. The only way to stop the group is to abort it
manually.

Logs show afterwards:

----------------------------------------------------------------------
NetWorker Savegroup: (alert) GROUP aborted, 1 client(s) (client06 Failed)
Start time:   Mon Feb 03 21:00:01 2003
End time:     Tue Feb 04 16:53:01 2003

--- Never Started Save Sets ---

savegrp: client06:index index was never started

--- Unsuccessful Save Sets ---

* client06:C:\ aborted
* client06:D:\ aborted
* client06:F:\ aborted

--- Successful Save Sets ---

* client06:E:\ <precmd output>
* client06:E:\ <precmd output>
  client06: E:\                       level=full,     16 GB 01:03:04   1057 files
---------------------------------------------------------------------

It appears that the client never noticed that the savesets of C:, D: and
F: had completed, while it did notice that for E:. This is a strange
coincidence with the original problem (before fixing preclntsave.exe),
where savepnpc would only work correctly if the client specified exactly
one saveset and broke for two or more. Furthermore, the above never
happened in our tests when the client was configured with only one
saveset.
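
Because the failure is sporadic, it may help to scan archived savegroup
completion reports automatically for the pattern shown above. The
following is only a minimal sketch: it assumes reports use the
"--- Section ---" headers and "* client:saveset aborted" lines exactly as
in the excerpt, and the function name aborted_savesets is made up for
illustration.

```python
import re

# Section headers as they appear in the report excerpt, e.g.
# "--- Unsuccessful Save Sets ---".
SECTION_RE = re.compile(r"^--- (.+?) ---$")
# Lines such as "* client06:C:\ aborted" (assumed format).
ABORTED_RE = re.compile(r"^\* (?P<client>[^:]+):(?P<saveset>.+?) aborted\b")


def aborted_savesets(report: str) -> list[tuple[str, str]]:
    """Return (client, saveset) pairs listed as aborted under
    the 'Unsuccessful Save Sets' section of a completion report."""
    section = None
    found = []
    for raw in report.splitlines():
        line = raw.strip()
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            continue
        if section == "Unsuccessful Save Sets":
            hit = ABORTED_RE.match(line)
            if hit:
                found.append((hit.group("client"), hit.group("saveset")))
    return found
```

Running this over each night's report and comparing the result with the
savesets that did show a completion line would at least give a record of
how often, and for which drives, the completion notice goes missing.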

But as already said, the real problem is that the above does not
*always* happen. Sometimes the savepnpc succeeds, sometimes it fails in
the way described (all savesets are actually written, but there is no
notice of completion).

More debugging results:

a) We replaced the Client software on client06, uninstalling 6.2 and
   installing 6.1.3 instead. This did *not* change anything.
b) Another savepnpc client produces a weird savegroup completion log
   line:

---------------------------------------------------------------------------
h*þh*þ 0 KB 00:00:01      0 files
  server99.dom: index:client01         level=9,        26 MB 00:00:04     24 files
---------------------------------------------------------------------------

c) When eliminating savepnpc completely by scheduling the pre- and
   post-commands with at and using the standard savegroup, a save
   sometimes gets weird errors:

---------------------------------------------------------------------------
Feb 19 21:15:01 server99.dom: NetWorker Savegroup: (info) starting GROUP (with 1 client(s))
Feb 19 23:18:52 server99.dom: NetWorker Savegroup: (notice) GROUP completed, 1 client(s) (All Succeeded)
Feb 19 23:18:52 server99.dom: Start time:   Wed Feb 19 21:15:01 2003
Feb 19 23:18:52 server99.dom: End time:     Wed Feb 19 23:18:52 2003
Feb 19 23:18:52 server99.dom: --- Successful Save Sets ---
Feb 19 23:18:52 server99.dom:   client06: SYSTEM FILES:\            level=full,    237 MB 00:01:32   1952 files
Feb 19 23:18:52 server99.dom: * client06:SYSTEM DB:\ Removable Storage Database - rsmow: Exported the RSM database.
Feb 19 23:18:52 server99.dom:   client06: SYSTEM DB:\               level=full,    922 KB 00:00:18     13 files
Feb 19 23:18:52 server99.dom:   client06: SYSTEM STATE:\            level=full,     13 MB 00:00:16     17 files
Feb 19 23:18:52 server99.dom:   client06: C:\                       level=full,   1272 MB 00:04:46  19500 files
Feb 19 23:18:52 server99.dom:   client06: D:\                       level=full,   6468 MB 00:28:49  32322 files
Feb 19 23:18:52 server99.dom:   client06: E:\                       level=full,     17 GB 00:59:03   1104 files
Feb 19 23:18:52 server99.dom: * client06:E:\ 02/19/03 22:50:06 nsrexec: Attempting a kill on remote save
Feb 19 23:18:52 server99.dom: * client06:E:\ aborted due to inactivity
Feb 19 23:18:52 server99.dom:   client06: F:\                       level=full,     57 GB 01:57:37    217 files
Feb 19 23:18:52 server99.dom:   server99.dom: index:client06          level=full,     87 MB 00:00:14    211 files
-------------------------------------------------------------------------

   I'm not sure whether it is related, but it also happens sporadically
   (though less often than the hanging savepnpc) and involves a saveset
   that somehow didn't come to an end.

d) We reduced the pre and post commands to simple scripts that just set
   some environment variables, write the environment to some log files,
   and call "ipconfig /all" for some more output. Nothing really complex
   at all. The behavior remained sporadic, i.e. a script that had just
   run around a successfully completed savepnpc failed in the next run.
   That gives the impression that the problem is completely unrelated
   to the script contents.
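
For anyone wanting to reproduce item d), a stripped-down hook of that
kind could look roughly like the sketch below. It is an illustration
under assumptions, not the scripts actually used: the function name
run_hook, the variable SAVEPNPC_DEBUG_PHASE and the log path are all
invented, and the real pre/post commands were presumably batch files.

```python
import os
import subprocess
from datetime import datetime


def run_hook(label: str, log_path: str, command: list[str]) -> int:
    """Set a marker variable, append the environment and the output of
    `command` to log_path, and return the command's exit code."""
    # Hypothetical marker variable, standing in for "some environment
    # variables" set by the original scripts.
    os.environ["SAVEPNPC_DEBUG_PHASE"] = label

    with open(log_path, "a", encoding="utf-8", errors="replace") as log:
        log.write(f"=== {label} {datetime.now().isoformat()} ===\n")
        for name in sorted(os.environ):
            log.write(f"{name}={os.environ[name]}\n")

    # Capture stdout and stderr so a failing command is still logged.
    result = subprocess.run(
        command, capture_output=True, text=True, errors="replace"
    )
    with open(log_path, "a", encoding="utf-8", errors="replace") as log:
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode
```

On the client this would be called as, for example,
run_hook("pre", r"C:\temp\savepnpc_debug.log", ["ipconfig", "/all"])
before the save and with "post" afterwards. A hook this small leaves
little room for the script itself to be the cause of a hang.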

If I've grokked the FAQ correctly, savepnpc completion is a client/server
process, meaning the client constantly queries the server as to whether
all scheduled saves are completed. So this *might* be a server-side
problem only, but I'm not at all sure.
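
To make that reading of the FAQ concrete, the polling model can be
sketched as below. This is purely illustrative and says nothing about
how NetWorker actually implements it: the callback all_saves_done stands
in for whatever query the client sends to the server, and every name
here is hypothetical.

```python
import time
from typing import Callable


def wait_for_group_completion(
    all_saves_done: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll all_saves_done() until it returns True.

    Returns True once completion is reported, or False if timeout_s
    elapses first (the case where the post command would only run
    after the timeout).
    """
    deadline = clock() + timeout_s
    while True:
        if all_saves_done():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

In such a model, if the server never answers "done" for one of several
savesets, the client has no way to tell a slow save from a finished one
and simply runs into the timeout, which would match the symptoms
described.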

Now I'm completely lost - should I really start sniffing to track this
down? Or has anyone already seen such things, with savepnpc or without,
and can attribute them to some weird config issue or knows the bug?
I'm almost considering "sidegrading" the server to 6.1.3 and dropping
6.2 on that side (I expect a 6.1.3 server to still be able to deal with
the 6.2 clients required for saving XP clients) - but that's really
invasive and I don't want to do that just to see the same thing still
happening later.

TIA,
Andre.
--
      1984? Umm... 1994? Err... 2004? Yeah - We finally got it!
   Micro$oft Palladium for TCPA - Making mankind a big brotherhood.

-> Andre Beck    +++ ABP-RIPE +++    IBH Prof. Dr. Horn GmbH, Dresden <-
