Re: [Networker] Problem with ending a savegroup

Hi, Davina,

You're right; a savegroup switch would be desirable; however, it should
be more "intelligent" than just an on/off "kill savegroup if already
running?" sort of thing. You might, for example, want a full backup to
kill a running incremental, but not an incremental to kill a running full
(sadly, some fulls do run more than 24 hours <sigh>).
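
To make that rule concrete, here is a minimal sketch in Python. The level
ranking (in particular where a differential sits) and the function name are
assumptions for illustration only, not anything NetWorker provides:

```python
# Hypothetical ranking: a higher number may preempt a lower one.
# Where "differential" sits in this order is an assumption.
LEVEL_RANK = {"incremental": 0, "differential": 1, "full": 2}


def should_kill_running(new_level: str, running_level: str) -> bool:
    """Return True if a newly scheduled backup should kill the running one.

    Only a strictly higher level preempts, so a full kills a running
    incremental, but an incremental never kills a running full, and a
    backup never kills another one of the same level.
    """
    return LEVEL_RANK[new_level] > LEVEL_RANK[running_level]
```

With this ranking, should_kill_running("full", "incremental") is true and
should_kill_running("incremental", "full") is false.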

But all the clients in a group don't have to be on the same schedule, so
that complicates things. Perhaps a timeout parameter associated with the
client? "Kill if still running after 'x' hours"? Maybe separate timeout
parameters for full, incremental and differential?
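
A per-client, per-level timeout could be sketched roughly as follows, again
in Python. The table of limits, the client name and the function name are all
hypothetical; no such parameter exists in the product as discussed here:

```python
from datetime import datetime, timedelta

# Hypothetical per-client limits in hours, one per backup level.
# None means "never kill", e.g. for fulls that legitimately run long.
CLIENT_TIMEOUTS = {
    "client-a": {"full": None, "differential": 12, "incremental": 6},
}


def is_overdue(client, level, started, now, timeouts=CLIENT_TIMEOUTS):
    """Return True if this client's save has run longer than its limit."""
    limit = timeouts.get(client, {}).get(level)
    if limit is None:
        # No limit configured for this client/level: leave it alone.
        return False
    return now - started > timedelta(hours=limit)
```

A monitor would call is_overdue() for each running client and kill only the
overdue ones, so that one slow client cannot hold up the rest of the group.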

Ultimately, I guess I'm hoping for a solution where no client can hang
up a whole group. 

Of course, your guess is correct: Most (but not all) of the miscreants
are indeed Wintel boxes, more than would be predicted by the
Windows/Unix ratio.

Thanks for your input,

Conrad



-----Original Message-----
From: Davina Treiber [mailto:Treiber AT hotpop DOT com] 
Sent: Thursday, August 11, 2005 6:55 AM
To: Legato NetWorker discussion; Conrad Macina
Subject: Re: [Networker] Problem with ending a savegroup

Conrad Macina wrote:
> If you ask the team here what our Number One NetWorker Problem is, this is
> it. We experience hung savegroups every day, mostly in our larger data
> zones. As Davina said, the only "solution" is to kill the running savegroup
> just before it is scheduled to run.

I suppose that this same solution could be incorporated into the product
if Legato desired. The current behaviour is that a group will fail if
the same group is already running, and of course it does give you a
notification when this occurs. However, it could be possible to add a
switch in the group resource that would give the option to kill an 
already running group before starting another instance. This is exactly 
what my script does, although it's not as neat as having it built into 
the product.
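
A minimal sketch of such a switch, in Python. The kill_if_running flag and
the kill and start callables are hypothetical stand-ins, not NetWorker APIs,
and this is not the script mentioned above:

```python
def start_group(group, running, kill, start, kill_if_running=False):
    """Start a group, optionally killing an instance that is still running.

    running maps a group name to a handle for its running instance.
    kill and start are hypothetical callables supplied by the caller.
    """
    if group in running:
        if not kill_if_running:
            # Behaviour described above: the new instance simply fails.
            return "failed: already running"
        # Proposed switch: clear the old instance out of the way first.
        kill(running.pop(group))
    running[group] = start(group)
    return "started"
```

With kill_if_running left at False the function mirrors the behaviour
described above; setting it to True gives the proposed kill-then-start option.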

> 
> Interestingly, on some occasions, issuing the kill command to the client's
> nsrexec process causes the group to end successfully. Go figure!

Well yes, this is what I would expect. If the server's nsrexec 
connection to the client is broken, then that save set can complete 
(most likely unsuccessfully), so the work list can clear down and the 
group will finish.

> 
> We have been discussing this with a NetWorker Product Manager. I'll post
> anything I learn to the list.

What do you expect as a result from this? What kind of solution can you
envisage? Also, I'd be willing to bet that the hung clients are Windows
machines in the majority of cases. Am I right?
