Re: [Networker] networker won't shutdown

2005-10-28 18:06:30
Subject: Re: [Networker] networker won't shutdown
From: Dave Mussulman <mussulma AT CS.UIUC DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 28 Oct 2005 17:02:55 -0500
On Fri, Oct 28, 2005 at 05:33:44PM -0400, Joel Fisher wrote:
> It actually is a pain to manage, but I've just gotten fed up with one
> client that hangs an entire group and then it doesn't run the next
> night.
> 
> Since I have an entry for the server in each group (for index retention
> length purposes) it does actually put quite a load on the server.  I'm
> trying to decide what I'm going to do about that.  It runs 600+
> bootstraps a night, which causes the server to not respond at times.  I'm
> considering moving the indexes to another pool so that I don't have to
> worry about the retention time causing my media to not recycle.
> 
> By your response, I take it you use large groups.  Do you not have
> problems with hanging clients?  How do you handle your indexes?

I've got about four groups with about 70 clients per group.  I can't
imagine breaking them into individual groups.  Plus, as you noted, you'd
need to put a server instance in each group so the indexes get backed up
with the proper retention times, and then you'd get a ton of bootstrap
messages.  (I guess alternatively you could turn off indexes for each
group and do an index backup once a day, but that might not even get
around the index retention issues.)

I get some occasional hung clients (or media contention, where a group
can't load the tape it needs to write to because other pools are using
all of the drives).  I watch for it by having a script check to see if
any groups are still running at 10pm and, if so, page me.  (So I can see
what's still running and check it before I go to bed.)  I also watch out
for the "savegroup is already running" error in the savegroup completion
report and manually start the group if I catch it in time.
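
For anyone wanting to try the 10pm check, here is a rough sketch of the
idea in Python.  It is not the actual script; it assumes a running group
shows up as a live "savegrp" process on the server and that the group
name is the last argument on its command line, so verify both against
your own process listing.  The function names are made up for
illustration.

```python
import shlex
import subprocess


def current_ps_lines():
    """Return one command line per running process (POSIX ps options)."""
    result = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def running_groups(ps_lines):
    """Return group names that appear to have a live savegrp process.

    Assumption: the group name is the final argument to savegrp.
    """
    groups = []
    for line in ps_lines:
        try:
            args = shlex.split(line)
        except ValueError:
            # Unbalanced quotes in a command line; skip it.
            continue
        if len(args) < 2:
            continue
        # Compare the executable's basename so full paths also match.
        exe = args[0].rsplit("/", 1)[-1]
        if exe == "savegrp":
            groups.append(args[-1])
    return groups


def page_message(groups, hour_label="10pm"):
    """Build the text of the page, or None if nothing is still running."""
    if not groups:
        return None
    names = ", ".join(sorted(groups))
    return f"NetWorker groups still running at {hour_label}: {names}"
```

Run from cron at 22:00, something like
page_message(running_groups(current_ps_lines())) gives you the text to
mail to a pager address whenever it is not None.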

And then I fix whatever client is causing the savegroup to hang.  In my
environment, the machine is usually hosed some other way and needs a
reboot or rebuild.  (The hanging backup isn't the problem; it's a side
effect of another problem.)  But this happens rarely enough in my
environment that it's not worth creating 400+ groups.  *shudder*

Dave
