Networker

Re: [Networker] Problem with groups not starting

2006-08-14 08:26:18
From: Denis <denis.mail.list AT FREE DOT FR>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 14 Aug 2006 14:02:27 +0200
Quoting "Groth, Jonathan A" <jonathan.groth AT EDS DOT COM>:

> Hello all,
>
> I'm having an odd problem with groups not starting.  Not only not
> starting, but not producing any messages at all.  I'll put in some
> background info below, but first, let me give the basics:
>
> NetWorker Server:  7.2.1
> OS:  Solaris 8
> Group's Autostart set to Enabled
> Schedule set to proper level (full) on the days in question
>
>
> Background (probably too much):
>
> Customer requires end-of-month backups to start at different times on
> different days. So, a couple ksh scripts were written to set the start
> times in advance, so that people wouldn't have to log in on weekends
> when end-of-month falls on those days.  First script takes input (group
> name, new start time, date you want the change to take place, the time
> the change takes place), then it creates a "script" with all the details
> nsradmin would need to do the deed, and finally it appends the input and
> the location of the nsradmin "script", all on one line, to a (what I
> call) control file.  Second script is run from cron * * * * * and checks
> that control file; when the change date and time match, the nsradmin
> "script" on that line is executed and voila! the group's start time is
> changed.  This works perfectly, from what I can see.  But...
>
> ... sometimes (but not most of the time!) the groups that had their
> start time change wouldn't start!  At all!  No messages, nothing in the
> daemon.log, etc.  The times were successfully changed, sure, but when
> the time came, NetWorker just glided past it, doing nothing.  As I wrote
> above, the Groups were enabled just fine and the schedule level set.
> Also worth noting that the change time was set hours before the backup
> was scheduled to start, so that wasn't a conflict.  Also, this problem
> has never cropped up except in those groups changed by the script.  But,
> only 2 of the 20-30 group changes exhibited this behavior.  The rest ran
> to expectations...
>
> We've talked to EMC and they see no reason why this is happening, though
> they believe it's caused somehow by the script(s).  I'm inclined to
> agree, given that this has appeared only when the script changes a
> group, but I would expect it to do the same to all of the groups
> changed, not a small percentage of them.  Its arbitrariness puzzles me.
>
> Any thoughts would be appreciated.  I can post/email my uglier-than-sin
> scripts, output, whatever, should anyone desire them.
>
> Thanks for reading!
>
> -Jon
>
>
Hello,

I've experienced this behaviour (under Solaris, too): no start, no messages in
daemon.log, savegrp.log, or savepnpc.log, and of course nothing displayed!

Check if there are files under /nsr/tmp, especially if they are named
your-group-name.res.lck.
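
A rough sketch of that check is below. The function names and the optional
directory argument are made up for illustration and are not part of NetWorker.
Only the /nsr/tmp location and the group-name.res.lck naming come from what I
saw on my own server.

```shell
#!/bin/sh
# Hypothetical helpers: look for lock files left behind in a NetWorker
# temporary directory. Defaults to /nsr/tmp when no directory is given.

# Print every *.lck file found under the directory.
list_lock_files() {
    dir="${1:-/nsr/tmp}"
    # -type f restricts the match to regular files
    find "$dir" -type f -name '*.lck' 2>/dev/null
}

# Return 0 if a lock file exists for the named group, 1 otherwise.
group_is_locked() {
    group="$1"
    dir="${2:-/nsr/tmp}"
    [ -f "$dir/$group.res.lck" ]
}
```

For example, `group_is_locked your-group-name && echo "stale lock?"` would flag
a group whose lock file is still present.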

If so, stop the NetWorker client, delete these files, and restart it.

Under normal conditions (i.e. no save running and no other NetWorker traffic),
all files and directories under /nsr/tmp can be deleted (while NetWorker is
stopped, of course!).
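
If you want to script the cleanup, here is a cautious sketch. It assumes that
checking for the nsrd and nsrexecd processes is enough to tell whether NetWorker
is stopped, and that pgrep is available (if it is not, the check passes silently
and the guard is useless). The function name is hypothetical. It removes only
the *.lck files, not everything under the directory.

```shell
#!/bin/sh
# Hypothetical helper: remove *.lck files from a directory, but only when
# none of the listed daemons appears to be running.
# Assumptions: default directory /nsr/tmp, default daemons nsrd and nsrexecd.

remove_stale_locks() {
    dir="${1:-/nsr/tmp}"
    daemons="${2:-nsrd nsrexecd}"

    for d in $daemons; do
        # pgrep -x matches the exact process name
        if pgrep -x "$d" >/dev/null 2>&1; then
            echo "refusing to clean $dir: $d is still running" >&2
            return 1
        fi
    done

    # Delete lock files only; other temporary files are left alone.
    find "$dir" -type f -name '*.lck' -exec rm -f {} \;
}
```

Run it as `remove_stale_locks` after stopping NetWorker, then start NetWorker
again.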

These lock files (*.lck) are used by NetWorker to check whether any groups are
currently running when it attempts to start one.

Hope this helps.

Denis
