Dear All,
I installed Backuppc in ubuntu 10.10desktop edition all clients systcem is windows.
I dont use samba i want only used rsync throw cygwin.
Is it possible or not.Suppose if it is possible send me link for documents
Thanks
Audinarayana Reddy
> Date: Sun, 21 Aug 2011 10:11:45 -0700
> From: rlpowell AT digitalkingdom DOT org
> To: backuppc-users AT lists.sourceforge DOT net
> Subject: [BackupPC-users] Solved, again; devs please read (was Re: The one-at-a-time nightly problem returns (was Re: The one-at-a-time nightly problem, debugged; DEVS PLEASE READ; long.))
>
>
> Ah-*HAH*! Got it!
>
> So (part of) the problem, *AGAIN*, is the way the CmdQueue tests the
> *front* of the queue, but pulls jobs from the *BACK* of the queue.
>
> Now, what happens if a backup goes for a really long time? Like,
> more than 24 hours? Well, even though it's still running, an entry
> for that host is also put in the background queue, which means that
> when the current backup finshes, a new one runs immediately, which
> is fine.
>
> The problem occurs when the current backup finishes *during a
> nightly run*. This means that the link can't run. But it gets
> queued on the CmdQueue. Then the backup on the BgQueue starts.
>
> Then the nightlies end, and the link on the CmdQueue tries to run.
> It refuses, with a "Botch on admin job" message, because a backup
> for that host is running. This happens many many many many many
> times, until the next nightlies try to run.
>
> So the new nightlies get pushed onto the *FRONT* of the CmdQueue via unshift;
> now the queue is a bunch of nightlies and some
>
> Here's the top of the CmdQueue loop:
>
> while ( $CmdJob eq "" && @CmdQueue > 0 && $RunNightlyWhenIdle != 1
> || @CmdQueue > 0 && $RunNightlyWhenIdle == 2
> && $bpc->isAdminJob($CmdQueue[0]->{host})
> ) {
> local(*FH);
> $req = pop(@CmdQueue);
>
> $host = $req->{host};
> if ( defined($Jobs{$host}) ) {
> print(LOG $bpc->timeStamp,
> "Botch on admin job for $host: already in use!!\n");
> #
> # This could happen during normal opertion: a user could
> # request a backup while a BackupPC_link is queued from
> # a previous backup. But it is unlikely. Just put this
> # request back on the end of the queue.
> #
> unshift(@CmdQueue, $req);
> return;
> }
>
> So the loop can be entered, no problem, because it's nightly time
> ($RunNightlyWhenIdle == 2 is true) and the first job in the queue is
> a nightly job ( $bpc->isAdminJob($CmdQueue[0]->{host} is true) and
> there's certainly more than one such job. So it enters the queue,
> and then *POPS* the *LAST* job off the queue. This is a *LINK* job,
> not a nightly, and, better still, *it fails*, which means that the
> *front* of the queue now holds a link job.
>
> Now we're back at the top of that while, which works because CmdJob
> (which gets cleared at the end of every successful CmdQueue job
> but does not get set in the failure case) is empty, so we take the
> first branch, since "$RunNightlyWhenIdle != 1" is true. So we kick
> off the nightly that's at the end of the queue (which, since unshift
> was used in order, is the first one).
>
> Now we have a nightly running (CmdJob is not ""), *AND* the first
> job in the queue is a link job (
> $bpc->isAdminJob($CmdQueue[0]->{host}) is false). This means we
> can't enter either branch of the CmdQueue loop, and we're stuck
> until the nightly finishes.
>
> This is what the logs look like in this case:
>
> 2011-08-21 07:00:02 Running 16 BackupPC_nightly jobs from 0..15 (out of 0..15)
> 2011-08-21 07:00:02 Botch on admin job for [host2]: already in use!!
> 2011-08-21 07:00:02 Next wakeup is 2011-08-21 08:00:00
> 2011-08-21 07:00:04 Running BackupPC_nightly -m 0 15 (pid=17895)
>
> So there's the problem with testing the front of the queue and then
> popping the back of the queue (!!), which I pointed out long ago and
> the devs don't seem to want to fix. Fixing that would allow the
> nightlies to run even when there's a link job mucking things up, but
> it wouldn't stop the logs from being spammed with Botch messages.
>
> Here's two options that would:
>
> 1. Don't start a backup, even a bgQueue backup, when a link for
> that host is pending.
>
> 2. Don't put a backup for a host on the bgQueue when a backup (or
> link job) for that host is currently running, even in QueueAllPCs.
>
> -Robin
>
> On Sun, Aug 21, 2011 at 08:52:38AM -0700, Robin Lee Powell wrote:
> > So, I haven't figured out why yet, but this keeps happening on my
> > hosts *even though* I'm not running "BackupPC_serverMesg
> > BackupPC_nightly run".
> >
> > As far as I know, other than "BackupPC_serverMesg server reload"
> > running from a script every once in a while, and having an extremely
> > large configuration, I'm doing nothing abnormal.
> >
> > I am now running 3.2.1, so it's not an old bug; this happened to me
> > on two different hosts within the last couple of days.
> >
> > What seems to be happening is that after a while I get:
> >
> > 2011-08-19 10:21:34 Botch on admin job for [host1]: already in use!!
> > 2011-08-19 10:21:40 Botch on admin job for [host1]: already in use!!
> > 2011-08-19 10:21:40 Botch on admin job for [host1]: already in use!!
> > 2011-08-19 10:21:40 Botch on admin job for [host1]: already in use!!
> > 2011-08-19 10:21:40 Botch on admin job for [host1]: already in use!!
> > 2011-08-19 10:21:40 Botch on admin job for [host1]: already in use!!
> > 2011-08-19 10:21:43 Botch on admin job for [host1]: already in use!!
> >
> > and then after that, the nightly jobs only run one at a time until I
> > completely restart the server. Which I really don't want to do on
> > one of the servers because I'm in the middle of a 5 day backup.
> >
> > This is really bad for my systems, and really frustrating. -_-
> >
> > -Robin
> >
> >
> > On Wed, Nov 24, 2010 at 06:11:20PM -0800, Robin Lee Powell wrote:
> > >
> > > Figured it out. The problem was that I have BackupPC set to run 8
> > > nightlies at once (which usually takes 12 or more hours), but it was
> > > ending up in a state where only one was running at a time.
> > >
> > > This may be the longest, most detailed debugging writeup I've ever
> > > done in 15 years of being a computer professional; I hope y'all
> > > appreciate it. :) I had to do this to hold all the relevant state
> > > in my head.
> > >
> > > It turns out that the issue occurs when the 24-hour-ly nightlies job
> > > is already running, and you do
> > >
> > > sudo -u backuppc BackupPC_serverMesg BackupPC_nightly run
> > >
> > > which I've been doing a lot.
> > >
> > > Deciding to queue new nightly jobs goes like this:
> > >
> > > while ( $CmdJob eq "" && @CmdQueue > 0 && $RunNightlyWhenIdle != 1
> > > || @CmdQueue > 0 && $RunNightlyWhenIdle == 2
> > > && $bpc->isAdminJob($CmdQueue[0]->{host}) ) {
> > >
> > > We'll be coming back to this a lot. "isAdminJob" matches nightly
> > > jobs only AFAICT.
> > >
> > > CmdQueue State: Empty
> > > CmdJob: Empty
> > > RunNightlyWhenIdle: 0
> > > While State: False, since @CmdQueue <= 0
> > > Running Job State: Empty
> > > Event:
> > >
> > > Normal nightly run occurs. RunNightlyWhenIdle is set to 1, which
> > > triggers all the nightly jobs getting added to the queue, and
> > > RunNightlyWhenIdle getting set to 0
> > >
> > > CmdQueue State: 8 nightly jobs
> > > CmdJob: Empty
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: True
> > > While State: True, via Branch 2
> > > Running Job State: Empty
> > > Event:
> > >
> > > Nightly jobs get kicked off, all 8 of them.
> > >
> > >
> > > CmdQueue State: Empty
> > > CmdJob: non-empty; "admin7" or similar
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: False
> > > While State: False, since @CmdQueue <= 0
> > > Running Job State: 8 nightly jobs
> > > Event:
> > >
> > > A backup finishes, and queues up a BackupPC_link job. This
> > > happens several times, since the nightly jobs take 8+ hours, even
> > > split into 8 parts, on my machine (4+TiB of backups per backup
> > > machine).
> > >
> > > CmdQueue State: Several link jobs
> > > CmdJob: non-empty; "admin7" or similar
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: False; link jobs don't match
> > > While State: False
> > > Running Job State: 8 nightly jobs
> > > Event:
> > >
> > > User runs "sudo -u backuppc BackupPC_serverMesg BackupPC_nightly
> > > run". This causes RunNightlyWhenIdle to be set to 1, but before
> > > that hits the while, the jobs are actually queued, *USING
> > > unshift*, which puts them at the front of the queue. This is
> > > where things start to go horribly wrong.
> > >
> > > CmdQueue State: 8 nightly jobs, *THEN* Several link jobs
> > > CmdJob: non-empty; "admin7" or similar
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: *TRUE*
> > > While State: True, branch 2
> > > Running Job State: 8 nightly jobs
> > > Event:
> > >
> > > *Pop* a job from the queue. This means that even though the
> > > *test* is for the job from the *front* of the queue, the job that
> > > actually gets handled is the job at the *end* of the queue.
> > >
> > > So, the last job on the queue, a link job, gets run. THIS SHOULD
> > > NEVER HAPPEN, as I understand it, because nightly jobs (the first
> > > set) are still running. The link job sets CmdJob, but that
> > > doesn't matter because we're going through the *second* branch of
> > > the while, which doesn't care about CmdJob. So, it happily
> > > launches another link job:
> > >
> > > CmdQueue State: 8 nightly jobs, then N-1 link jobs
> > > CmdJob: hostname non-empty from the last link job
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: True
> > > While State: True, branch 2
> > > Running Job State: 8 nightly jobs, 1 link job
> > > Event:
> > >
> > > Runs the next link job. And all the others. We end up with *all*
> > > link jobs running at once. THIS SHOULD NEVER HAPPEN; CmdQueue is
> > > supposed to be one at a time.
> > >
> > > But wait, it gets better!
> > >
> > > When each link job starts, it sets $CmdJob to its own host name;
> > > this means that at the end of the run through the queue, it's set
> > > to the last link job that ran, like this:
> > >
> > > CmdQueue State: 8 nightly jobs
> > > CmdJob: hostname from the last link job
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: True
> > > While State: True, branch 2
> > > Running Job State: 8 nightly jobs, N link jobs
> > > Event:
> > >
> > > So, from here, it tries to run the last (remember, pop) nightly
> > > job, but it can't, because any given nightly segment can only run
> > > one at a time, because they are named to prevent duplicates
> > > (leading to the "Botch on admin job for admin7 : already in use!!"
> > > log messages: that means the 8th nightly job is running, so you
> > > can't start it again).
> > >
> > > Having failed to run the nightly job, it unshifts it onto the
> > > front of the CmdQueue.
> > >
> > > It runs through all the queued nightly jobs in this way.
> > >
> > > Eventually, a link job finishes.
> > >
> > > CmdQueue State: 8 nightly jobs
> > > CmdJob: hostname from the last link job
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: True
> > > While State: True, branch 2
> > > Running Job State: 8 nightly jobs, N-1 link jobs
> > > Event:
> > >
> > > When each link job finshes, this test runs:
> > >
> > > if ( $CmdJob eq $host || $bpc->isAdminJob($host) ) {
> > >
> > > This will only match the last link job. If it doesn't match, the
> > > host is tested for whether it needs linking, and if so, it
> > > requeues the (already completed) link, *at the front of the
> > > queue*. This happens for all but the last link job. This means
> > > the thing at the front of the queue isn't a nightly job, so branch
> > > 2 of the while fails, which would restore some kind of order.
> > >
> > > The kicker is that the last job, the one that passes that if
> > > above, *clears CmdJob*, which means the first branch *passes*.
> > >
> > > CmdQueue State: N-1 link jobs, 8 nightly jobs
> > > CmdJob: *EMPTY*
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: *FALSE*
> > > While State: True, *BRANCH ONE*
> > > Running Job State: 8 nightly jobs
> > > Event:
> > >
> > > It pops *the last job*, a nightly, which fails, gets requeud *at
> > > the front*, without setting CmdJob, and now we're back to Branch 2
> > > passing. This happens with all the nightly jobs, and then it's
> > > ready to re-run all the (already completed once) link jobs:
> > >
> > > CmdQueue State: 8 nightly jobs, N-1 link jobs
> > > CmdJob: Empty
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: *TRUE*
> > > While State: True, *BRANCH TWO*
> > > Running Job State: 8 nightly jobs
> > > Event:
> > >
> > > Runs a link job.
> > >
> > > CmdQueue State: 8 nightly jobs, N-2 link jobs
> > > CmdJob: Empty
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: *TRUE*
> > > While State: True, *BRANCH TWO*
> > > Running Job State: 8 nightly jobs, 1 link job
> > > Event:
> > >
> > > This keeps happening until all the link jobs have been completed
> > > (meaning that the last link job to finish has been run a total of
> > > N times):
> > >
> > > CmdQueue State: 8 nightly jobs
> > > CmdJob: Empty
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: True
> > > Running Job State: 8 nightly jobs
> > > While State: True, branch 2
> > >
> > >
> > >
> > > Here's a log snippet of all this craziness:
> > >
> > > 2010-11-24 07:29:45 Running BackupPC_link host00002 (pid=26391)
> > > 2010-11-24 07:29:45 Running BackupPC_link host00005 (pid=26392)
> > > 2010-11-24 07:29:45 Running BackupPC_link hostother1 (pid=26393)
> > > 2010-11-24 07:29:45 Running BackupPC_link host00006 (pid=26394)
> > > 2010-11-24 07:29:45 Running BackupPC_link host00011 (pid=26395)
> > > 2010-11-24 07:29:45 Running BackupPC_link host00000 (pid=26396)
> > > 2010-11-24 07:29:45 Running BackupPC_link host00003 (pid=26397)
> > > 2010-11-24 07:29:45 Running BackupPC_link host00004 (pid=26398)
> > > 2010-11-24 07:29:45 Botch on admin job for admin7 : already in use!!
> > > 2010-11-24 07:29:45 Finished host00004 (BackupPC_link host00004)
> > > 2010-11-24 07:29:45 Botch on admin job for admin : already in use!!
> > > 2010-11-24 07:29:47 Botch on admin job for admin1 : already in use!!
> > > 2010-11-24 07:29:47 Botch on admin job for admin2 : already in use!!
> > > 2010-11-24 07:29:47 Botch on admin job for admin3 : already in use!!
> > > 2010-11-24 07:29:47 Botch on admin job for admin4 : already in use!!
> > > 2010-11-24 07:29:47 Botch on admin job for admin5 : already in use!!
> > > 2010-11-24 07:29:57 Botch on admin job for admin6 : already in use!!
> > > 2010-11-24 07:29:57 Botch on admin job for admin7 : already in use!!
> > > 2010-11-24 07:29:57 Running BackupPC_link host00002 (pid=26411)
> > > 2010-11-24 07:29:57 Running BackupPC_link host00005 (pid=26412)
> > > 2010-11-24 07:29:57 Running BackupPC_link hostother1 (pid=26413)
> > > 2010-11-24 07:29:57 Running BackupPC_link host00006 (pid=26414)
> > > 2010-11-24 07:29:57 Running BackupPC_link host00011 (pid=26415)
> > > 2010-11-24 07:29:57 Running BackupPC_link host00003 (pid=26416)
> > > 2010-11-24 07:29:57 Running BackupPC_link host00000 (pid=26417)
> > > 2010-11-24 07:29:57 Botch on admin job for admin : already in use!!
> > > 2010-11-24 07:29:57 Finished host00000 (BackupPC_link host00000)
> > > 2010-11-24 07:29:57 Botch on admin job for admin1 : already in use!!
> > > 2010-11-24 07:29:57 Botch on admin job for admin2 : already in use!!
> > > 2010-11-24 07:29:59 Botch on admin job for admin3 : already in use!!
> > > 2010-11-24 07:29:59 Botch on admin job for admin4 : already in use!!
> > > 2010-11-24 07:29:59 Botch on admin job for admin5 : already in use!!
> > > 2010-11-24 07:29:59 Botch on admin job for admin6 : already in use!!
> > > 2010-11-24 07:29:59 Botch on admin job for admin7 : already in use!!
> > > 2010-11-24 07:30:18 Botch on admin job for admin : already in use!!
> > > 2010-11-24 07:30:18 Running BackupPC_link host00002 (pid=26453)
> > > 2010-11-24 07:30:18 Running BackupPC_link host00005 (pid=26454)
> > > 2010-11-24 07:30:18 Running BackupPC_link hostother1 (pid=26455)
> > > 2010-11-24 07:30:18 Running BackupPC_link host00006 (pid=26456)
> > > 2010-11-24 07:30:18 Running BackupPC_link host00011 (pid=26457)
> > > 2010-11-24 07:30:18 Botch on admin job for admin1 : already in use!!
> > > 2010-11-24 07:30:18 Running BackupPC_link host00003 (pid=26459)
> > > 2010-11-24 07:30:18 Botch on admin job for admin2 : already in use!!
> > > 2010-11-24 07:30:18 Botch on admin job for admin3 : already in use!!
> > > 2010-11-24 07:30:18 Finished host00003 (BackupPC_link host00003)
> > >
> > > None of this explains why we end up with only one nightly job
> > > running at a time, but it's background on how very badly messed up
> > > things can get if you do "BackupPC_nightly run".
> > >
> > > Here's a repeat of the while, to save you some scrolling:
> > >
> > > while ( $CmdJob eq "" && @CmdQueue > 0 && $RunNightlyWhenIdle != 1
> > > || @CmdQueue > 0 && $RunNightlyWhenIdle == 2
> > > && $bpc->isAdminJob($CmdQueue[0]->{host}) ) {
> > >
> > > Ok, so, the current state is this:
> > >
> > > CmdQueue State: 8 nightly jobs
> > > CmdJob: Empty
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: True
> > > Running Job State: 8 nightly jobs
> > > While State: True, branch 2
> > >
> > > So it keep running through the queue and spewing "Botch" messages
> > > to the logs. Then one of the nightlies *actually finishes*. This
> > > means one of the *other* nightlies can actually run, which sets
> > > CmdJob, which is our prelude to disaster.
> > >
> > > CmdQueue State: *SEVEN* nightly jobs
> > > CmdJob: *NON-EMPTY*; "adminN", for some N
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: True
> > > Running Job State: 8 nightly jobs
> > > While State: True, branch 2
> > >
> > > And it goes back to cycling through the nightlies and "Botch"ing.
> > >
> > > But eventually an actual backup finishes, and a link job gets
> > > queued *to the front*. The while is now false:
> > >
> > > CmdQueue State: 1 link job, 7 nightly jobs
> > > CmdJob: "adminN"
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: *FALSE*
> > > Running Job State: 8 nightly jobs
> > > While State: *FALSE*, since isAdminJob is false and $CmdJob is not ""
> > >
> > > At this point, the CmdQueue will not get run until something
> > > changes, which is the correct behaviour when nightlies are
> > > running.
> > >
> > > Note that the link job that cause the while to go false does not,
> > > itself, actually get run.
> > >
> > > The nightlies continue to complete. Finishing a nightly DOES NOT
> > > chang CmdJob, so this state continues until every nightly job
> > > currently running has completed. Note that what's currently
> > > running is 7 of the original nightly jobs, and one from the second
> > > round.
> > >
> > > When there are 0 nightly jobs running, BackupPC considers the run
> > > of nightlies to be complete. At this point, it sets
> > > RunNightlyWhenIdle to 0 (i.e. "run normally") and sets CmdJob to
> > > "", so we have this:
> > >
> > > CmdQueue State: 7 nightly jobs
> > > CmdJob: ""
> > > RunNightlyWhenIdle: 0
> > > isAdminJob Matches First Job: *TRUE*
> > > Running Job State: *EMPTY*
> > > While State: *TRUE*, *BRANCH ONE* (CmdJob empty, and suddenly
> > > RunNightlyWhenIdle isn't 2 anymore)
> > >
> > > Since the while is true, it launches the last job on the queue, a
> > > nightly job, which leads to this state:
> > >
> > > CmdQueue State: 6 nightly jobs
> > > CmdJob: "adminN"
> > > RunNightlyWhenIdle: 0
> > > isAdminJob Matches First Job: True
> > > Running Job State: 1 nightly job
> > > While State: *FALSE*; RunNightlyWhenIdle isn't 2, so branch 2 can't
> > > be true, and CmdJob isn't "", so branch 1 can't be true
> > >
> > > No more CmdQueue checking, until the current nightly job ends. At
> > > that point, there are 0 nightly jobs running, which makes BackupPC
> > > thing everything is done, so it reports like everything is done,
> > > and sets CmdJob to "", which puts us back to one state above.
> > > Repeat until all nightly jobs are *actually* done, which leaves us
> > > in a proper state.
> > >
> > > The really bad part: *NO LINK JOBS RUN IN THIS STATE*.
> > >
> > > On my large hosts, each nightly job, run by itself, takes *ten or
> > > more hours*.
> > >
> > > This means that once we've entered this state, we've got *at least*
> > > 70 hours with no link jobs. A host will not run a new backup until
> > > the link from the last backup finishes. This means that it's very
> > > easy for hosts to become 2 or 3 days out of date.
> > >
> > > The daily nightlies run tries to fix it, but it doesn't actually
> > > work. When the daily nightly run kicks off, we end up with
> > > nightlies running, nightlies in the queue, and RunNightlyWhenIdle
> > > set to 2, which is the state we basically started in way above; it's
> > > only a matter of time before it gets stuck again. In my experience,
> > > in that case, it actually gets stuck again within a matter of
> > > seconds:
> > >
> > > 2010-11-25 01:00:00 Running BackupPC_nightly 224 255 (pid=25383)
> > > 2010-11-25 01:00:00 New BackupPCNightlyJobs counter: 2 ; new BackupPCNightlyLock counter: 2
> > > 2010-11-25 01:00:00 Running BackupPC_nightly -m 0 31 (pid=25384)
> > > 2010-11-25 01:00:00 New BackupPCNightlyJobs counter: 3 ; new BackupPCNightlyLock counter: 3
> > > 2010-11-25 01:00:00 Running BackupPC_nightly 32 63 (pid=25385)
> > > 2010-11-25 01:00:00 New BackupPCNightlyJobs counter: 4 ; new BackupPCNightlyLock counter: 4
> > > 2010-11-25 01:00:00 Running BackupPC_nightly 64 95 (pid=25386)
> > > 2010-11-25 01:00:00 New BackupPCNightlyJobs counter: 5 ; new BackupPCNightlyLock counter: 5
> > > 2010-11-25 01:00:00 Running BackupPC_nightly 96 127 (pid=25387)
> > > 2010-11-25 01:00:00 New BackupPCNightlyJobs counter: 6 ; new BackupPCNightlyLock counter: 6
> > > 2010-11-25 01:00:00 Running BackupPC_link mc00-s00106 (pid=25388)
> > > 2010-11-25 01:00:00 Running BackupPC_link mc00-s00009-prod-shared-redbubble (pid=25389)
> > > 2010-11-25 01:00:00 Running BackupPC_link mavenlink--tm25-e00053--tm25-s00106---db-backup (pid=25390)
> > > 2010-11-25 01:00:00 Running BackupPC_link provolvesolutions--tm25-e00080--tm25-s00157---data (pid=25391)
> > > 2010-11-25 01:00:00 Running BackupPC_link tm25-s00090__nfs (pid=25392)
> > > 2010-11-25 01:00:00 Running BackupPC_link provolvesolutions--tm25-e00079--tm25-s00155---db-backup (pid=25393)
> > > 2010-11-25 01:00:00 Running BackupPC_link mavenlink--tm25-e00054--tm25-s00107---data (pid=25394)
> > > 2010-11-25 01:00:00 Running BackupPC_link howcast--tm25-e00076--tm25-s00149---db-backup (pid=25395)
> > > 2010-11-25 01:00:00 Running BackupPC_link brighthouse--tm25-e00039--tm25-s00083---db-backup (pid=25396)
> > > 2010-11-25 01:00:00 Running BackupPC_link howcast--tm25-e00071--tm25-s00134---db-backup (pid=25397)
> > > 2010-11-25 01:00:00 Botch on admin job for admin : already in use!!
> > > 2010-11-25 01:00:00 Next wakeup is 2010-11-25 02:00:00
> > > 2010-11-25 01:00:00 Botch on admin job for admin1 : already in use!!
> > > 2010-11-25 01:00:01 Botch on admin job for admin2 : already in use!!
> > > 2010-11-25 01:00:02 Finished howcast--tm25-e00071--tm25-s00134---db-backup (BackupPC_link howcast--tm25-e00071--tm25-s00134---db-backup)
> > > 2010-11-25 01:00:02 Botch on admin job for admin3 : already in use!!
> > > 2010-11-25 01:00:07 Running BackupPC_nightly 128 159 (pid=25414)
> > > 2010-11-25 01:00:07 New BackupPCNightlyJobs counter: 7 ; new BackupPCNightlyLock counter: 7
> > >
> > > So *some of* the nightlies ran, and set RunNightlyWhenIdle to 2,
> > > which caused all the links to kick off for reasons explained above,
> > > leaving the queue with just the nightly jobs.
> > >
> > > The non-last links finish, leaving a link job at the front of the
> > > queue.
> > >
> > > Then a link finishes, leaving CmdJob empty.
> > >
> > > Then another nightly job runs, setting CmdJob, so we end up with:
> > >
> > > CmdQueue State: some number of nightly jobs
> > > CmdJob: "adminN"
> > > RunNightlyWhenIdle: 2
> > > isAdminJob Matches First Job: *FALSE*
> > > Running Job State: some number of nightly job
> > > While State: *FALSE*
> > >
> > > And round and round she goes. Total progress in 24 hours from
> > > BackupPC's point of view: *one link job*.
> > >
> > > It can keep doing this *forever*, as each 24-hour-ly run of
> > > nightlies restarts the problem, unless all the sequential nightlies
> > > can finish within 24 hours, which is *most definitely* not the case
> > > here.
> > >
> > > SOLUTION(S). TOTALLY UNTESTED.
> > >
> > > 1. Don't run a non-nightly job from the CmdQueue when there are
> > > nightly job running, *EVER*. (my copy is locally modified with
> > > debugging stuff to try to figure out these issues, so if this diff
> > > doesn't patch, that's why, sorry).
> > >
> > > --- /var/tmp/BackupPC 2010-11-25 01:52:30.000000000 +0000
> > > +++ /usr/local/bin/BackupPC 2010-11-25 02:00:03.000000000 +0000
> > > @@ -522,6 +522,15 @@
> > > $req = pop(@CmdQueue);
> > >
> > > $host = $req->{host};
> > > +
> > > + if( $BackupPCNightlyJobs > 0 && ! $bpc->isAdminJob($host) )
> > > + {
> > > + print(LOG $bpc->timeStamp, "Tried to run ".$req->{cmd}." on $host when there are nightlies running. That's bad.\n");
> > > +
> > > + unshift(@CmdQueue, $req);
> > > + return;
> > > + }
> > > +
> > > if ( defined($Jobs{$host}) ) {
> > > print(LOG $bpc->timeStamp,
> > > "Botch on admin job for $host: already in use!!\n");
> > >
> > >
> > > 2. That means that if nightlies are running, and you do a
> > > "BackupPC_nightly run", it'll queue them, 8 at a time total, behind
> > > the previous ones, which means the whole run will take about twice
> > > as long as it would have otherwise. I don't know about you, but
> > > that's certainly not what *I* expect when I launch a nightly run; I
> > > expect all the old ones to die in favour of new ones:
> > >
> > > @@ -1362,7 +1371,19 @@
> > > } elsif ( $cmd =~ /^backup all$/ ) {
> > > QueueAllPCs();
> > > } elsif ( $cmd =~ /^BackupPC_nightly run$/ ) {
> > > + print(LOG $bpc->timeStamp,
> > > + "Running nightlies at user request.\n" );
> > > + foreach my $host (keys %Jobs) {
> > > + if( $bpc->isAdminJob( $host ) ) {
> > > + my $pid = $Jobs{$host}{pid};
> > > + kill($bpc->sigName2num("INT"), $pid);
> > > + print(LOG $bpc->timeStamp,
> > > + "Killing nightly job $host with PID $pid to make way for manual run.\n");
> > > + }
> > > + }
> > > $RunNightlyWhenIdle = 1;
> > > + $BackupPCNightlyJobs = 0;
> > > + $BackupPCNightlyLock = 0;
> > > } elsif ( $cmd =~ /^backup (\S+)\s+(\S+)\s+(\S+)\s+(\S+)/ ) {
> > > my $hostIP = $1;
> > > $host = $2;
> > >
> > > -Robin
> > >
> > > --
> > > http://singinst.org/ : Our last, best hope for a fantastic future.
> > > Lojban (http://www.lojban.org/): The language in which "this parrot
> > > is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
> > > is "na nei". My personal page: http://www.digitalkingdom.org/rlp/
> > >
> > > ------------------------------------------------------------------------------
> > > Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
> > > Tap into the largest installed PC base & get more eyes on your game by
> > > optimizing for Intel(R) Graphics Technology. Get started today with the
> > > Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
> > > http://p.sf.net/sfu/intelisp-dev2dev
> > > _______________________________________________
> > > BackupPC-users mailing list
> > > BackupPC-users AT lists.sourceforge DOT net
> > > List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
> > > Wiki: http://backuppc.wiki.sourceforge.net
> > > Project: http://backuppc.sourceforge.net/
> >
> > --
> > http://singinst.org/ : Our last, best hope for a fantastic future.
> > Lojban (http://www.lojban.org/): The language in which "this parrot
> > is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
> > is "na nei". My personal page: http://www.digitalkingdom.org/rlp/
> >
> > ------------------------------------------------------------------------------
> > Get a FREE DOWNLOAD! and learn more about uberSVN rich system,
> > user administration capabilities and model configuration. Take
> > the hassle out of deploying and managing Subversion and the
> > tools developers use with it. http://p.sf.net/sfu/wandisco-d2d-2
> > _______________________________________________
> > BackupPC-users mailing list
> > BackupPC-users AT lists.sourceforge DOT net
> > List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
> > Wiki: http://backuppc.wiki.sourceforge.net
> > Project: http://backuppc.sourceforge.net/
> >
>
> --
> http://singinst.org/ : Our last, best hope for a fantastic future.
> Lojban (http://www.lojban.org/): The language in which "this parrot
> is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
> is "na nei". My personal page: http://www.digitalkingdom.org/rlp/
>
> ------------------------------------------------------------------------------
> Get a FREE DOWNLOAD! and learn more about uberSVN rich system,
> user administration capabilities and model configuration. Take
> the hassle out of deploying and managing Subversion and the
> tools developers use with it. http://p.sf.net/sfu/wandisco-d2d-2
> _______________________________________________
> BackupPC-users mailing list
> BackupPC-users AT lists.sourceforge DOT net
> List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
> Wiki: http://backuppc.wiki.sourceforge.net
> Project: http://backuppc.sourceforge.net/