Bacula-users

Re: [Bacula-users] jobs hang and i can't cancel them

2008-09-19 03:09:39
From: Alexandru Ionica <gremlin AT networked DOT ro>
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 19 Sep 2008 10:10:03 +0300
Alexandru Ionica wrote:
> Hello,
> I'm running Bacula 2.4.0 on Debian Etch with disk storage.
> The problem started a week ago when, using bscan, I "imported" catalog
> information for a volume (I had File Retention = 30 days and I wanted to
> recover only some files, as the volume is big for a full restore: about
> 20 GB and about 80000 files).
> I don't know if it's related or not, because I also have problems with
> jobs for other clients which were not backed up on that volume.
> The problem is that jobs hang and I can't even cancel them.
> Example:
> 
> The beginning of the output from "status dir" is below. I limited the
> output because there were a lot of jobs waiting, and I also changed names
> to protect the innocent.
> 
> Running Jobs:
> Console connected at 18-Sep-08 11:12
>  JobId Level   Name                       Status
> ======================================================================
>   2320 Increme  first-server-custom_home.2008-09-19_00.15.41 is waiting for Client first-server-fd to connect to Storage File1
>   2343 Increme  second-server.2008-09-19_02.05.27 is waiting for Client second-server-fd to connect to Storage File2
>   2344 Increme  third-server.2008-09-19_02.05.28 is waiting on max Storage jobs
> 
> 
> The problem is that if I run
> 
> *cancel jobid=2320
> 2901 Job first-server-custom_home.2008-09-19_00.15.41 not found.
> 3904 Job first-server-custom_home.2008-09-19_00.15.41 not found.
> *cancel jobid=2343
> 2901 Job second-server.2008-09-19_02.05.27 not found.
> 3904 Job second-server.2008-09-19_02.05.27 not found.
> *cancel jobid=2344
> 
> Job #2344 got canceled but the other two didn't. Another problem is that
> job #2320 does not even seem to exist in "list jobs":
> 
> *list jobid=2320
> No results to list.
> *list jobid=2343
> +-------+-------------------+---------------------+------+-------+----------+----------+-----------+
> | JobId | Name              | StartTime           | Type | Level | JobFiles | JobBytes | JobStatus |
> +-------+-------------------+---------------------+------+-------+----------+----------+-----------+
> | 2,343 | second-server     | 2008-09-19 02:08:55 | B    | I     |        0 |        0 | R         |
> +-------+-------------------+---------------------+------+-------+----------+----------+-----------+
> 
> Any ideas? This setup worked for more than a month (during which I
> added, and am still adding, clients to be backed up).
> Also, manually starting backups for the clients above works and they
> get backed up.
> 
In the meantime I ran myisamchk -o and no issues were reported. Also, at least
for first-server and second-server, I found that for the past two days they have
had problems resolving names (we had a DNS change), so this could explain at
least why the backups didn't work. My question is why the jobs behave so
strangely: one job is missing from "list jobs" but appears in "status dir", and
neither hung job can be canceled. I'm sure I had problems before the DNS change
two days ago, but until it happens again I'm going to treat them as related to
our unreliable DNS (a result of our internal conditions).
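
A name-resolution check of the kind described above can be sketched in Python.
This is only an illustration, not part of Bacula: the function name
unresolvable and the idea of passing host names on the command line are
assumptions, and the names you pass should be whatever addresses your own
Client and Storage resources use. Run it on the client to test the storage
daemon's name, and on the Director host to test the client's name.

```python
import socket
import sys


def unresolvable(hosts, resolve=socket.gethostbyname):
    """Return the subset of hosts that fail name resolution.

    `resolve` is injectable so the check can be exercised without real DNS.
    """
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed


if __name__ == "__main__":
    # Host names come from the command line; nothing is resolved without them.
    for host in unresolvable(sys.argv[1:]):
        print(f"cannot resolve: {host}")
```

If the script prints a name that a waiting job depends on, the "is waiting for
Client ... to connect to Storage" status is at least consistent with a DNS
failure, although it does not prove that DNS is the only cause.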

