> <mailto:novosirj AT umdnj DOT edu>> wrote:
>
> Mingus Dew wrote:
>> Hi all,
>> Been using Bacula 2.4.2 on Solaris 10_x86 for almost 2 years now.
>> Recently tape backups have been entering into a state that I can only
>> describe as "limbo".
>
>> If I check the status of the director, I may see something like
>
>> Running Jobs:
>> JobId  Level    Name                                      Status
>> ======================================================================
>> 22649  Increme  RMAN_A_Lvl1_Tape.2009-02-17_13.30.36      is running
>> 22650  Increme  RMAN_B_Lvl1_Tape.2009-02-17_13.30.38      is waiting on max Storage jobs
>> 22651  Increme  RMAN_PROD_Lvl1_Tape.2009-02-17_14.00.40   is waiting on max Storage jobs
>> 22652  Increme  RMAN_BI_Lvl1_Tape.2009-02-17_14.00.42     is waiting on max Storage jobs
>> 22653  Increme  RMAN_COG_Lvl1_Tape.2009-02-17_14.00.44    is waiting on max Storage jobs
>
>> If I check the status of the running jobid or the tape device, it will
>> show this:
>
>> Used Volume status:
>> B00046 on device "Ultrium-TD3" (/dev/rmt/0cbn)
>> Reader=0 writers=0 devres=0 volinuse=1
>> ====
>
>> Data spooling: 0 active jobs, 0 bytes; 80 total jobs, 47,799,329,608 max bytes/job.
>> Attr spooling: 0 active jobs, 0 bytes; 80 total jobs, 40,616 max bytes.
>
>> Basically, the tape is mounted and reserved and the job shows an "is running"
>> status, but nothing is happening. Because I have no monitoring of how long
>> jobs have been running, these have sat for as long as 3 days without
>> changing status, erroring, or completing. This backs up the subsequent jobs
>> that are waiting for the tape device.
>> The only commonality I've seen is that they are all tape jobs. Other than
>> that, the level, fileset, etc. are different.
>
>> On one occasion, when I cancelled one of these long-running jobs, I got
>> an error:
>
>> Hostname : BUG!
>> Date : 2009-02-11 14:00:30
>> Severity : err
>
>> unregister_watchdog_unlocked called before start_watchdog
>
>
>> Hostname : BUG!
>> Date : 2009-02-11 14:00:30
>> Severity : err
>
>> bacula-dir[20200]: [ID 702911 daemon.error] backup4.director: ABORTING
>> due to ERROR in watchdog.c:206
>
>> If anyone has any advice on what might be happening, I would really
>> appreciate your responses.
>
> Check to see what, if anything, your backend database is doing. You
> don't tell us what it is, so I can't be any more specific.
>
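On the missing monitoring of how long jobs have been running: a minimal
sketch that flags stuck jobs from a saved copy of the "Running Jobs" listing
quoted above. It assumes the timestamp suffix in each job name
(e.g. 2009-02-17_13.30.36) roughly marks when the job was queued, and that the
listing has one job per line in the layout shown. The names find_stale_jobs
and max_age are made up for illustration and are not part of Bacula.

```python
import re
from datetime import datetime, timedelta

# Matches one line of the "Running Jobs" table, e.g.
#   22649 Increme RMAN_A_Lvl1_Tape.2009-02-17_13.30.36 is running
# Assumption: the job name always ends in .YYYY-MM-DD_HH.MM.SS
# (an optional _NN suffix is tolerated).
JOB_LINE = re.compile(
    r"^\s*(?P<jobid>\d+)\s+(?P<level>\S+)\s+(?P<name>\S+)"
    r"\.(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}\.\d{2}\.\d{2})(?:_\d+)?"
    r"\s+(?P<status>.+)$"
)


def find_stale_jobs(status_text, now, max_age=timedelta(hours=12)):
    """Return (jobid, name, age) for every 'is running' job older than max_age."""
    stale = []
    for line in status_text.splitlines():
        m = JOB_LINE.match(line)
        # Skip headers, wrapped continuation lines, and jobs that are only waiting.
        if not m or not m.group("status").startswith("is running"):
            continue
        queued = datetime.strptime(m.group("stamp"), "%Y-%m-%d_%H.%M.%S")
        age = now - queued
        if age > max_age:
            stale.append((int(m.group("jobid")), m.group("name"), age))
    return stale
```

Run from cron against the director's status output, with
now=datetime.now(), this could mail out anything it returns. The 12-hour
threshold is arbitrary; pick one longer than your slowest legitimate job.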