Re: [Bacula-users] "Program killed by Bacula watchdog (timeout)" errors

2009-01-11 18:01:28
Subject: Re: [Bacula-users] "Program killed by Bacula watchdog (timeout)" errors
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Sun, 11 Jan 2009 23:59:13 +0100

Hi,

11.01.2009 22:06, Wolfgang Denk wrote:
> Hello,
> 
> I'm running a combo of bacula 2.4.2 / 2.4.4 setups, and now I have
> some jobs failing with "Program killed by Bacula watchdog (timeout)"
> errors like this one:
> 
> 11-Jan 00:05 mneme-dir JobId 15630: Start Backup JobId 15630, Job=mneme.denx.de.2009-01-11_00.05.00.47
> 11-Jan 00:05 mneme-sd JobId 15630: 3307 Issuing autochanger "unload slot 2, drive 0" command.
> 11-Jan 00:07 mneme-dir JobId 15630: Using Device "DDS-3"
> 11-Jan 00:07 mneme-sd JobId 15630: 3301 Issuing autochanger "loaded? drive 0" command.
> 11-Jan 00:07 mneme-sd JobId 15630: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
> 11-Jan 00:07 mneme-sd JobId 15630: 3304 Issuing autochanger "load slot 5, drive 0" command.
> 11-Jan 00:12 mneme-sd JobId 15630: Fatal error: 3992 Bad autochanger "load slot 5, drive 0": ERR=Child died from signal 15: Termination.
> Results=Loading media from Storage Element 5 into drive 0...done
> Program killed by Bacula watchdog (timeout)

Looks like the mtx-changer script takes too long: the load command was 
issued at 00:07 and the watchdog killed it at 00:12, five minutes later.

> 11-Jan 00:12 mneme-fd JobId 15630: Fatal error: job.c:1817 Bad response to Append Data command. Wanted 3000 OK data
> , got 3903 Error append data
> 
> 
> I don't understand what is happening, and why. If I check after the
> failure, the tape in question is correctly loaded.

The autochanger might be stuck, or rather, mtx doesn't return but 
waits indefinitely.

Don't ask me for reasons, though...

> And we never had
> any such failures before (i.e. when running bacula 2.2.x versions).
> 
> I cannot find any (configurable?) watchdog timeout mentioned in the
> documentation - am I missing something?

Look at the "Maximum Changer Wait" directive in the Device resource of 
the storage daemon configuration (iirc, it takes a plain number of 
seconds as its argument, without a qualifier, not something like "10min"!).
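
Roughly like this, as an untested sketch: the device name is taken from 
your log, the value 600 is only an example and not a recommendation, and 
the other directives of your existing Device resource (Media Type, 
Archive Device, changer settings and so on) are left out here and stay 
as they are.

```
# bacula-sd.conf, Device resource (sketch, incomplete on purpose)
Device {
  # device name as it appears in the job log
  Name = "DDS-3"
  Autochanger = yes
  # example value only; a plain number is taken as seconds
  Maximum Changer Wait = 600
}
```

The storage daemon has to be restarted to pick up the change.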

Also, you might run the mtx-changer script with debug output to a file 
and observe what it does, and where it stops doing things.
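
If you prefer to time the changer call from outside, a small wrapper 
along these lines could do it. This is only a sketch and not part of 
Bacula: run_logged, the log path and the 600 second limit are names and 
values I made up, and you would pass the exact mtx-changer command line 
your storage daemon uses as arguments.

```python
import subprocess
import sys
import time


def run_logged(cmd, log_path, timeout_s):
    """Run cmd, append its output and elapsed time to log_path.

    Returns (returncode, elapsed_seconds); returncode is None when the
    command was killed because it exceeded timeout_s.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        rc, out, err = proc.returncode, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        # The child is killed on timeout; partial output is dropped here.
        rc, out, err = None, "", ""
    elapsed = time.monotonic() - start

    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp} cmd={cmd!r} rc={rc} elapsed={elapsed:.1f}s\n")
        log.write(out)
        log.write(err)
    return rc, elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical log path and limit; adjust both to your setup.
    rc, elapsed = run_logged(sys.argv[1:], "/tmp/mtx-changer-debug.log", 600)
    print(f"rc={rc} elapsed={elapsed:.1f}s")
```

A return code of None in the log means the command did not come back 
within the limit, which is the case the watchdog is complaining about.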

Hope this helps,

Arno

> Best regards,
> 
> Wolfgang Denk
> 

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de
