Bacula-users

Re: [Bacula-users] bacula watchdog killing tape-loading

2008-08-07 08:03:09
Subject: Re: [Bacula-users] bacula watchdog killing tape-loading
From: Nils Blanck-Wehde <nils.blanck-wehde AT backofficeservice DOT biz>
Date: Thu, 07 Aug 2008 14:02:42 +0200
Hi Terry,

sorry, I forgot to mention: the OS is CentOS 5.2 and the Bacula version is 2.4.2. The problems started all of a sudden with 2.4.1, after it had been running fine for weeks.

Calling mtx and mtx-changer, both as root and as user bacula, works flawlessly (slot 1 is currently loaded):

[root@company-Backupserver ~]# mtx -f /dev/sg3 load 1 0
Drive 0 Full (Storage Element 1 loaded)
[root@company-Backupserver ~]# su - bacula
-bash-3.2$ /usr/sbin/mtx -f /dev/sg3 load 1 0
Drive 0 Full (Storage Element 1 loaded)

[root@company-Backupserver ~]# /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
1
[root@company-Backupserver ~]# su - bacula
-bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
1
-bash-3.2$


I can see from the web admin interface of the autochanger (Quantum Superloader 3 DLT) that the autochanger commands are being executed correctly, yet Bacula still reports "ERR=Child died from signal 15: Termination".

Nils


T. Horsnell wrote:
Assuming your O/S is a Unix/Linux of some sort, have you tried the basic mtx command on it?
Something like

mtx -f /dev/sg3 load 1 0

(See 'man mtx')

Cheers,
Terry


Hi all,
Bacula no longer works with our autochanger, and I can't find the source of the problem.
Here is the output of the autochanger test run with "btape -c /etc/bacula/bacula-sd.conf /dev/nst0":

=== Autochanger test ===

3301 Issuing autochanger "loaded" command.
Slot 1 loaded. I am going to unload it.
3302 Issuing autochanger "unload 1 0" command.
unload status=Bad 134217743
3992 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 unload 1 /dev/nst0 0
3992 result="Unloading drive 0 into Storage Element 1...done
Program killed by Bacula watchdog (timeout)
": ERR=Child died from signal 15: Termination
3303 Issuing autochanger "load 1 0" command.
3993 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 load 1 /dev/nst0 0
3993 result="Loading media from Storage Element 1 into drive 0...done
Program killed by Bacula watchdog (timeout)
": ERR=Child died from signal 15: Termination
You must correct this error or the Autochanger will not work.

This is the storage definition:

Autochanger {
  Name = QS3DLT
  Device = DLT-Drive-1
  Changer Command = "/usr/lib/bacula/mtx-changer %c %o %S %a %d"
  Changer Device = /dev/sg3
}

Device {
  Name = DLT-Drive-1                      #
  Drive Index = 0
  Media Type = DLT-VS1
  Archive Device = /dev/nst0
  AutomaticMount = yes;               # when device opened, read it
  AlwaysOpen = no;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes
  Maximum Changer Wait = 10
  Maximum Rewind Wait = 10
  Maximum Open Wait = 10
}
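
A note on the three wait values above: as far as I know, Bacula reads a time value without a unit as seconds, so "10" would mean 10 seconds rather than 10 minutes, and writing the unit out (for example "10 minutes") removes the doubt. Whether that explains the kills in this thread is not confirmed here. The sketch below lists every "Maximum ... Wait" directive in a config file whose value is a bare number. The function name list_bare_waits is made up for this example, and the path in the usage comment is the one used with btape above.

```shell
#!/bin/sh
# list_bare_waits FILE
# Hypothetical helper: prints, with line numbers, each
# "Maximum Changer/Rewind/Open Wait" directive whose value is a plain
# number with no unit (seconds, minutes, ...) after it.
list_bare_waits() {
    grep -niE '^[[:space:]]*Maximum[[:space:]]+(Changer|Rewind|Open)[[:space:]]+Wait[[:space:]]*=[[:space:]]*[0-9]+[[:space:]]*;?[[:space:]]*(#.*)?$' "$1"
}

# Usage (path taken from this thread):
#   list_bare_waits /etc/bacula/bacula-sd.conf
```

Any line it prints is a candidate for an explicit unit; lines that already say "10 minutes" or similar are not listed.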

When I look at the web admin interface of the autochanger, I see the autochanger and the drive perform exactly the requested operations at the usual speed (about 1:30 min for an unload operation, about 3:50 min for an unload/load operation).

Still, I get lots of kill/timeout problems.

I am starting to wonder whether the autochanger is somehow defective...

If any of you guys can help I would greatly appreciate it.

All the best, Nils



Nils Blanck-Wehde wrote:

Hi John,

thanks for your help. I am quite new to Bacula, and it seems to take some time to fully understand it :-)
I am not sure whether it really is a timeout problem.
I increased all timeout values to 10 minutes and the kill still occurs (after 2:20 min):

07-Aug 12:20 company_bacula-dir JobId 217: Start Backup JobId 217, Job=Fileserver_Lexware_Exchange_to_Tape.2008-08-07_12.20.03
07-Aug 12:20 company_bacula-dir JobId 217: Using Device "DLT-Drive-1"
07-Aug 12:20 company_bacula-sd JobId 217: 3301 Issuing autochanger "loaded? drive 0" command.
07-Aug 12:20 company_bacula-sd JobId 217: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
07-Aug 12:20 company_bacula-sd JobId 217: 3304 Issuing autochanger "load slot 8, drive 0" command.
*messages
07-Aug 12:22 company_bacula-sd JobId 217: Fatal error: 3992 Bad autochanger "load slot 8, drive 0": ERR=Child died from signal 15: Termination.
Results=Loading media from Storage Element 8 into drive 0...done
Program killed by Bacula watchdog (timeout)

07-Aug 12:20 company-appsrv-fd JobId 217: Fatal error: ../../filed/job.c:1817 Bad response to Append Data command. Wanted 3000 OK data, got 3903 Error append data


I double-checked the time the autochanger needs for an unload/load operation: I ran a couple of operations and they all completed in less than four minutes.
I think there is a general problem with the autochanger, because the btape test reports the following error when testing it:

3301 Issuing autochanger "loaded" command.
3991 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
3991 result="": ERR=Child died from signal 15: Termination
You must correct this error or the Autochanger will not work.

When I run this command "/usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0" manually, both as root and as user bacula, it returns "1":

[root@company-Backupserver ~]# su - bacula
-bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
1
-bash-3.2$

Could it be a permission problem? Which UID and GID should mtx-changer have? Could some automatic mechanism on my CentOS installation have restricted the permissions on the autochanger device?
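
One quick way to rule the permission question in or out is to test, as the user in question, whether the changer and tape devices are readable and writable. A minimal sketch follows; check_access is a hypothetical helper name, and the device paths in the usage comment are the ones from this thread.

```shell
#!/bin/sh
# check_access PATH...
# Hypothetical helper: reports for each path whether the current user
# can both read and write it.
check_access() {
    for dev in "$@"; do
        if [ -r "$dev" ] && [ -w "$dev" ]; then
            echo "$dev: read/write OK for $(id -un)"
        else
            echo "$dev: NOT read/write for $(id -un)"
        fi
    done
}

# Usage, after "su - bacula" (paths taken from this thread):
#   check_access /dev/sg3 /dev/nst0
```

This only checks file permissions for the user running it; it says nothing about how the storage daemon itself is started.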

I am a little stuck here.

Thanks for any help!

Nils


John Drescher wrote:

On Wed, Aug 6, 2008 at 1:42 PM, Nils Blanck-Wehde <nils.blanck-wehde AT backofficeservice DOT biz> wrote:
 

Hello list,

I just encountered this issue:

06-Aug 19:31 company_bacula-sd JobId 210: Fatal error: 3992 Bad autochanger "load slot 8, drive 0": ERR=Child died from signal 15: Termination.
Results=Loading media from Storage Element 8 into drive 0...done
Program killed by Bacula watchdog (timeout)


Earlier today I got this message:

06-Aug 17:10 company_bacula-dir JobId 208: Using Device "DLT-Drive-1"
06-Aug 17:10 company_bacula-sd JobId 208: 3301 Issuing autochanger "loaded? drive 0" command.
06-Aug 17:10 company_bacula-sd JobId 208: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
06-Aug 17:10 company_bacula-sd JobId 208: 3304 Issuing autochanger "load slot 1, drive 0" command.
06-Aug 17:13 company-appsrv-fd JobId 208: Fatal error: ../../filed/job.c:1817 Bad response to Append Data command. Wanted 3000 OK data, got 3903 Error append data

06-Aug 17:15 company_bacula-sd JobId 208: Fatal error: 3992 Bad autochanger "load slot 1, drive 0": ERR=Child died from signal 15: Termination.
Results=Loading media from Storage Element 1 into drive 0...done
Program killed by Bacula watchdog (timeout)

  
The default maximum changer wait is 5 minutes. If the changer does not
complete in 5 minutes bacula will kill the mtx-changer script.

See Maximum Changer Wait in

http://bacula.org/en/rel-manual/Storage_Daemon_Configuratio.html
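
To compare the real duration of a changer operation with that limit, the command can be timed by hand. The sketch below is only an illustration: run_timed is a made-up helper name, 300 is the 5 minute default expressed in seconds, and "date +%s" is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# run_timed LIMIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, then prints how many whole seconds
# it took, its exit status, and whether that fits within LIMIT_SECONDS.
run_timed() {
    limit=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -gt "$limit" ]; then
        echo "elapsed=${elapsed}s exit=${status} OVER limit of ${limit}s"
    else
        echo "elapsed=${elapsed}s exit=${status} within limit of ${limit}s"
    fi
}

# Usage (command line taken from this thread):
#   run_timed 300 /usr/lib/bacula/mtx-changer /dev/sg3 load 1 /dev/nst0 0
```

If the measured time is well within the limit and the watchdog still kills the script, the limit actually in effect is probably not the one assumed.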

John




 


-- 

 

BackOfficeService - Consulting and service for your IT -

Address:
Niederkastenholzer Str. 40
53881 Euskirchen

Phone:  +49 2255 953204
Fax:    +49 2255 953208
Mobile: +49 177 3397547

Bank details:
Raiffeisenbank Rheinbach Voreifel eG
Account no. 340286014 (BLZ 370 696 27)

Online:
info AT backofficeservice DOT biz
www.backofficeservice.biz

 




 


_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

