Bacula-users

Re: [Bacula-users] bacula watchdog killing tape-loading

2008-08-07 07:32:43
Subject: Re: [Bacula-users] bacula watchdog killing tape-loading
From: "T. Horsnell" <tsh AT mrc-lmb.cam.ac DOT uk>
To: Nils Blanck-Wehde <nils.blanck-wehde AT backofficeservice DOT biz>
Date: Thu, 07 Aug 2008 12:45:27 +0100
Assuming your OS is some flavour of Unix/Linux, have you tried driving the
changer directly with the plain mtx command? Something like:

mtx -f /dev/sg3 load 1 0

(See 'man mtx')
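If it helps, here is a rough way to time those calls by hand so you can compare them with Bacula's limits. It is only a sketch: `timed` is a helper name I made up, RUN_MTX is an invented switch, and /dev/sg3, slot 1 and drive 0 are simply the values from your report.

```shell
#!/bin/sh
# timed: run a command and print how many whole seconds it took.
# (hypothetical helper, not part of mtx or Bacula)
timed() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "elapsed=$((end - start))s exit=$rc"
    return "$rc"
}

# Only touch the changer when explicitly asked to (RUN_MTX is an
# invented switch), so the file can be sourced without side effects.
if [ "${RUN_MTX:-no}" = "yes" ]; then
    timed mtx -f /dev/sg3 status
    timed mtx -f /dev/sg3 unload 1 0
    timed mtx -f /dev/sg3 load 1 0
fi
```

It is probably best to run this with RUN_MTX=yes while the storage daemon is stopped, so that Bacula and the script do not compete for the changer.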

Cheers,
Terry


> Hi all,
> Bacula no longer works with our autochanger, and I can't find the source 
> of the problem.
> Here is the output of the autochanger test, run with
> "btape -c /etc/bacula/bacula-sd.conf /dev/nst0":
> 
> === Autochanger test ===
> 
> 3301 Issuing autochanger "loaded" command.
> Slot 1 loaded. I am going to unload it.
> 3302 Issuing autochanger "unload 1 0" command.
> unload status=Bad 134217743
> 3992 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 
> unload 1 /dev/nst0 0
> 3992 result="Unloading drive 0 into Storage Element 1...done
> Program killed by Bacula watchdog (timeout)
> ": ERR=Child died from signal 15: Termination
> 3303 Issuing autochanger "load 1 0" command.
> 3993 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 load 
> 1 /dev/nst0 0
> 3993 result="Loading media from Storage Element 1 into drive 0...done
> Program killed by Bacula watchdog (timeout)
> ": ERR=Child died from signal 15: Termination
> You must correct this error or the Autochanger will not work.
> 
> This is the storage-definition:
> 
> Autochanger {
>   Name = QS3DLT
>   Device = DLT-Drive-1
>   Changer Command = "/usr/lib/bacula/mtx-changer %c %o %S %a %d"
>   Changer Device = /dev/sg3
> }
> 
> Device {
>   Name = DLT-Drive-1                      #
>   Drive Index = 0
>   Media Type = DLT-VS1
>   Archive Device = /dev/nst0
>   AutomaticMount = yes;               # when device opened, read it
>   AlwaysOpen = no;
>   RemovableMedia = yes;
>   RandomAccess = no;
>   AutoChanger = yes
>   Maximum Changer Wait = 10
>   Maximum Rewind Wait = 10
>   Maximum Open Wait = 10
> }
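One thing worth checking in the Device resource above: the three Wait values are written as a bare 10. As far as I know, Bacula reads a time value without a modifier as seconds, which would make this 10 seconds rather than 10 minutes, but please verify that against the manual for your version. The sketch below only lists wait directives that carry no unit; `bare_waits` is a made-up helper name, and the config path in the comment is the one from the btape call.

```shell
#!/bin/sh
# bare_waits: print "Maximum ... Wait" directives whose value is a bare
# number with no time unit. The config file is passed as $1.
# (hypothetical helper for illustration only)
bare_waits() {
    grep -E -i '^[[:space:]]*Maximum[[:space:]]*[A-Za-z]+[[:space:]]*Wait[[:space:]]*=[[:space:]]*[0-9]+[[:space:]]*;?[[:space:]]*(#.*)?$' "$1"
}

# Example:
#   bare_waits /etc/bacula/bacula-sd.conf
```

If the bare-number reading is right, writing the unit out, for example "Maximum Changer Wait = 10 minutes", removes the ambiguity.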
> 
> When I look at the webadmin of the autochanger I see the autochanger and 
> the drive perform exactly the requested operations at the usual speed 
> (~1:30min for an unload operation, ~3:50 for an unload/load operation).
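To put those timings next to a wait limit, the arithmetic can be done like this. It is a sketch only: `to_seconds` is a made-up helper, and treating the configured 10 as seconds is an assumption about how Bacula reads a value without a unit.

```shell
#!/bin/sh
# to_seconds: convert a "minutes:seconds" string such as 3:50 to seconds.
# (hypothetical helper for illustration only)
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

needed=$(to_seconds 3:50)   # slowest operation reported: unload + load
limit=10                    # configured value, read as seconds (assumption)
if [ "$needed" -gt "$limit" ]; then
    echo "operation needs ${needed}s but the limit is ${limit}s"
fi
```

Whatever limit ends up in the config, it presumably needs some margin above the slowest unload/load cycle.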
> 
> Still I get lots of killing / timeout problems.
> 
> I start to wonder if the autochanger is somewhat defective...
> 
> If any of you guys can help I would greatly appreciate it.
> 
> All the best, Nils
> 
> 
> 
> Nils Blanck-Wehde wrote:
> 
>> Hi John,
>>
>> thanks for your help. I am quite new to bacula and it seems to take 
>> some time to fully understand it :-)
>> I am not sure whether it really is a timeout problem.
>> I increased all timeout values to 10 minutes and the killing still 
>> occurs (after 2:20min):
>>
>> 07-Aug 12:20 company_bacula-dir JobId 217: Start Backup JobId 217, 
>> Job=Fileserver_Lexware_Exchange_to_Tape.2008-08-07_12.20.03
>> 07-Aug 12:20 company_bacula-dir JobId 217: Using Device "DLT-Drive-1"
>> 07-Aug 12:20 company_bacula-sd JobId 217: 3301 Issuing autochanger 
>> "loaded? drive 0" command.
>> 07-Aug 12:20 company_bacula-sd JobId 217: 3302 Autochanger "loaded? 
>> drive 0", result: nothing loaded.
>> 07-Aug 12:20 company_bacula-sd JobId 217: 3304 Issuing autochanger 
>> "load slot 8, drive 0" command.
>> *messages
>> 07-Aug 12:22 company_bacula-sd JobId 217: Fatal error: 3992 Bad 
>> autochanger "load slot 8, drive 0": ERR=Child died from signal 15: 
>> Termination.
>> Results=Loading media from Storage Element 8 into drive 0...done
>> Program killed by Bacula watchdog (timeout)
>>
>> 07-Aug 12:20 company-appsrv-fd JobId 217: Fatal error: 
>> ../../filed/job.c:1817 Bad response to Append Data command. Wanted 
>> 3000 OK data, got 3903 Error append data
>>
>>
>> I double-checked the time the autochanger needs for an unload/load 
>> operation: I ran a couple of operations and they all completed in 
>> less than four minutes.
>> I think there is a general problem with the autochanger, because the 
>> btape test reports the following error when testing the autochanger:
>>
>> 3301 Issuing autochanger "loaded" command.
>> 3991 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 
>> loaded 1 /dev/nst0 0
>> 3991 result="": ERR=Child died from signal 15: Termination
>> You must correct this error or the Autochanger will not work.
>>
>> When I run this command "/usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 
>> /dev/nst0 0" manually, both as root and as user bacula, it returns "1":
>>
>> [root@company-Backupserver ~]# su - bacula
>> -bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
>> 1
>> -bash-3.2$
>>
>> Could it be a permission problem? What Uid Gid should mtx-changer 
>> have? Could some automatic mechanism on my CentOS installation have 
>> changed the permissions of the autochanger device to a lower level?
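On the permission question: the manual run as user bacula already printed a slot number, which suggests the changer device is reachable for that user. To check both devices explicitly, something like the following could be used. `check_rw` is a name invented for this sketch, and /dev/sg3 and /dev/nst0 are the devices from the report.

```shell
#!/bin/sh
# check_rw: report whether the current user can read and write a path.
# (hypothetical helper for illustration only)
check_rw() {
    if [ -r "$1" ] && [ -w "$1" ]; then
        echo "$1: read/write OK for $(id -un)"
    else
        echo "$1: NOT read/write for $(id -un)"
        return 1
    fi
}

# Example, after switching to the bacula user with "su - bacula":
#   check_rw /dev/sg3
#   check_rw /dev/nst0
# "ls -l /dev/sg3 /dev/nst0" shows the owner, group and mode to compare.
```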
>>
>> I am a little stuck here.
>>
>> Thanks for all help!
>>
>> Nils
>>
>>
>> John Drescher wrote:
>>
>>>On Wed, Aug 6, 2008 at 1:42 PM, Nils Blanck-Wehde
>>><nils.blanck-wehde AT backofficeservice DOT biz> wrote:
>>>  
>>>
>>>>Hello list,
>>>>
>>>>I just encountered this issue:
>>>>
>>>>06-Aug 19:31 company_bacula-sd JobId 210: Fatal error: 3992 Bad
>>>>autochanger "load slot 8, drive 0": ERR=Child died from signal 15:
>>>>Termination.
>>>>Results=Loading media from Storage Element 8 into drive 0...done
>>>>Program killed by Bacula watchdog (timeout)
>>>>
>>>>
>>>>Earlier today I got this message:
>>>>
>>>>06-Aug 17:10 company_bacula-dir JobId 208: Using Device "DLT-Drive-1"
>>>>06-Aug 17:10 company_bacula-sd JobId 208: 3301 Issuing autochanger "loaded? 
>>>>drive 0" command.
>>>>06-Aug 17:10 company_bacula-sd JobId 208: 3302 Autochanger "loaded? drive 
>>>>0", result: nothing loaded.
>>>>06-Aug 17:10 company_bacula-sd JobId 208: 3304 Issuing autochanger "load 
>>>>slot 1, drive 0" command.
>>>>06-Aug 17:13 company-appsrv-fd JobId 208: Fatal error: 
>>>>../../filed/job.c:1817 Bad response to Append Data command. Wanted 3000 OK 
>>>>data, got 3903 Error append data
>>>>
>>>>06-Aug 17:15 company_bacula-sd JobId 208: Fatal error: 3992 Bad autochanger 
>>>>"load slot 1, drive 0": ERR=Child died from signal 15: Termination.
>>>>Results=Loading media from Storage Element 1 into drive 0...done
>>>>Program killed by Bacula watchdog (timeout)
>>>>
>>>>    
>>>>
>>>The default maximum changer wait is 5 minutes. If the changer does not
>>>complete in 5 minutes bacula will kill the mtx-changer script.
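The effect described here can be imitated outside Bacula with timeout(1) from GNU coreutils, which sends SIGTERM when the limit expires and exits with status 124. This is only an imitation for testing, not Bacula's own watchdog code; `run_with_limit` is a made-up name, and the mtx-changer arguments are copied from the error messages in this thread.

```shell
#!/bin/sh
# run_with_limit: run a command and send SIGTERM if it exceeds N seconds.
# Relies on GNU coreutils timeout(1), which exits with 124 on timeout.
# (hypothetical helper; it only imitates the behaviour described above)
run_with_limit() {
    limit=$1
    shift
    timeout -s TERM "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "killed after ${limit}s (timeout)"
    fi
    return "$rc"
}

# Example with the 5 minute default mentioned above (300 seconds):
#   run_with_limit 300 /usr/lib/bacula/mtx-changer /dev/sg3 load 1 /dev/nst0 0
```

If the script finishes well inside the limit here but is still killed under Bacula, the limit Bacula actually applies is probably not the one you expect.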
>>>
>>>See Maximum Changer Wait in
>>>
>>>http://bacula.org/en/rel-manual/Storage_Daemon_Configuratio.html
>>>
>>>John
>>>
>>>
>>>
>>>
>>>  
>>>
>>
>> -- 
>>
>>  
>>
>> BackOfficeService - Consulting and service for your IT -
>>
>> Address:
>> Niederkastenholzer Str. 40
>> 53881 Euskirchen
>>
>> Phone:  +49 2255 953204
>> Fax:    +49 2255 953208
>> Mobile: +49 177 3397547
>>
>> Bank details:
>> Raiffeisenbank Rheinbach Voreifel eG
>> Account no. 340286014 (bank code 370 696 27)
>>
>> Online:
>> info AT backofficeservice DOT biz <mailto:bos AT blanck-wehde DOT de>
>> www.backofficeservice.biz
>>
>> 
>>-------------------------------------------------------------------------
>>This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>>Build the coolest Linux based applications with Moblin SDK & win great prizes
>>Grand prize is a trip for two to an Open Source event anywhere in the world
>>http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>
>>_______________________________________________
>>Bacula-users mailing list
>>Bacula-users AT lists.sourceforge DOT net
>>https://lists.sourceforge.net/lists/listinfo/bacula-users

