Bacula-users

Re: [Bacula-users] bacula watchdog killing tape-loading

2008-08-07 13:43:41
Subject: Re: [Bacula-users] bacula watchdog killing tape-loading
From: "T. Horsnell" <tsh AT mrc-lmb.cam.ac DOT uk>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 07 Aug 2008 18:56:27 +0100
So, clutching at straws, is it definitely the case that you have made 
*no* changes to the system since things last worked properly? No 
upgrades, no messing with selinux, no messing with mysql (assuming that 
is your db system) no messing with firewalls etc. And do you still have 
space on the filesystem which contains Bacula's workspace and catalogue? 
At one point in your stuff below, you got:

 >>>>3302 Issuing autochanger "unload 1 0" command.
 >>>>unload status=Bad 134217743

I wonder what the status of 134217743 means....

Cheers,
Terry.

> What groups is the bacula user in and what are the perms on /dev/sg3?
>                       Thanks -Jason
> 
> On Thu, 2008-08-07 at 14:02 +0200, Nils Blanck-Wehde wrote:
> 
>>Hi Terry,
>>
>>sorry, I forgot to mention: OS is CentOS 5.2, bacula-version is 2.4.2.
>>Problems started all out of a sudden with 2.4.1 after running fine for
>>weeks.
>>
>>Calling mtx and mtx-changer both as root as well as user bacula works
>>flawlessly (slot 1 is currently loaded):
>>
>>[root@company-Backupserver ~]# mtx -f /dev/sg3 load 1 0
>>Drive 0 Full (Storage Element 1 loaded)
>>[root@company-Backupserver ~]# su - bacula
>>-bash-3.2$ /usr/sbin/mtx -f /dev/sg3 load 1 0
>>Drive 0 Full (Storage Element 1 loaded)
>>
>>[root@company-Backupserver ~]# /usr/lib/bacula/mtx-changer /dev/sg3
>>loaded 1 /dev/nst0 0
>>1
>>[root@company-Backupserver ~]# su - bacula
>>-bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
>>1
>>-bash-3.2$
>>
>>I can see from the webadmin of the autochanger (Quantum Superloader 3
>>DLT) that autochanger commands are being executed correctly, still
>>bacula reports "ERR=Child died from signal 15: Termination".
>>
>>Nils
>>
>>
>>T. Horsnell schrieb: 
>>
>>>Assuming your O/S is a Unix/Linux of some sort, have you tried the
>>>basic mtx command on it? 
>>>Something like 
>>>
>>>mtx -f /dev/sg3 load 1 0 
>>>
>>>(See 'man mtx') 
>>>
>>>Cheers, 
>>>Terry 
>>>
>>>
>>>
>>>>  Hi all, 
>>>>bacula won't work with our autochanger anymore. I can't find the
>>>>source of the problems. 
>>>>Here is the output of the autochanger-test of: "btape
>>>>-c /etc/bacula/bacula-sd.conf /dev/nst0": 
>>>>
>>>>=== Autochanger test === 
>>>>
>>>>3301 Issuing autochanger "loaded" command. 
>>>>Slot 1 loaded. I am going to unload it. 
>>>>3302 Issuing autochanger "unload 1 0" command. 
>>>>unload status=Bad 134217743 
>>>>3992 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3
>>>>unload 1 /dev/nst0 0 
>>>>3992 result="Unloading drive 0 into Storage Element 1...done 
>>>>Program killed by Bacula watchdog (timeout) 
>>>>": ERR=Child died from signal 15: Termination 
>>>>3303 Issuing autochanger "load 1 0" command. 
>>>>3993 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3
>>>>load 1 /dev/nst0 0 
>>>>3993 result="Loading media from Storage Element 1 into drive
>>>>0...done 
>>>>Program killed by Bacula watchdog (timeout) 
>>>>": ERR=Child died from signal 15: Termination 
>>>>You must correct this error or the Autochanger will not work. 
>>>>
>>>>This is the storage-definition: 
>>>>
>>>>Autochanger { 
>>>>  Name = QS3DLT 
>>>>  Device = DLT-Drive-1 
>>>>  Changer Command = "/usr/lib/bacula/mtx-changer %c %o %S %a %d" 
>>>>  Changer Device = /dev/sg3 
>>>>} 
>>>>
>>>>Device { 
>>>>  Name = DLT-Drive-1                      # 
>>>>  Drive Index = 0 
>>>>  Media Type = DLT-VS1 
>>>>  Archive Device = /dev/nst0 
>>>>  AutomaticMount = yes;               # when device opened, read
>>>>it 
>>>>  AlwaysOpen = no; 
>>>>  RemovableMedia = yes; 
>>>>  RandomAccess = no; 
>>>>  AutoChanger = yes 
>>>>  Maximum Changer Wait = 10 
>>>>  Maximum Rewind Wait = 10 
>>>>  Maximum Open Wait = 10 
>>>>} 
>>>>
>>>>When I look at the webadmin of the autochanger I see the
>>>>autochanger and the drive perform exactly the requested operations
>>>>at the usual speed (~1:30min for an unload operation, ~3:50 for an
>>>>unload/load operation). 
>>>>
>>>>Still I get lots of killing / timeout problems. 
>>>>
>>>>I start to wonder if the autochanger is somewhat defective... 
>>>>
>>>>If any of you guys can help I would greatly appreciate it. 
>>>>
>>>>All the best, Nils 
>>>>
>>>>
>>>>
>>>>Nils Blanck-Wehde schrieb: 
>>>>
>>>>
>>>>>Hi John, 
>>>>>
>>>>>thanks for your help. I am quite new to bacula and it seems to
>>>>>take some time to fully understand it :-) 
>>>>>I am not sure whether it really is a timeout problem. 
>>>>>I increased all timeout values to 10 minutes and the killing
>>>>>still occurs (after 2:20min): 
>>>>>
>>>>>07-Aug 12:20 company_bacula-dir JobId 217: Start Backup JobId
>>>>>217,
>>>>>Job=Fileserver_Lexware_Exchange_to_Tape.2008-08-07_12.20.03 
>>>>>07-Aug 12:20 company_bacula-dir JobId 217: Using Device
>>>>>"DLT-Drive-1" 
>>>>>07-Aug 12:20 company_bacula-sd JobId 217: 3301 Issuing
>>>>>autochanger "loaded? drive 0" command. 
>>>>>07-Aug 12:20 company_bacula-sd JobId 217: 3302 Autochanger
>>>>>"loaded? drive 0", result: nothing loaded. 
>>>>>07-Aug 12:20 company_bacula-sd JobId 217: 3304 Issuing
>>>>>autochanger "load slot 8, drive 0" command. 
>>>>>*messages 
>>>>>07-Aug 12:22 company_bacula-sd JobId 217: Fatal error: 3992 Bad
>>>>>autochanger "load slot 8, drive 0": ERR=Child died from signal
>>>>>15: Termination. 
>>>>>Results=Loading media from Storage Element 8 into drive
>>>>>0...done 
>>>>>Program killed by Bacula watchdog (timeout) 
>>>>>
>>>>>07-Aug 12:20 company-appsrv-fd JobId 217: Fatal
>>>>>error: ../../filed/job.c:1817 Bad response to Append Data
>>>>>command. Wanted 3000 OK data, got 3903 Error append data 
>>>>>
>>>>>
>>>>>I doublechecked the time needed by the autochanger for an
>>>>>unload/load operation: I did a couple of operations and they all
>>>>>terminated in less than four minutes. 
>>>>>I think there is a general problem with the autochanger because
>>>>>btape test throws the following error when testing the
>>>>>autochanger: 
>>>>>
>>>>>3301 Issuing autochanger "loaded" command. 
>>>>>3991 Bad autochanger
>>>>>command: /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0
>>>>>0 
>>>>>3991 result="": ERR=Child died from signal 15: Termination 
>>>>>You must correct this error or the Autochanger will not work. 
>>>>>
>>>>>When I run this command "/usr/lib/bacula/mtx-changer /dev/sg3
>>>>>loaded 1 /dev/nst0 0" manually, both as root and as user bacula,
>>>>>it returns "1": 
>>>>>
>>>>>[root@company-Backupserver ~]# su - bacula 
>>>>>-bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded
>>>>>1 /dev/nst0 0 
>>>>>1 
>>>>>-bash-3.2$ 
>>>>>
>>>>>Could it be a permission problem? What Uid Gid should
>>>>>mtx-changer have? Could some automatic mechanism on my CentOS
>>>>>installation have changed the permissions of the autochanger
>>>>>device to a lower level? 
>>>>>
>>>>>I am a little stuck here. 
>>>>>
>>>>>Thanks for all help! 
>>>>>
>>>>>Nils 
>>>>>
>>>>>
>>>>>John Drescher schrieb: 
>>>>>
>>>>>
>>>>>>On Wed, Aug 6, 2008 at 1:42 PM, Nils Blanck-Wehde 
>>>>>><nils.blanck-wehde AT backofficeservice DOT biz>
>>>>>><mailto:nils.blanck-wehde AT backofficeservice DOT biz> wrote: 
>>>>>>  
>>>>>>
>>>>>>
>>>>>>>Hello list, 
>>>>>>>
>>>>>>>I just encountered this issue: 
>>>>>>>
>>>>>>>06-Aug 19:31 company_bacula-sd JobId 210: Fatal error: 3992
>>>>>>>Bad 
>>>>>>>autochanger "load slot 8, drive 0": ERR=Child died from
>>>>>>>signal 15: 
>>>>>>>Termination. 
>>>>>>>Results=Loading media from Storage Element 8 into drive
>>>>>>>0...done 
>>>>>>>Program killed by Bacula watchdog (timeout) 
>>>>>>>
>>>>>>>
>>>>>>>Earlier today I got this message: 
>>>>>>>
>>>>>>>06-Aug 17:10 company_bacula-dir JobId 208: Using Device
>>>>>>>"DLT-Drive-1" 
>>>>>>>06-Aug 17:10 company_bacula-sd JobId 208: 3301 Issuing
>>>>>>>autochanger "loaded? drive 0" command. 
>>>>>>>06-Aug 17:10 company_bacula-sd JobId 208: 3302 Autochanger
>>>>>>>"loaded? drive 0", result: nothing loaded. 
>>>>>>>06-Aug 17:10 company_bacula-sd JobId 208: 3304 Issuing
>>>>>>>autochanger "load slot 1, drive 0" command. 
>>>>>>>06-Aug 17:13 company-appsrv-fd JobId 208: Fatal
>>>>>>>error: ../../filed/job.c:1817 Bad response to Append Data
>>>>>>>command. Wanted 3000 OK data 
>>>>>>>, got 3903 Error append data 
>>>>>>>
>>>>>>>06-Aug 17:15 company_bacula-sd JobId 208: Fatal error: 3992
>>>>>>>Bad autochanger "load slot 1, drive 0": ERR=Child died from
>>>>>>>signal 15: Termination. 
>>>>>>>Results=Loading media from Storage Element 1 into drive
>>>>>>>0...done 
>>>>>>>Program killed by Bacula watchdog (timeout) 
>>>>>>>
>>>>>>>   
>>>>>>
>>>>>>The default maximum changer wait is 5 minutes. If the changer
>>>>>>does not 
>>>>>>complete in 5 minutes bacula will kill the mtx-changer
>>>>>>script. 
>>>>>>
>>>>>>See Maximum Changer Wait in 
>>>>>>
>>>>>>http://bacula.org/en/rel-manual/Storage_Daemon_Configuratio.html 
>>>>>>
>>>>>>John 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  
>>>>>>
>>>>>
>>>>>-- 
>>>>>
>>>>>  
>>>>>
>>>>>*B**ack**O**ffice**S**ervice* - Beratung und Service für Ihre IT
>>>>>- 
>>>>>
>>>>>  
>>>>>
>>>>>*Anschrift: * 
>>>>>
>>>>>Niederkastenholzer Str. 40 
>>>>>
>>>>>53881 Euskirchen 
>>>>>
>>>>>     
>>>>>
>>>>>*Telefon:   *+49 2255 953204* * 
>>>>>
>>>>>*Fax:          *+49 2255 953208 
>>>>>
>>>>>*Mobil:      *+49 177 3397547 
>>>>>
>>>>>     
>>>>>
>>>>>*Bankverbindung: * 
>>>>>
>>>>>Raiffeisenbank Rheinbach Voreifel eG 
>>>>>
>>>>>Kto. Nr. 340286014 (BLZ 370 696 27) 
>>>>>
>>>>>     
>>>>>
>>>>>*Online: * 
>>>>>
>>>>>info AT backofficeservice DOT biz <mailto:bos AT blanck-wehde DOT de> 
>>>>>
>>>>>www.backofficeservice.biz <http://www.backofficeservice.biz> 
>>>>>
>>>>>  
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>------------------------------------------------------------------------- 
>>>>>This SF.Net email is sponsored by the Moblin Your Move
>>>>>Developer's challenge 
>>>>>Build the coolest Linux based applications with Moblin SDK & win
>>>>>great prizes 
>>>>>Grand prize is a trip for two to an Open Source event anywhere
>>>>>in the world 
>>>>>http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>>><http://moblin-contest.org/redirect.php?banner_id=100&url=/> 
>>>>>
>>>>>!DSPAM:489ad34a16761048915462! 
>>>>>  
>>>>>
>>>>>
>>>>>_______________________________________________ 
>>>>>Bacula-users mailing list 
>>>>>Bacula-users AT lists.sourceforge DOT net
>>>>><mailto:Bacula-users AT lists.sourceforge DOT net> 
>>>>>https://lists.sourceforge.net/lists/listinfo/bacula-users 
>>>>>
>>>>>
>>>>>!DSPAM:489ad34a16761048915462! 
>>>>>  
>>>>>
>>>>
>>>>-- 
>>>>
>>>>  
>>>>
>>>>*B**ack**O**ffice**S**ervice* - Beratung und Service für Ihre IT
>>>>- 
>>>>
>>>>  
>>>>
>>>>*Anschrift: * 
>>>>
>>>>Niederkastenholzer Str. 40 
>>>>
>>>>53881 Euskirchen 
>>>>
>>>>     
>>>>
>>>>*Telefon:   *+49 2255 953204* * 
>>>>
>>>>*Fax:          *+49 2255 953208 
>>>>
>>>>*Mobil:      *+49 177 3397547 
>>>>
>>>>     
>>>>
>>>>*Bankverbindung: * 
>>>>
>>>>Raiffeisenbank Rheinbach Voreifel eG 
>>>>
>>>>Kto. Nr. 340286014 (BLZ 370 696 27) 
>>>>
>>>>     
>>>>
>>>>*Online: * 
>>>>
>>>>info AT backofficeservice DOT biz <mailto:bos AT blanck-wehde DOT de> 
>>>>
>>>>www.backofficeservice.biz <http://www.backofficeservice.biz> 
>>>>
>>>>  
>>>>
>>>>
>>>>------------------------------------------------------------------------ 
>>>>
>>>>------------------------------------------------------------------------- 
>>>>This SF.Net email is sponsored by the Moblin Your Move Developer's
>>>>challenge 
>>>>Build the coolest Linux based applications with Moblin SDK & win
>>>>great prizes 
>>>>Grand prize is a trip for two to an Open Source event anywhere in
>>>>the world 
>>>>http://moblin-contest.org/redirect.php?banner_id=100&url=/ 
>>>>
>>>>
>>>>------------------------------------------------------------------------ 
>>>>
>>>>_______________________________________________ 
>>>>Bacula-users mailing list 
>>>>Bacula-users AT lists.sourceforge DOT net 
>>>>https://lists.sourceforge.net/lists/listinfo/bacula-users 
>>>
>>>
>>>!DSPAM:489add52223816216930368! 
>>>
>>
>>-- 
>> 
>>
>>BackOfficeService- Beratung und Service für Ihre IT -
>>
>> 
>>
>>Anschrift:
>>
>>Niederkastenholzer Str. 40
>>
>>53881 Euskirchen
>>
>>
>>Telefon:   +49
>>2255 953204
>>
>>Fax:          +49
>>2255 953208
>>
>>Mobil:      +49
>>177 3397547
>>
>>
>>Bankverbindung:
>>
>>Raiffeisenbank
>>Rheinbach
>>Voreifel eG
>>
>>Kto. Nr.
>>340286014 (BLZ
>>370 696 27)
>>
>>
>>Online: 
>>
>>info AT backofficeservice DOT biz 
>>
>>www.backofficeservice.biz
>>
>>
>>
>> 
>>
>>
>>-------------------------------------------------------------------------
>>This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>>Build the coolest Linux based applications with Moblin SDK & win great prizes
>>Grand prize is a trip for two to an Open Source event anywhere in the world
>>http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>_______________________________________________ Bacula-users mailing list 
>>Bacula-users AT lists.sourceforge DOT net 
>>https://lists.sourceforge.net/lists/listinfo/bacula-users


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users