Bacula-users

Re: [Bacula-users] bacula watchdog killing tape-loading

2008-08-07 08:52:51
Subject: Re: [Bacula-users] bacula watchdog killing tape-loading
From: Nils Blanck-Wehde <nils.blanck-wehde AT backofficeservice DOT biz>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 07 Aug 2008 14:52:26 +0200
user "bacula" is member of "disk" and "bacula":
cat /etc/group|grep bacula
disk:x:6:root,bacula
bacula:x:101:bacula,apache


the autochanger is:
ll /dev/sg3
crw-rw---- 1 root disk 21, 3  7. Aug 12:16 /dev/sg3


Thanks, Nils

Jason A. Kates schrieb:
What groups is the bacula user in and what are the perms on /dev/sg3?
			Thanks -Jason

On Thu, 2008-08-07 at 14:02 +0200, Nils Blanck-Wehde wrote:
  
Hi Terry,

sorry, I forgot to mention: OS is CentOS 5.2, bacula-version is 2.4.2.
Problems started all out of a sudden with 2.4.1 after running fine for
weeks.

Calling mtx and mtx-changer both as root as well as user bacula works
flawlessly (slot 1 is currently loaded):

[root@company-Backupserver ~]# mtx -f /dev/sg3 load 1 0
Drive 0 Full (Storage Element 1 loaded)
[root@company-Backupserver ~]# su - bacula
-bash-3.2$ /usr/sbin/mtx -f /dev/sg3 load 1 0
Drive 0 Full (Storage Element 1 loaded)

[root@company-Backupserver ~]# /usr/lib/bacula/mtx-changer /dev/sg3
loaded 1 /dev/nst0 0
1
[root@company-Backupserver ~]# su - bacula
-bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
1
-bash-3.2$

I can see from the webadmin of the autochanger (Quantum Superloader 3
DLT) that autochanger commands are being executed correctly, still
bacula reports "ERR=Child died from signal 15: Termination".

Nils


T. Horsnell schrieb: 
    
Assuming your O/S is a Unix/Linux of some sort, have you tried the
basic mtx command on it? 
Something like 

mtx -f /dev/sg3 load 1 0 

(See 'man mtx') 

Cheers, 
Terry 


      
  Hi all, 
bacula won't work with our autochanger anymore. I can't find the
source of the problems. 
Here is the output of the autochanger-test of: "btape
-c /etc/bacula/bacula-sd.conf /dev/nst0": 

=== Autochanger test === 

3301 Issuing autochanger "loaded" command. 
Slot 1 loaded. I am going to unload it. 
3302 Issuing autochanger "unload 1 0" command. 
unload status=Bad 134217743 
3992 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3
unload 1 /dev/nst0 0 
3992 result="Unloading drive 0 into Storage Element 1...done 
Program killed by Bacula watchdog (timeout) 
": ERR=Child died from signal 15: Termination 
3303 Issuing autochanger "load 1 0" command. 
3993 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3
load 1 /dev/nst0 0 
3993 result="Loading media from Storage Element 1 into drive
0...done 
Program killed by Bacula watchdog (timeout) 
": ERR=Child died from signal 15: Termination 
You must correct this error or the Autochanger will not work. 

This is the storage-definition: 

Autochanger { 
  Name = QS3DLT 
  Device = DLT-Drive-1 
  Changer Command = "/usr/lib/bacula/mtx-changer %c %o %S %a %d" 
  Changer Device = /dev/sg3 
} 

Device { 
  Name = DLT-Drive-1                      # 
  Drive Index = 0 
  Media Type = DLT-VS1 
  Archive Device = /dev/nst0 
  AutomaticMount = yes;               # when device opened, read
it 
  AlwaysOpen = no; 
  RemovableMedia = yes; 
  RandomAccess = no; 
  AutoChanger = yes 
  Maximum Changer Wait = 10 
  Maximum Rewind Wait = 10 
  Maximum Open Wait = 10 
} 

When I look at the webadmin of the autochanger I see the
autochanger and the drive perform exactly the requested operations
at the usual speed (~1:30min for an unload operation, ~3:50 for an
unload/load operation). 

Still I get lots of killing / timeout problems. 

I start to wonder if the autochanger is somewhat defective... 

If any of you guys can help I would greatly appreciate it. 

All the best, Nils 



Nils Blanck-Wehde schrieb: 

        
Hi John, 

thanks for your help. I am quite new to bacula and it seems to
take some time to fully understand it :-) 
I am not sure whether it really is a timeout problem. 
I increased all timeout values to 10 minutes and the killing
still occurs (after 2:20min): 

07-Aug 12:20 company_bacula-dir JobId 217: Start Backup JobId
217,
Job=Fileserver_Lexware_Exchange_to_Tape.2008-08-07_12.20.03 
07-Aug 12:20 company_bacula-dir JobId 217: Using Device
"DLT-Drive-1" 
07-Aug 12:20 company_bacula-sd JobId 217: 3301 Issuing
autochanger "loaded? drive 0" command. 
07-Aug 12:20 company_bacula-sd JobId 217: 3302 Autochanger
"loaded? drive 0", result: nothing loaded. 
07-Aug 12:20 company_bacula-sd JobId 217: 3304 Issuing
autochanger "load slot 8, drive 0" command. 
*messages 
07-Aug 12:22 company_bacula-sd JobId 217: Fatal error: 3992 Bad
autochanger "load slot 8, drive 0": ERR=Child died from signal
15: Termination. 
Results=Loading media from Storage Element 8 into drive
0...done 
Program killed by Bacula watchdog (timeout) 

07-Aug 12:20 company-appsrv-fd JobId 217: Fatal
error: ../../filed/job.c:1817 Bad response to Append Data
command. Wanted 3000 OK data, got 3903 Error append data 


I doublechecked the time needed by the autochanger for an
unload/load operation: I did a couple of operations and they all
terminated in less than four minutes. 
I think there is a general problem with the autochanger because
btape test throws the following error when testing the
autochanger: 

3301 Issuing autochanger "loaded" command. 
3991 Bad autochanger
command: /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0
0 
3991 result="": ERR=Child died from signal 15: Termination 
You must correct this error or the Autochanger will not work. 

When I run this command "/usr/lib/bacula/mtx-changer /dev/sg3
loaded 1 /dev/nst0 0" manually, both as root and as user bacula,
it returns "1": 

[root@company-Backupserver ~]# su - bacula 
-bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded
1 /dev/nst0 0 
1 
-bash-3.2$ 

Could it be a permission problem? What Uid Gid should
mtx-changer have? Could some automatic mechanism on my CentOS
installation have changed the permissions of the autochanger
device to a lower level? 

I am a little stuck here. 

Thanks for all help! 

Nils 


John Drescher schrieb: 

          
On Wed, Aug 6, 2008 at 1:42 PM, Nils Blanck-Wehde 
<nils.blanck-wehde AT backofficeservice DOT biz>
<mailto:nils.blanck-wehde AT backofficeservice DOT biz> wrote: 
  

            
Hello list, 

I just encountered this issue: 

06-Aug 19:31 company_bacula-sd JobId 210: Fatal error: 3992
Bad 
autochanger "load slot 8, drive 0": ERR=Child died from
signal 15: 
Termination. 
Results=Loading media from Storage Element 8 into drive
0...done 
Program killed by Bacula watchdog (timeout) 


Earlier today I got this message: 

06-Aug 17:10 company_bacula-dir JobId 208: Using Device
"DLT-Drive-1" 
06-Aug 17:10 company_bacula-sd JobId 208: 3301 Issuing
autochanger "loaded? drive 0" command. 
06-Aug 17:10 company_bacula-sd JobId 208: 3302 Autochanger
"loaded? drive 0", result: nothing loaded. 
06-Aug 17:10 company_bacula-sd JobId 208: 3304 Issuing
autochanger "load slot 1, drive 0" command. 
06-Aug 17:13 company-appsrv-fd JobId 208: Fatal
error: ../../filed/job.c:1817 Bad response to Append Data
command. Wanted 3000 OK data 
, got 3903 Error append data 

06-Aug 17:15 company_bacula-sd JobId 208: Fatal error: 3992
Bad autochanger "load slot 1, drive 0": ERR=Child died from
signal 15: Termination. 
Results=Loading media from Storage Element 1 into drive
0...done 
Program killed by Bacula watchdog (timeout) 

   
              
The default maximum changer wait is 5 minutes. If the changer
does not 
complete in 5 minutes bacula will kill the mtx-changer
script. 

See Maximum Changer Wait in 

http://bacula.org/en/rel-manual/Storage_Daemon_Configuratio.html 

John 




  

            
-- 

  

*B**ack**O**ffice**S**ervice* - Beratung und Service für Ihre IT
- 

  

*Anschrift: * 

Niederkastenholzer Str. 40 

53881 Euskirchen 

     

*Telefon:   *+49 2255 953204* * 

*Fax:          *+49 2255 953208 

*Mobil:      *+49 177 3397547 

     

*Bankverbindung: * 

Raiffeisenbank Rheinbach Voreifel eG 

Kto. Nr. 340286014 (BLZ 370 696 27) 

     

*Online: * 

info AT backofficeservice DOT biz <mailto:bos AT blanck-wehde DOT de> 

www.backofficeservice.biz <http://www.backofficeservice.biz> 

  




------------------------------------------------------------------------- 
This SF.Net email is sponsored by the Moblin Your Move
Developer's challenge 
Build the coolest Linux based applications with Moblin SDK & win
great prizes 
Grand prize is a trip for two to an Open Source event anywhere
in the world 
http://moblin-contest.org/redirect.php?banner_id=100&url="">
<http://moblin-contest.org/redirect.php?banner_id=100&url=""> 

 
  


_______________________________________________ 
Bacula-users mailing list 
Bacula-users AT lists.sourceforge DOT net
<mailto:Bacula-users AT lists.sourceforge DOT net> 
https://lists.sourceforge.net/lists/listinfo/bacula-users 


!DSPAM:489ad34a16761048915462! 
  

          
-- 

  

*B**ack**O**ffice**S**ervice* - Beratung und Service für Ihre IT
- 

  

*Anschrift: * 

Niederkastenholzer Str. 40 

53881 Euskirchen 

     

*Telefon:   *+49 2255 953204* * 

*Fax:          *+49 2255 953208 

*Mobil:      *+49 177 3397547 

     

*Bankverbindung: * 

Raiffeisenbank Rheinbach Voreifel eG 

Kto. Nr. 340286014 (BLZ 370 696 27) 

     

*Online: * 

info AT backofficeservice DOT biz <mailto:bos AT blanck-wehde DOT de> 

www.backofficeservice.biz <http://www.backofficeservice.biz> 

  


------------------------------------------------------------------------ 

------------------------------------------------------------------------- 
This SF.Net email is sponsored by the Moblin Your Move Developer's
challenge 
Build the coolest Linux based applications with Moblin SDK & win
great prizes 
Grand prize is a trip for two to an Open Source event anywhere in
the world 
http://moblin-contest.org/redirect.php?banner_id=100&url=""> 


------------------------------------------------------------------------ 

_______________________________________________ 
Bacula-users mailing list 
Bacula-users AT lists.sourceforge DOT net 
https://lists.sourceforge.net/lists/listinfo/bacula-users 
        
!DSPAM:489add52223816216930368! 

      
-- 
 

BackOfficeService- Beratung und Service für Ihre IT -

 

Anschrift:

Niederkastenholzer Str. 40

53881 Euskirchen


Telefon:   +49
2255 953204

Fax:          +49
2255 953208

Mobil:      +49
177 3397547


Bankverbindung:

Raiffeisenbank
Rheinbach
Voreifel eG

Kto. Nr.
340286014 (BLZ
370 696 27)


Online: 

info AT backofficeservice DOT biz 

www.backofficeservice.biz



 


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url="">
_______________________________________________ Bacula-users mailing list Bacula-users AT lists.sourceforge DOT net https://lists.sourceforge.net/lists/listinfo/bacula-users
    

--

 

BackOfficeService - Beratung und Service für Ihre IT -

 

Anschrift:

Niederkastenholzer Str. 40

53881 Euskirchen

Telefon:   +49 2255 953204

Fax:          +49 2255 953208

Mobil:      +49 177 3397547

Bankverbindung:

Raiffeisenbank Rheinbach Voreifel eG

Kto. Nr. 340286014 (BLZ 370 696 27)

Online:

info@backofficeservice.biz

www.backofficeservice.biz

 

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users