Bacula-users

Re: [Bacula-users] bacula watchdog killing tape-loading

2008-08-07 08:30:53
Subject: Re: [Bacula-users] bacula watchdog killing tape-loading
From: "Jason A. Kates" <jason AT kates DOT org>
To: Nils Blanck-Wehde <nils.blanck-wehde AT backofficeservice DOT biz>
Date: Thu, 07 Aug 2008 08:20:26 -0400
What groups is the bacula user in and what are the perms on /dev/sg3?
                        Thanks -Jason
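
A minimal sketch of how both questions could be answered on the backup server. It assumes the Storage Daemon runs as the user `bacula` and that the device nodes are `/dev/sg3` and `/dev/nst0`, as in the quoted configuration. The helper name `show_access` is made up for illustration and is not part of Bacula; it relies on GNU `stat` as shipped with CentOS.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bacula): print owner:group, octal mode
# and name of a path, using GNU stat.
show_access() {
    stat -c '%U:%G %a %n' "$1"
}

# Groups of the account the Storage Daemon is assumed to run as.
id bacula 2>/dev/null || echo "no such user: bacula"

# Permissions on the changer and tape device nodes from the quoted config.
for dev in /dev/sg3 /dev/nst0; do
    if [ -e "$dev" ]; then
        show_access "$dev"
    else
        echo "missing: $dev"
    fi
done
```

If a device turns out to be group-owned with mode 660, the bacula user needs to be a member of that group to open it.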

On Thu, 2008-08-07 at 14:02 +0200, Nils Blanck-Wehde wrote:
> Hi Terry,
> 
> sorry, I forgot to mention: the OS is CentOS 5.2, the bacula version is 2.4.2.
> The problems started all of a sudden with 2.4.1 after it had been running
> fine for weeks.
> 
> Calling mtx and mtx-changer both as root as well as user bacula works
> flawlessly (slot 1 is currently loaded):
> 
> [root@company-Backupserver ~]# mtx -f /dev/sg3 load 1 0
> Drive 0 Full (Storage Element 1 loaded)
> [root@company-Backupserver ~]# su - bacula
> -bash-3.2$ /usr/sbin/mtx -f /dev/sg3 load 1 0
> Drive 0 Full (Storage Element 1 loaded)
> 
> [root@company-Backupserver ~]# /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
> 1
> [root@company-Backupserver ~]# su - bacula
> -bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
> 1
> -bash-3.2$
> 
> I can see from the webadmin of the autochanger (Quantum Superloader 3
> DLT) that the autochanger commands are being executed correctly, yet
> bacula still reports "ERR=Child died from signal 15: Termination".
> 
> Nils
> 
> 
> T. Horsnell wrote:
> > Assuming your O/S is a Unix/Linux of some sort, have you tried the
> > basic mtx command on it? 
> > Something like 
> > 
> > mtx -f /dev/sg3 load 1 0 
> > 
> > (See 'man mtx') 
> > 
> > Cheers, 
> > Terry 
> > 
> > 
> > >   Hi all, 
> > > bacula won't work with our autochanger anymore. I can't find the
> > > source of the problems. 
> > > Here is the output of the autochanger test of "btape -c /etc/bacula/bacula-sd.conf /dev/nst0":
> > > 
> > > === Autochanger test === 
> > > 
> > > 3301 Issuing autochanger "loaded" command. 
> > > Slot 1 loaded. I am going to unload it. 
> > > 3302 Issuing autochanger "unload 1 0" command. 
> > > unload status=Bad 134217743 
> > > 3992 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 unload 1 /dev/nst0 0
> > > 3992 result="Unloading drive 0 into Storage Element 1...done
> > > Program killed by Bacula watchdog (timeout)
> > > ": ERR=Child died from signal 15: Termination
> > > 3303 Issuing autochanger "load 1 0" command.
> > > 3993 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 load 1 /dev/nst0 0
> > > 3993 result="Loading media from Storage Element 1 into drive 0...done
> > > Program killed by Bacula watchdog (timeout)
> > > ": ERR=Child died from signal 15: Termination
> > > You must correct this error or the Autochanger will not work. 
> > > 
> > > This is the storage-definition: 
> > > 
> > > Autochanger { 
> > >   Name = QS3DLT 
> > >   Device = DLT-Drive-1 
> > >   Changer Command = "/usr/lib/bacula/mtx-changer %c %o %S %a %d" 
> > >   Changer Device = /dev/sg3 
> > > } 
> > > 
> > > Device { 
> > >   Name = DLT-Drive-1                      # 
> > >   Drive Index = 0 
> > >   Media Type = DLT-VS1 
> > >   Archive Device = /dev/nst0 
> > >   AutomaticMount = yes;               # when device opened, read it
> > >   AlwaysOpen = no; 
> > >   RemovableMedia = yes; 
> > >   RandomAccess = no; 
> > >   AutoChanger = yes 
> > >   Maximum Changer Wait = 10 
> > >   Maximum Rewind Wait = 10 
> > >   Maximum Open Wait = 10 
> > > } 
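
One thing that may be worth ruling out here: as far as I know, Bacula reads a time value without a modifier as seconds, so the three `= 10` settings above would mean 10 seconds, which would not cover a load that takes several minutes. Below is a sketch of the same Device resource with the unit spelled out. The 10-minute figure is simply the value mentioned elsewhere in this thread, not a recommendation, and whether this explains the watchdog kills is an open question.

```conf
Device {
  Name = DLT-Drive-1
  Drive Index = 0
  Media Type = DLT-VS1
  Archive Device = /dev/nst0
  AutomaticMount = yes
  AlwaysOpen = no
  RemovableMedia = yes
  RandomAccess = no
  AutoChanger = yes
  # Unit written out explicitly; a bare "10" is assumed to be read as seconds.
  Maximum Changer Wait = 10 minutes
  Maximum Rewind Wait = 10 minutes
  Maximum Open Wait = 10 minutes
}
```
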
> > > 
> > > When I look at the webadmin of the autochanger I see the
> > > autochanger and the drive perform exactly the requested operations
> > > at the usual speed (~1:30min for an unload operation, ~3:50 for an
> > > unload/load operation). 
> > > 
> > > Still I get lots of killing / timeout problems. 
> > > 
> > > I start to wonder if the autochanger is somewhat defective... 
> > > 
> > > If any of you guys can help I would greatly appreciate it. 
> > > 
> > > All the best, Nils 
> > > 
> > > 
> > > 
> > > Nils Blanck-Wehde wrote:
> > > 
> > > > Hi John, 
> > > > 
> > > > thanks for your help. I am quite new to bacula and it seems to
> > > > take some time to fully understand it :-) 
> > > > I am not sure whether it really is a timeout problem. 
> > > > I increased all timeout values to 10 minutes and the killing
> > > > still occurs (after 2:20min): 
> > > > 
> > > > 07-Aug 12:20 company_bacula-dir JobId 217: Start Backup JobId 217, Job=Fileserver_Lexware_Exchange_to_Tape.2008-08-07_12.20.03
> > > > 07-Aug 12:20 company_bacula-dir JobId 217: Using Device "DLT-Drive-1"
> > > > 07-Aug 12:20 company_bacula-sd JobId 217: 3301 Issuing autochanger "loaded? drive 0" command.
> > > > 07-Aug 12:20 company_bacula-sd JobId 217: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
> > > > 07-Aug 12:20 company_bacula-sd JobId 217: 3304 Issuing autochanger "load slot 8, drive 0" command.
> > > > *messages
> > > > 07-Aug 12:22 company_bacula-sd JobId 217: Fatal error: 3992 Bad autochanger "load slot 8, drive 0": ERR=Child died from signal 15: Termination.
> > > > Results=Loading media from Storage Element 8 into drive 0...done
> > > > Program killed by Bacula watchdog (timeout)
> > > > 
> > > > 07-Aug 12:20 company-appsrv-fd JobId 217: Fatal error: ../../filed/job.c:1817 Bad response to Append Data command. Wanted 3000 OK data, got 3903 Error append data
> > > > 
> > > > 
> > > > I double-checked the time needed by the autochanger for an
> > > > unload/load operation: I ran a couple of operations and they all
> > > > completed in less than four minutes.
> > > > I think there is a general problem with the autochanger because
> > > > btape test throws the following error when testing the
> > > > autochanger:
> > > > 
> > > > 3301 Issuing autochanger "loaded" command. 
> > > > 3991 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
> > > > 3991 result="": ERR=Child died from signal 15: Termination 
> > > > You must correct this error or the Autochanger will not work. 
> > > > 
> > > > When I run this command "/usr/lib/bacula/mtx-changer /dev/sg3
> > > > loaded 1 /dev/nst0 0" manually, both as root and as user bacula,
> > > > it returns "1": 
> > > > 
> > > > [root@company-Backupserver ~]# su - bacula 
> > > > -bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
> > > > 1 
> > > > -bash-3.2$ 
> > > > 
> > > > Could it be a permission problem? What UID and GID should
> > > > mtx-changer have? Could some automatic mechanism on my CentOS
> > > > installation have restricted the permissions of the autochanger
> > > > device?
> > > > 
> > > > I am a little stuck here. 
> > > > 
> > > > Thanks for all help! 
> > > > 
> > > > Nils 
> > > > 
> > > > 
> > > > John Drescher wrote:
> > > > 
> > > > > On Wed, Aug 6, 2008 at 1:42 PM, Nils Blanck-Wehde
> > > > > <nils.blanck-wehde AT backofficeservice DOT biz> wrote:
> > > > > 
> > > > > > Hello list, 
> > > > > > 
> > > > > > I just encountered this issue: 
> > > > > > 
> > > > > > 06-Aug 19:31 company_bacula-sd JobId 210: Fatal error: 3992 Bad autochanger "load slot 8, drive 0": ERR=Child died from signal 15: Termination.
> > > > > > Results=Loading media from Storage Element 8 into drive 0...done
> > > > > > Program killed by Bacula watchdog (timeout)
> > > > > > 
> > > > > > 
> > > > > > Earlier today I got this message: 
> > > > > > 
> > > > > > 06-Aug 17:10 company_bacula-dir JobId 208: Using Device "DLT-Drive-1"
> > > > > > 06-Aug 17:10 company_bacula-sd JobId 208: 3301 Issuing autochanger "loaded? drive 0" command.
> > > > > > 06-Aug 17:10 company_bacula-sd JobId 208: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
> > > > > > 06-Aug 17:10 company_bacula-sd JobId 208: 3304 Issuing autochanger "load slot 1, drive 0" command.
> > > > > > 06-Aug 17:13 company-appsrv-fd JobId 208: Fatal error: ../../filed/job.c:1817 Bad response to Append Data command. Wanted 3000 OK data, got 3903 Error append data
> > > > > > 
> > > > > > 06-Aug 17:15 company_bacula-sd JobId 208: Fatal error: 3992 Bad autochanger "load slot 1, drive 0": ERR=Child died from signal 15: Termination.
> > > > > > Results=Loading media from Storage Element 1 into drive 0...done
> > > > > > Program killed by Bacula watchdog (timeout)
> > > > > > 
> > > > > >    
> > > > > The default maximum changer wait is 5 minutes. If the changer does not
> > > > > complete in 5 minutes, bacula will kill the mtx-changer script.
> > > > > 
> > > > > See Maximum Changer Wait in 
> > > > > 
> > > > > http://bacula.org/en/rel-manual/Storage_Daemon_Configuratio.html 
> > > > > 
> > > > > John 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > >   
> > > > > 
> > > > 
> > > > -- 
> > > > BackOfficeService - Consulting and service for your IT -
> > > > 
> > > > Address: Niederkastenholzer Str. 40, 53881 Euskirchen
> > > > 
> > > > Phone:   +49 2255 953204
> > > > Fax:     +49 2255 953208
> > > > Mobile:  +49 177 3397547
> > > > 
> > > > Bank details: Raiffeisenbank Rheinbach Voreifel eG,
> > > > account no. 340286014 (bank code 370 696 27)
> > > > 
> > > > Online: info AT backofficeservice DOT biz
> > > > www.backofficeservice.biz
> > > > 
> > > > 
> > > > 
> > > > 
> > > > -------------------------------------------------------------------------
> > > > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> > > > Build the coolest Linux based applications with Moblin SDK & win great prizes
> > > > Grand prize is a trip for two to an Open Source event anywhere in the world
> > > > http://moblin-contest.org/redirect.php?banner_id=100&url=/
> > > > 
> > > > _______________________________________________ 
> > > > Bacula-users mailing list 
> > > > Bacula-users AT lists.sourceforge DOT net
> > > > https://lists.sourceforge.net/lists/listinfo/bacula-users 
> > > > 
> > > > 
> > > 
> > 
> > 
> 
-- 
----------------------------------------------------------------------------
Jason A. Kates (jason AT kates DOT org) 
Fax:    208-975-1514
Phone:  212-400-1670 x2
============================================================================


