Re: [Bacula-users] bacula watchdog killing tape-loading
2008-08-07 08:03:09
Hi Terry,
sorry, I forgot to mention: the OS is CentOS 5.2 and the Bacula version
is 2.4.2. The problems started all of a sudden with 2.4.1, after it had
been running fine for weeks.
Calling mtx and mtx-changer works flawlessly both as root and as user
bacula (slot 1 is currently loaded):
[root@company-Backupserver ~]# mtx -f /dev/sg3 load 1 0
Drive 0 Full (Storage Element 1 loaded)
[root@company-Backupserver ~]# su - bacula
-bash-3.2$ /usr/sbin/mtx -f /dev/sg3 load 1 0
Drive 0 Full (Storage Element 1 loaded)
[root@company-Backupserver ~]# /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
1
[root@company-Backupserver ~]# su - bacula
-bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
1
-bash-3.2$
I can see from the web admin interface of the autochanger (Quantum
Superloader 3 DLT) that the autochanger commands are being executed
correctly, yet Bacula still reports "ERR=Child died from signal 15:
Termination".
Nils
T. Horsnell wrote:
Assuming your O/S is a Unix/Linux of some sort, have you tried the
basic mtx command on it?
Something like
mtx -f /dev/sg3 load 1 0
(See 'man mtx')
Cheers,
Terry
Hi all,
Bacula no longer works with our autochanger, and I can't find the source
of the problem.
Here is the output of the autochanger test run with "btape -c
/etc/bacula/bacula-sd.conf /dev/nst0":
=== Autochanger test ===
3301 Issuing autochanger "loaded" command.
Slot 1 loaded. I am going to unload it.
3302 Issuing autochanger "unload 1 0" command.
unload status=Bad 134217743
3992 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 unload 1 /dev/nst0 0
3992 result="Unloading drive 0 into Storage Element 1...done
Program killed by Bacula watchdog (timeout)
": ERR=Child died from signal 15: Termination
3303 Issuing autochanger "load 1 0" command.
3993 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 load 1 /dev/nst0 0
3993 result="Loading media from Storage Element 1 into drive 0...done
Program killed by Bacula watchdog (timeout)
": ERR=Child died from signal 15: Termination
You must correct this error or the Autochanger will not work.
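The odd-looking number in "unload status=Bad 134217743" can be taken apart by hand. A minimal sketch in Python, assuming Bacula flags "child killed by a signal" by setting bit 27 (b_errno_signal, as I recall it from src/lib/berrno.h) and "child exited non-zero" by setting bit 28 (b_errno_exit). Both constants are assumptions on my part, and decode_status is a made-up helper, not part of Bacula:

```python
# Assumed flag bits, recalled from Bacula's berrno.h (not verified here).
B_ERRNO_SIGNAL = 1 << 27  # child died, low bits hold the signal number
B_ERRNO_EXIT = 1 << 28    # child exited, low bits hold the exit code

def decode_status(status):
    """Split a child status into (kind, value) under the assumptions above."""
    if status & B_ERRNO_SIGNAL:
        return ("signal", status & ~B_ERRNO_SIGNAL)
    if status & B_ERRNO_EXIT:
        return ("exit", status & ~B_ERRNO_EXIT)
    return ("errno", status)

print(decode_status(134217743))  # prints ('signal', 15)
```

Under that assumption 134217743 is 2^27 + 15, i.e. signal 15 (SIGTERM), which agrees with the "Child died from signal 15" text in the same output.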
This is the storage definition:
Autochanger {
  Name = QS3DLT
  Device = DLT-Drive-1
  Changer Command = "/usr/lib/bacula/mtx-changer %c %o %S %a %d"
  Changer Device = /dev/sg3
}
Device {
  Name = DLT-Drive-1
  Drive Index = 0
  Media Type = DLT-VS1
  Archive Device = /dev/nst0
  AutomaticMount = yes; # when device opened, read it
  AlwaysOpen = no;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes
  Maximum Changer Wait = 10
  Maximum Rewind Wait = 10
  Maximum Open Wait = 10
}
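One thing worth checking in the Device resource above: the three wait directives are given as a bare 10. As far as I know, Bacula reads a time value without a unit as seconds and accepts modifiers such as "minutes", so a bare 10 would mean ten seconds rather than ten minutes. Please treat that as an assumption to verify against the manual. A small Python sketch of that reading follows; to_seconds and its unit table are hypothetical illustrations of the assumed rule, not Bacula's parser:

```python
import re

# Assumed unit table (seconds per unit); only a few common modifiers are listed.
UNITS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

def to_seconds(spec):
    """Convert '10' or '10 minutes' to seconds; a bare number counts as seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", spec.lower())
    if not match:
        raise ValueError(f"cannot parse time value: {spec!r}")
    number = int(match.group(1))
    unit = match.group(2) or "seconds"  # assumption: no unit means seconds
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return number * UNITS[unit]

print(to_seconds("10"))          # prints 10
print(to_seconds("10 minutes"))  # prints 600
```

If that reading is right, writing the directive as "Maximum Changer Wait = 10 minutes" would remove the ambiguity.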
When I look at the web admin interface of the autochanger, I see the
autochanger and the drive perform exactly the requested operations at
the usual speed (about 1:30 min for an unload operation, about 3:50 min
for an unload/load operation).
Still, I get lots of kill/timeout errors.
I am starting to wonder whether the autochanger is somehow defective.
If any of you can help, I would greatly appreciate it.
All the best, Nils
Nils Blanck-Wehde wrote:
Hi John,
thanks for your help. I am quite new to Bacula and it seems to take
some time to fully understand it :-)
I am not sure whether it really is a timeout problem.
I increased all timeout values to 10 minutes and the process still
gets killed (after 2:20 min):
07-Aug 12:20 company_bacula-dir JobId 217: Start Backup JobId 217, Job=Fileserver_Lexware_Exchange_to_Tape.2008-08-07_12.20.03
07-Aug 12:20 company_bacula-dir JobId 217: Using Device "DLT-Drive-1"
07-Aug 12:20 company_bacula-sd JobId 217: 3301 Issuing autochanger "loaded? drive 0" command.
07-Aug 12:20 company_bacula-sd JobId 217: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
07-Aug 12:20 company_bacula-sd JobId 217: 3304 Issuing autochanger "load slot 8, drive 0" command.
*messages
07-Aug 12:22 company_bacula-sd JobId 217: Fatal error: 3992 Bad autochanger "load slot 8, drive 0": ERR=Child died from signal 15: Termination.
Results=Loading media from Storage Element 8 into drive 0...done
Program killed by Bacula watchdog (timeout)
07-Aug 12:20 company-appsrv-fd JobId 217: Fatal error: ../../filed/job.c:1817 Bad response to Append Data command. Wanted 3000 OK data, got 3903 Error append data
I double-checked the time the autochanger needs for an unload/load
operation: I ran a couple of operations and they all finished in less
than four minutes.
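To put a number on this instead of watching the clock, the changer script can be timed from a small wrapper. A sketch in Python; timed_run is a hypothetical helper, and the mtx-changer call in the comment is simply the command line from the btape output, which I have not run this way:

```python
import subprocess
import sys
import time

def timed_run(cmd):
    """Run cmd to completion and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd)
    return result.returncode, time.monotonic() - start

# Stand-in command so the sketch runs anywhere; on the backup server one would
# pass e.g. ["/usr/lib/bacula/mtx-changer", "/dev/sg3", "load", "1", "/dev/nst0", "0"].
code, elapsed = timed_run([sys.executable, "-c", "import time; time.sleep(0.2)"])
print(f"exit code {code}, took {elapsed:.1f}s")
```

Comparing the elapsed time with the configured Maximum Changer Wait shows directly whether the watchdog limit is being hit.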
I think there is a general problem with the autochanger, because the
btape test reports the following error when testing it:
3301 Issuing autochanger "loaded" command.
3991 Bad autochanger command: /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
3991 result="": ERR=Child died from signal 15: Termination
You must correct this error or the Autochanger will not work.
When I run this command "/usr/lib/bacula/mtx-changer /dev/sg3 loaded 1
/dev/nst0 0" manually, both as root and as user bacula, it returns "1":
[root@company-Backupserver ~]# su - bacula
-bash-3.2$ /usr/lib/bacula/mtx-changer /dev/sg3 loaded 1 /dev/nst0 0
1
-bash-3.2$
Could it be a permission problem? Which UID and GID should mtx-changer
have? Could some automatic mechanism on my CentOS installation have
changed the permissions of the autochanger device to something more
restrictive?
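A quick way to see who owns the changer device and what its mode bits are, sketched in Python with the standard os, stat, pwd and grp modules. describe_perms is a hypothetical helper; /dev/sg3 is the changer device from this thread, and the demo line uses the null device only so that the sketch runs on any Unix machine:

```python
import grp
import os
import pwd
import stat

def describe_perms(path):
    """Return 'mode owner:group' for path, falling back to numeric IDs."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)
    return f"{stat.filemode(info.st_mode)} {owner}:{group}"

# On the backup server: describe_perms("/dev/sg3")
print(describe_perms(os.devnull))
```

The user the storage daemon runs as (bacula here) needs read and write access to that device, either as owner or through its group.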
I am a little stuck here.
Thanks for all help!
Nils
John Drescher wrote:
On Wed, Aug 6, 2008 at 1:42 PM, Nils Blanck-Wehde
<nils.blanck-wehde AT backofficeservice DOT biz> wrote:
Hello list,
I just encountered this issue:
06-Aug 19:31 company_bacula-sd JobId 210: Fatal error: 3992 Bad autochanger "load slot 8, drive 0": ERR=Child died from signal 15: Termination.
Results=Loading media from Storage Element 8 into drive 0...done
Program killed by Bacula watchdog (timeout)
Earlier today I got this message:
06-Aug 17:10 company_bacula-dir JobId 208: Using Device "DLT-Drive-1"
06-Aug 17:10 company_bacula-sd JobId 208: 3301 Issuing autochanger "loaded? drive 0" command.
06-Aug 17:10 company_bacula-sd JobId 208: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
06-Aug 17:10 company_bacula-sd JobId 208: 3304 Issuing autochanger "load slot 1, drive 0" command.
06-Aug 17:13 company-appsrv-fd JobId 208: Fatal error: ../../filed/job.c:1817 Bad response to Append Data command. Wanted 3000 OK data, got 3903 Error append data
06-Aug 17:15 company_bacula-sd JobId 208: Fatal error: 3992 Bad autochanger "load slot 1, drive 0": ERR=Child died from signal 15: Termination.
Results=Loading media from Storage Element 1 into drive 0...done
Program killed by Bacula watchdog (timeout)
The default maximum changer wait is 5 minutes. If the changer does not
complete in 5 minutes bacula will kill the mtx-changer script.
See Maximum Changer Wait in
http://bacula.org/en/rel-manual/Storage_Daemon_Configuratio.html
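The mechanism described here, a parent that gives a child process a deadline and sends it SIGTERM when the deadline passes, can be reproduced in a few lines. This is a sketch of the general technique in Python, not Bacula's actual watchdog code; run_with_watchdog and the half-second limit are made up for the demonstration:

```python
import signal
import subprocess
import sys

def run_with_watchdog(cmd, timeout_s):
    """Run cmd; send SIGTERM if it outlives timeout_s. Returns the exit status."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()    # sends SIGTERM, i.e. signal 15
        return proc.wait()  # a negative value means "killed by that signal"

# Stand-in for a slow changer script: a child that sleeps for 30 seconds.
slow_child = [sys.executable, "-c", "import time; time.sleep(30)"]
status = run_with_watchdog(slow_child, timeout_s=0.5)
print(status)  # -15 on Linux: the child died from signal 15
```

A child killed this way is reported as having died from signal 15 regardless of how much of its work was already done, which would be consistent with changer output that ends in "...done" and is still flagged as an error.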
John
--
BackOfficeService - Consulting and service for your IT -
Address:
Niederkastenholzer Str. 40
53881 Euskirchen
Phone: +49 2255 953204
Fax: +49 2255 953208
Mobile: +49 177 3397547
Bank details:
Raiffeisenbank Rheinbach Voreifel eG
Account no. 340286014 (BLZ 370 696 27)
Online:
info AT backofficeservice DOT biz
www.backofficeservice.biz
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users