Bacula-users

Re: [Bacula-users] RunBeforeJob get stuck .. sometimes ..

2008-07-21 11:00:53
Subject: Re: [Bacula-users] RunBeforeJob get stuck .. sometimes ..
From: "Jeremy Koppel" <jkoppel AT bluecanopy DOT com>
To: "Arno Lehmann" <al AT its-lehmann DOT de>, "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Sun, 20 Jul 2008 23:36:20 -0400

Arno,

 

      Thanks for your reply.  I did some more testing today on this; I tried your suggestions, but was unsuccessful.  I then modified my script (from ClientRunAfterJob) to include an echo at the end.  After the backup, that echo does come through.  So it's not the Alfresco init script that's hanging, it's backupComplete.bash.

 

From the job:  ClientRunAfterJob = "/etc/bacula/customScripts/backupComplete.bash"

 

The script: /etc/bacula/customScripts/backupComplete.bash:

 

#!/bin/bash

#

# This script fires after Alfresco backup is complete.

 

source /etc/profile

/etc/init.d/alfresco start

echo "This was echoed from the bacula script."

 

 

 

Bacula log:

 

20-Jul 21:20 ns2-sd: Volume "AAA017L2" previously written, moving to end of data.

20-Jul 21:21 ns2-sd: Ready to append to end of Volume "AAA017L2" at file=9.

20-Jul 21:32 bckm1-fd: ClientAfterJob: run command "/etc/bacula/customScripts/backupComplete.bash"

20-Jul 21:32 bckm1-fd: ClientAfterJob: Starting OpenOffice service ...

20-Jul 21:32 bckm1-fd: ClientAfterJob: Starting Alfresco ...

20-Jul 21:32 bckm1-fd: ClientAfterJob: This was echoed from the bacula script.

20-Jul 21:51 ns2-dir: bckm1.2008-07-20_21.19.44 Fatal error: Network error with FD during Backup: ERR=No data available

 

 

The last line is where I restart the Bacula client service on bckm1-fd since the script does not complete.  Any idea why this would be hanging on my script?

 

- We have used Bacula for a couple of years, and have similar startup scripts on other servers that run without incident.
- If I run backupComplete.bash from the command line, it completes without error or hanging.
- It only hangs when Bacula kicks off this script.
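
One possible explanation for these symptoms, not confirmed anywhere in this thread, is that the file daemon reads the script's output through a pipe and keeps waiting for end-of-file while a process started by the init script still holds that pipe open. The sketch below reproduces that effect in plain shell, without Bacula. The names leaky_start, clean_start and time_capture are made up for illustration, and sleep stands in for the long-running service.

```shell
#!/bin/sh
# Stand-in for an init script (hypothetical): it leaves a background
# child that inherits the caller's stdout.
leaky_start() {
    sleep 3 &
    echo "started"
}

# Same, but the background child gets its own stdin, stdout and stderr.
clean_start() {
    sleep 3 < /dev/null > /dev/null 2>&1 &
    echo "started"
}

# Command substitution reads from a pipe until every writer has closed it,
# much like a parent process that collects a script's output.
# Prints the number of whole seconds the capture took.
time_capture() {
    t0=$(date +%s)
    out=$("$@")
    t1=$(date +%s)
    echo $((t1 - t0))
}

echo "leaky: $(time_capture leaky_start)s"
echo "clean: $(time_capture clean_start)s"
```

If the leaky variant takes about as long as the sleep while the clean one returns at once, the same pattern may be worth checking in the real init script. From an interactive command line nothing waits on such a pipe, which would fit the observation that the script only hangs under Bacula.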

 

I haven’t found one posted, but is there some related bug in the Bacula client we’re using?  (2.0.3?)

 

There are only two versions of Bacula currently in Portage, 2.0.3, and 2.4.1.  These were added just recently, so our server is running 1.36.3 (not sure if it would even upgrade smoothly to the newer versions).

 

 

What else should I try?

 

 

 

 

-----Original Message-----
From: bacula-users-bounces AT lists.sourceforge DOT net [mailto:bacula-users-bounces AT lists.sourceforge DOT net] On Behalf Of Arno Lehmann
Sent: Thursday, July 17, 2008 16:43
To: bacula-users AT lists.sourceforge DOT net
Subject: Re: [Bacula-users] RunBeforeJob get stuck .. sometimes ..

 

Hi,

 

17.07.2008 20:43, Jeremy Koppel wrote:

> We are experiencing something similar backing up one of our VMs.  We're backing up an Alfresco instance; the ClientRunBeforeJob script that shuts Alfresco down from the DB VM runs fine, but the ClientRunAfterJob script that starts it back up from the repository VM produces a zombie.  All that's in the script right now is a call to the init script:  '/etc/init.d/alfresco start'.

>

> When I look at it in the morning, Bacula is still 'backing up' the repository.

>

> Here's the last few lines from last night's backup:

>    17-Jul 02:11 bckm1-fd: ClientAfterJob: run command "/etc/bacula/customScripts/backupComplete.bash"

>    17-Jul 02:11 bckm1-fd: ClientAfterJob: Starting OpenOffice service ...

>    17-Jul 02:11 bckm1-fd: ClientAfterJob: Starting Alfresco ...

>

> The last 2 are from the Alfresco init script.  The OS shows that the ClientRunAfterJob script is a zombie:

>

> # ps -Alf

> 0 Z root      5663 28588  0  78   0 -     0 exit   02:11 ?        00:00:00 [backupComplete.] <defunct>

>

>

> Restarting Bacula on the client does take that process with it, but of course marks the backup as an error.  Any ideas where I should be looking?

 

You could try to run the Alfresco init script backgrounded
(/etc/init.d/alfresco start &) or even nohup'ed (nohup
/etc/init.d/alfresco start > /var/log/alfresco-restart.out 2>&1 &) and
see if that at least lets Bacula continue.
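
A minimal sketch of how that suggestion might look inside backupComplete.bash, assuming the hang comes from the started processes keeping the script's output pipe open (an assumption, not something established in this thread). The start_detached helper and the INIT_SCRIPT and LOG_FILE variables are hypothetical; the log path is the one from the nohup example.

```shell
#!/bin/bash
# Hypothetical variant of backupComplete.bash; names are illustrative only.

INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/alfresco}"
LOG_FILE="${LOG_FILE:-/var/log/alfresco-restart.out}"

# Run a command in the background with stdin, stdout and stderr detached
# from the caller, so nothing it starts can hold the caller's pipe open.
# Usage: start_detached LOGFILE COMMAND [ARGS...]
start_detached() {
    log=$1
    shift
    nohup "$@" < /dev/null > "$log" 2>&1 &
}

if [ -x "$INIT_SCRIPT" ]; then
    start_detached "$LOG_FILE" "$INIT_SCRIPT" start
    echo "Alfresco start requested; output goes to $LOG_FILE"
else
    echo "init script not found: $INIT_SCRIPT" >&2
fi
```

The trade-off is that the init script's exit status and its "Starting ..." messages no longer reach the Bacula job log; they end up in the log file instead. The original script also sourced /etc/profile, which is left out here for brevity.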

 

I assume the init script somehow hangs, for example if it doesn't have

a terminal. But that's something to debug separately...

 

Arno

 

>

> Our config:

> Director:  Gentoo Linux 2.6.14-r5 (32-bit i686), running Bacula 1.36.3

> Client:  Gentoo Linux OpenVZ kernel 2.6.18-028stab053 (AMD64 build (for Intel Xeon quad-core)), running Bacula client-only 2.0.3 (all versions for AMD64 marked in the testing branch in Portage)

>

>

> --Jeremy

>

>

>

> -----Original Message-----

> From: bacula-users-bounces AT lists.sourceforge DOT net [mailto:bacula-users-bounces AT lists.sourceforge DOT net] On Behalf Of Michael Patzer

> Sent: Wednesday, May 07, 2008 4:29

> To: bacula-users AT lists.sourceforge DOT net

> Subject: [Bacula-users] RunBeforeJob get stuck .. sometimes ..

>

> hi,

>

> i use the following script to connect to some dmz-servers in runbeforejob.
> if i run the job manually it always works, but if the scheduler runs it at
> night, all the jobs with this script always get stuck until i kill the
> "[ssh-tunnel.sh] <defunct>" processes.
>
> after that the queued jobs for the same clients, with the same runbeforejob,
> run fine.
>
> while it waits the tunnel itself opens fine, but it looks like the script
> doesn't exit because ssh waits for something...

>

> any ideas how to fix that?

>

> ------------

>

> #!/bin/sh

>

> # variables

> USER=bacula

> CLIENT=$2

> LOCAL=bla.domain.tld

> SSH=/usr/bin/ssh

>

> case "$1" in

>  start)

>     # create ssh-tunnel

>         echo "Starting SSH-tunnel to $CLIENT..."

>         # redirect stdout and stderr; "&>" is a bashism and this script runs under /bin/sh
>         exec >/dev/null 2>&1

>         $SSH -fnN2 -o PreferredAuthentications=publickey -i /var/lib/bacula/.ssh/id_dsa -l $USER -R 9103:$LOCAL:9103 $CLIENT > /dev/null 2> /dev/null

>         exit $?

>         ;;

>

>  stop)

>         # remove tunnel

>         echo "Stopping SSH-tunnel to $CLIENT..."

>         # find PID killem

>         PID=`ps ax | grep "/usr/bin/ssh" | grep "/var/lib/bacula/.ssh/id_dsa" | grep "$CLIENT" | awk '{ print $1 }'`

>         kill $PID

>         exit 0;

>         ;;

>  *)

>         #  usage:

>         echo "             "

>         echo "      Start SSH-tunnel to client-host"

>         echo "      to bacula-director and storage-daemon"

>         echo "            "

>         echo "      USAGE:"

>         echo "      ssh-tunnel.sh {start|stop} client.fqdn"

>         echo ""

>         exit 1

>         ;;

> esac

>

> ------------

>

> regards,

> michael

>

> -------------------------------------------------------------------------

> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference

> Don't miss this year's exciting event. There's still time to save $100.

> Use priority code J8TL2D2.

> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone

> _______________________________________________

> Bacula-users mailing list

> Bacula-users AT lists.sourceforge DOT net

> https://lists.sourceforge.net/lists/listinfo/bacula-users

>

>

>

>

> -------------------------------------------------------------------------

> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge

> Build the coolest Linux based applications with Moblin SDK & win great prizes

> Grand prize is a trip for two to an Open Source event anywhere in the world

> http://moblin-contest.org/redirect.php?banner_id=100&url=/

> _______________________________________________

> Bacula-users mailing list

> Bacula-users AT lists.sourceforge DOT net

> https://lists.sourceforge.net/lists/listinfo/bacula-users

>

 

--

Arno Lehmann

IT-Service Lehmann

www.its-lehmann.de

 
