Re: [Bacula-users] RunBeforeJob get stuck .. sometimes ..

2008-07-17 16:43:25
Subject: Re: [Bacula-users] RunBeforeJob get stuck .. sometimes ..
From: Arno Lehmann <al AT its-lehmann DOT de>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Thu, 17 Jul 2008 22:42:51 +0200
Hi,

17.07.2008 20:43, Jeremy Koppel wrote:
> We are experiencing something similar backing up one of our VMs.  We're 
> backing up an Alfresco instance; the ClientRunBeforeJob script that shuts 
> Alfresco down from the DB VM runs fine, but the ClientRunAfterJob script that 
> starts it back up from the repository VM produces a zombie.  All that's in 
> the script right now is a call to the init script:  '/etc/init.d/alfresco 
> start'.
> 
> When I look at it in the morning, Bacula is still 'backing up' the repository.
> 
> Here's the last few lines from last night's backup:
>    17-Jul 02:11 bckm1-fd: ClientAfterJob: run command "/etc/bacula/customScripts/backupComplete.bash"
>    17-Jul 02:11 bckm1-fd: ClientAfterJob: Starting OpenOffice service ...
>    17-Jul 02:11 bckm1-fd: ClientAfterJob: Starting Alfresco ...
> 
> The last 2 are from the Alfresco init script.  The OS shows that the 
> ClientRunAfterJob script is a zombie:
> 
> # ps -Alf
> 0 Z root      5663 28588  0  78   0 -     0 exit   02:11 ?        00:00:00 [backupComplete.] <defunct>
> 
> 
> Restarting Bacula on the client does take that process with it, but of course 
> marks the backup as an error.  Any ideas where I should be looking?
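
A minimal sketch for spotting defunct processes like this one, together with the parent that has not yet reaped them. It assumes a ps that accepts the "stat" output column (procps on Linux does). list_zombies is a hypothetical helper name, not part of Bacula.

```shell
#!/bin/sh
# list_zombies (hypothetical helper): print "PID PPID STAT COMMAND" for
# every defunct process. The PPID column names the parent that has not
# yet collected the exit status of the dead child.
list_zombies() {
    # "-o name=" suppresses the header; "stat" is an assumption about the
    # local ps (supported by procps and the BSDs, not required by POSIX).
    ps -A -o pid= -o ppid= -o stat= -o comm= | awk '$3 ~ /^Z/'
}

list_zombies
```

In the ps output quoted above, the parent is PID 28588, presumably the file daemon, so that is the process to inspect.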

You could try running the Alfresco init script in the background
(/etc/init.d/alfresco start &) or even under nohup
(nohup /etc/init.d/alfresco start > /var/log/alfresco-restart.out 2>&1 &)
and see if that at least lets Bacula continue.

I assume the init script somehow hangs, for example if it doesn't have 
a terminal. But that's something to debug separately...
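
One plausible reason backgrounding with redirection helps: the log lines quoted above show that the file daemon captures the script's output, so a service that inherits that output channel may keep the job waiting even after the script itself has exited. A minimal sketch under that assumption follows. run_detached is a hypothetical helper name; the log path and init script are the ones mentioned in this thread.

```shell
#!/bin/sh
# run_detached LOGFILE COMMAND [ARGS...]   (hypothetical helper)
# Start COMMAND in the background with stdin, stdout and stderr detached
# from whatever the calling script inherited.
run_detached() {
    log=$1
    shift
    # stdin comes from /dev/null and stdout/stderr go to the log file, so
    # the child holds none of the descriptors the caller may be reading.
    nohup "$@" </dev/null >"$log" 2>&1 &
}

# Example call for the ClientRunAfterJob script (not executed here):
# run_detached /var/log/alfresco-restart.out /etc/init.d/alfresco start
```

The script then returns at once, and anything the init script prints ends up in the log file rather than in the job report.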

Arno

> 
> Our config:
> Director:  Gentoo Linux 2.6.14-r5 (32-bit i686), running Bacula 1.36.3
> Client:  Gentoo Linux OpenVZ kernel 2.6.18-028stab053 (AMD64 build (for Intel 
> Xeon quad-core)), running Bacula client-only 2.0.3 (all versions for AMD64 
> marked in the testing branch in Portage)
> 
> 
> --Jeremy
> 
> 
> 
> -----Original Message-----
> From: bacula-users-bounces AT lists.sourceforge DOT net 
> [mailto:bacula-users-bounces AT lists.sourceforge DOT net] On Behalf Of 
> Michael Patzer
> Sent: Wednesday, May 07, 2008 4:29
> To: bacula-users AT lists.sourceforge DOT net
> Subject: [Bacula-users] RunBeforeJob get stuck .. sometimes ..
> 
> hi,
> 
> i use the following script to connect to some dmz-servers in runbeforejob.
> if i run the job manually it always works, but if the scheduler runs it at
> night, all the jobs with this script get stuck until i kill the
> "[ssh-tunnel.sh] <defunct>" processes.
> 
> after that, the queued jobs for the same clients, with the same runbeforejob,
> run fine.
> 
> while it waits the tunnel itself opens fine, but it looks like the script
> doesn't exit because ssh waits for something...
> 
> any ideas how to fix that?
> 
> ------------
> 
> #!/bin/sh
> 
> # variables
> USER=bacula
> CLIENT=$2
> LOCAL=bla.domain.tld
> SSH=/usr/bin/ssh
> 
> case "$1" in
>  start)
>     # create ssh-tunnel 
>         echo "Starting SSH-tunnel to $CLIENT..."
>         # detach the script's own stdout/stderr (POSIX form; "&>" is a bashism)
>         exec >/dev/null 2>&1
>         $SSH -fnN2 -o PreferredAuthentications=publickey \
>             -i /var/lib/bacula/.ssh/id_dsa -l $USER \
>             -R 9103:$LOCAL:9103 $CLIENT > /dev/null 2> /dev/null
>         exit $?
>         ;;
> 
>  stop)
>         # remove tunnel 
>         echo "Stopping SSH-tunnel to $CLIENT..."
>         # find PID killem
>         PID=`ps ax | grep "/usr/bin/ssh" |
>              grep "/var/lib/bacula/.ssh/id_dsa" | grep "$CLIENT" | awk '{ print $1 }'`
>         kill $PID
>         exit 0;
>         ;;
>  *)
>         #  usage:
>         echo "             "
>         echo "      Start SSH-tunnel to client-host"
>         echo "      to bacula-director and storage-daemon"
>         echo "            "
>         echo "      USAGE:"
>         echo "      ssh-tunnel.sh {start|stop} client.fqdn"
>         echo ""
>         exit 1
>         ;;
> esac
> 
> ------------
> 
> regards,
> michael
> 
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de
