We are experiencing something similar backing up one of our VMs. We're backing
up an Alfresco instance; the ClientRunBeforeJob script that shuts Alfresco down
from the DB VM runs fine, but the ClientRunAfterJob script that starts it back
up from the repository VM produces a zombie. All that's in the script right
now is a call to the init script: '/etc/init.d/alfresco start'.
When I look at it in the morning, Bacula is still 'backing up' the repository.
Here are the last few lines from last night's backup:
17-Jul 02:11 bckm1-fd: ClientAfterJob: run command "/etc/bacula/customScripts/backupComplete.bash"
17-Jul 02:11 bckm1-fd: ClientAfterJob: Starting OpenOffice service ...
17-Jul 02:11 bckm1-fd: ClientAfterJob: Starting Alfresco ...
The last two lines come from the Alfresco init script. The OS shows that the
ClientRunAfterJob script is a zombie:
# ps -Alf
0 Z root 5663 28588 0 78 0 - 0 exit 02:11 ? 00:00:00 [backupComplete.] <defunct>
Restarting Bacula on the client does take that process with it, but of course
marks the backup as an error. Any ideas where I should be looking?
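One thing that may be worth ruling out (an assumption on my part, not a confirmed diagnosis): since the init script's output shows up in the job log, the file daemon is evidently reading the script's stdout, and a reader of a pipe does not see end-of-file until every process holding the write end has closed it. A daemon started by the init script that inherits that descriptor would keep the reader blocked even after the script itself has exited, which would match a <defunct> script that nobody has reaped yet. The sketch below only demonstrates that mechanism in plain sh; `sleep 2` is a hypothetical stand-in for the long-running service, and `date +%s` is assumed to be available.

```shell
#!/bin/sh
# 'sleep 2' is a stand-in for a long-running daemon (hypothetical).

start_attached() {
    echo "Starting service ..."
    sleep 2 &                              # inherits stdout, i.e. the caller's pipe
}

start_detached() {
    echo "Starting service ..."
    sleep 2 </dev/null >/dev/null 2>&1 &   # holds none of the caller's descriptors
}

# Print how many whole seconds the caller spends reading the output of "$1".
waited_for() {
    t0=$(date +%s)
    out=$("$1")        # returns only once every writer has closed the pipe
    t1=$(date +%s)
    echo $((t1 - t0))
}

echo "attached: reader blocked for $(waited_for start_attached)s"
echo "detached: reader blocked for $(waited_for start_detached)s"
```

In the attached case the reader stays blocked until the background `sleep` exits, even though the function returned immediately; in the detached case it returns right away. If the same thing is happening here, calling the init script as '/etc/init.d/alfresco start </dev/null >/dev/null 2>&1' would be the equivalent change, at the cost of losing the "Starting ..." lines in the job log.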
Our config:
Director: Gentoo Linux 2.6.14-r5 (32-bit i686), running Bacula 1.36.3
Client: Gentoo Linux OpenVZ kernel 2.6.18-028stab053 (AMD64 build (for Intel
Xeon quad-core)), running Bacula client-only 2.0.3 (all versions for AMD64
marked in the testing branch in Portage)
--Jeremy
-----Original Message-----
From: bacula-users-bounces AT lists.sourceforge DOT net
[mailto:bacula-users-bounces AT lists.sourceforge DOT net] On Behalf Of Michael Patzer
Sent: Wednesday, May 07, 2008 4:29
To: bacula-users AT lists.sourceforge DOT net
Subject: [Bacula-users] RunBeforeJob get stuck .. sometimes ..
hi,
i use the following script to connect to some dmz-servers in runbeforejob.
if i run the job manually it always works, but when the scheduler runs it at
night, all the jobs with this script get stuck until i kill the
"[ssh-tunnel.sh] <defunct>" processes.
after that the queued jobs for the same clients, with the same runbeforejob,
run fine.
while it waits the tunnel itself opens fine, but it looks like the script
doesn't exit because ssh waits for something...
any ideas how to fix that?
------------
#!/bin/sh
# variables
USER=bacula
CLIENT=$2
LOCAL=bla.domain.tld
SSH=/usr/bin/ssh

case "$1" in
  start)
    # create ssh-tunnel
    echo "Starting SSH-tunnel to $CLIENT..."
    # silence everything from here on (POSIX form; "&>" is a bashism
    # that plain /bin/sh may not honour)
    exec >/dev/null 2>&1
    $SSH -fnN2 -o PreferredAuthentications=publickey \
        -i /var/lib/bacula/.ssh/id_dsa -l "$USER" \
        -R 9103:$LOCAL:9103 "$CLIENT" >/dev/null 2>/dev/null
    exit $?
    ;;
  stop)
    # remove tunnel
    echo "Stopping SSH-tunnel to $CLIENT..."
    # find the PID of the tunnel for this client and kill it
    PID=`ps ax | grep "/usr/bin/ssh" |
        grep "/var/lib/bacula/.ssh/id_dsa" |
        grep "$CLIENT" | awk '{ print $1 }'`
    # only call kill if a matching process was found
    [ -n "$PID" ] && kill $PID
    exit 0
    ;;
  *)
    # usage:
    echo " "
    echo " Start SSH-tunnel to client-host"
    echo " to bacula-director and storage-daemon"
    echo " "
    echo " USAGE:"
    echo " ssh-tunnel.sh {start|stop} client.fqdn"
    echo ""
    exit 1
    ;;
esac
------------
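The PID lookup in the stop branch can also be done in a single awk pass instead of three greps. This is only a sketch: `find_tunnel_pids` is a hypothetical helper name, the PIDs and host names in the sample input are made up, and it assumes the `ps ax` layout where the command starts in the fifth column.

```shell
#!/bin/sh
# Reads "ps ax"-style lines on stdin and prints the PIDs of tunnel processes.
# $1 = ssh binary, $2 = identity file, $3 = client host (all caller-supplied).
find_tunnel_pids() {
    awk -v ssh="$1" -v key="$2" -v client="$3" '
        $5 == ssh && index($0, key) && index($0, client) { print $1 }'
}

# Usage against canned input (made-up PIDs and host names); prints 4242:
find_tunnel_pids /usr/bin/ssh /var/lib/bacula/.ssh/id_dsa web1.example.org <<'EOF'
 4242 ?        Ss     0:00 /usr/bin/ssh -fnN2 -i /var/lib/bacula/.ssh/id_dsa -l bacula -R 9103:bla.domain.tld:9103 web1.example.org
 4243 ?        Ss     0:00 /usr/bin/ssh -fnN2 -i /var/lib/bacula/.ssh/id_dsa -l bacula -R 9103:bla.domain.tld:9103 web2.example.org
 4250 pts/0    S+     0:00 grep /usr/bin/ssh
EOF
```

In the script this would be fed from a live process list, for example PID=$(ps ax | find_tunnel_pids "$SSH" /var/lib/bacula/.ssh/id_dsa "$CLIENT"). Because the match is on the command column rather than the whole line, the awk process itself is not picked up. The client match is a plain substring test, so a host name that is a prefix of another would match both.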
regards,
michael
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users