Arno,

Thanks for your reply. I did some more testing on this today. I tried
your suggestions, but without success. I then modified my script (the one
called from ClientRunAfterJob) to include an echo at the end. After the
backup, that echo does come through, so it's not the Alfresco init script
that's hanging, it's backupComplete.bash.

From the job:

ClientRunAfterJob = "/etc/bacula/customScripts/backupComplete.bash"

The script, /etc/bacula/customScripts/backupComplete.bash:

#!/bin/bash
#
# This script fires after Alfresco backup is complete.
source /etc/profile
/etc/init.d/alfresco start
echo "This was echoed from the bacula script."

Bacula log:

20-Jul 21:20 ns2-sd: Volume "AAA017L2" previously written, moving to end of data.
20-Jul 21:21 ns2-sd: Ready to append to end of Volume "AAA017L2" at file=9.
20-Jul 21:32 bckm1-fd: ClientAfterJob: run command "/etc/bacula/customScripts/backupComplete.bash"
20-Jul 21:32 bckm1-fd: ClientAfterJob: Starting OpenOffice service ...
20-Jul 21:32 bckm1-fd: ClientAfterJob: Starting Alfresco ...
20-Jul 21:32 bckm1-fd: ClientAfterJob: This was echoed from the bacula script.
20-Jul 21:51 ns2-dir: bckm1.2008-07-20_21.19.44 Fatal error: Network error with FD during Backup: ERR=No data available

The last line is where I restart the Bacula client service on bckm1-fd,
since the script does not complete. Any idea why Bacula would hang on my
script?

· We have used Bacula for a couple of years, and have similar startup
  scripts on other servers that run without incident.
· If I run backupComplete.bash from the command line, it completes
  without error or hanging.
· It only hangs when Bacula kicks off this script.
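
A pattern like this (fine from a shell, stuck only under Bacula) would be consistent with a background process inheriting the pipe that Bacula reads the script's output from, although nothing above proves that is the cause here. A reader on a pipe only sees end-of-file once every process holding the write end has exited. A minimal sketch of the effect follows; "sleep" stands in for the long-running service and "cat" for the file daemon, and the function names are made up for the illustration:

```shell
#!/bin/bash
# Stand-in for a daemon started by an init script: it runs in the background
# and keeps whatever stdout/stderr it inherited from the caller.
start_inheriting() { sleep 3 & }

# Same stand-in, but with stdin, stdout and stderr pointed at /dev/null.
start_detached() { sleep 3 </dev/null >/dev/null 2>&1 & }

# Run the given starter inside a short "script" whose output is piped to a
# reader (cat), and print how many whole seconds the reader had to wait
# before it saw end-of-file.
time_reader() {
    local begin=$SECONDS
    ( "$1"; echo "script finished" ) | cat >/dev/null
    echo $(( SECONDS - begin ))
}

inherit_secs=$(time_reader start_inheriting)
detached_secs=$(time_reader start_detached)
echo "reader waited ${inherit_secs}s with inherited output, ${detached_secs}s when detached"
```

If the first figure tracks the lifetime of the background process and the second stays near zero, redirecting stdin, stdout and stderr on the init script call is probably the next thing to test.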

I haven't found one posted, but is there a related bug in the Bacula
client we're using (2.0.3)? There are only two versions of Bacula
currently in Portage, 2.0.3 and 2.4.1. These were added only recently,
which is why our server is still running 1.36.3 (I'm not sure it would
even upgrade smoothly to the newer versions).

What else should I try?
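
One variation that may be worth a try, as a sketch rather than a known fix: start the service with all three standard descriptors pointed away from Bacula, so that nothing it spawns can keep the file daemon's pipe open. The function name run_detached and the log path in the usage line are placeholders, not anything Bacula or Alfresco provides.

```shell
#!/bin/bash
# run_detached LOGFILE COMMAND [ARGS...]
# Starts COMMAND in the background with stdin taken from /dev/null and
# stdout/stderr appended to LOGFILE, then returns without waiting for it.
# nohup keeps the command alive if the calling script's session goes away.
run_detached() {
    local logfile=$1
    shift
    nohup "$@" </dev/null >>"$logfile" 2>&1 &
}
```

In backupComplete.bash the call would replace the plain init script line, for example: run_detached /var/log/alfresco-restart.out /etc/init.d/alfresco start. The final echo can stay, so the Bacula log still shows that the script reached its end.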
-----Original Message-----
From: bacula-users-bounces AT lists.sourceforge DOT net [mailto:bacula-users-bounces AT lists.sourceforge DOT net] On Behalf Of Arno Lehmann
Sent: Thursday, July 17, 2008 16:43
To: bacula-users AT lists.sourceforge DOT net
Subject: Re: [Bacula-users] RunBeforeJob get stuck .. sometimes ..
Hi,
17.07.2008 20:43, Jeremy Koppel wrote:
> We are experiencing something similar backing up one of our VMs. We're
> backing up an Alfresco instance; the ClientRunBeforeJob script that shuts
> Alfresco down from the DB VM runs fine, but the ClientRunAfterJob script
> that starts it back up from the repository VM produces a zombie. All
> that's in the script right now is a call to the init script:
> '/etc/init.d/alfresco start'.
>
> When I look at it in the morning, Bacula is still 'backing up' the
> repository.
>
> Here's the last few lines from last night's backup:
> 17-Jul 02:11 bckm1-fd: ClientAfterJob: run command "/etc/bacula/customScripts/backupComplete.bash"
> 17-Jul 02:11 bckm1-fd: ClientAfterJob: Starting OpenOffice service ...
> 17-Jul 02:11 bckm1-fd: ClientAfterJob: Starting Alfresco ...
>
> The last 2 are from the Alfresco init script. The OS shows that the
> ClientRunAfterJob script is a zombie:
>
> # ps -Alf
> 0 Z root 5663 28588 0 78 0 - 0 exit 02:11 ? 00:00:00 [backupComplete.] <defunct>
>
> Restarting Bacula on the client does take that process with it, but of
> course marks the backup as an error. Any ideas where I should be looking?

You could try to run the Alfresco init script backgrounded
(/etc/init.d/alfresco start &) or even nohup'ed
(nohup /etc/init.d/alfresco start > /var/log/alfresco-restart.out 2>&1 &)
and see if that at least lets Bacula continue.

I assume the init script somehow hangs, for example if it doesn't have
a terminal. But that's something to debug separately...
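
For that separate debugging, it may help to see which processes still share a pipe with the file daemon while the job is stuck. The sketch below is Linux-only and assumes bash 4 or later and a readable /proc; pipe_holders is a made-up helper name. The PID to pass is that of the waiting process (bacula-fd in this case), since a zombie no longer has any descriptors of its own.

```shell
#!/bin/bash
# pipe_holders PID
# Prints "pid command line" for every other process that has one of PID's
# pipes open. Linux-only: reads the /proc/<pid>/fd symlinks, whose targets
# look like "pipe:[inode]". Needs bash 4+ for the associative array.
pipe_holders() {
    local target=$1 fd link pid
    local -A wanted=()

    # Collect the pipes the target process has open.
    for fd in /proc/"$target"/fd/*; do
        link=$(readlink "$fd" 2>/dev/null) || continue
        [[ $link == pipe:* ]] && wanted[$link]=1
    done

    # Scan every visible process for descriptors on the same pipes.
    for fd in /proc/[0-9]*/fd/*; do
        link=$(readlink "$fd" 2>/dev/null) || continue
        [[ -n ${wanted[$link]:-} ]] || continue
        pid=${fd#/proc/}
        pid=${pid%%/*}
        [[ $pid == "$target" ]] && continue
        echo "$pid $(tr '\0' ' ' 2>/dev/null </proc/"$pid"/cmdline)"
    done | sort -u
}
```

Run as root on the client while the job hangs, passing the parent PID of the defunct script (28588 in the quoted ps listing). Any service process that shows up in the output is a candidate for holding the pipe open.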
Arno
>
> Our config:
> Director: Gentoo Linux 2.6.14-r5 (32-bit i686), running Bacula 1.36.3
> Client: Gentoo Linux OpenVZ kernel 2.6.18-028stab053 (AMD64 build (for
> Intel Xeon quad-core)), running Bacula client-only 2.0.3 (all versions
> for AMD64 marked in the testing branch in Portage)
>
> --Jeremy
>
> -----Original Message-----
> From: bacula-users-bounces AT lists.sourceforge DOT net [mailto:bacula-users-bounces AT lists.sourceforge DOT net] On Behalf Of Michael Patzer
> Sent: Wednesday, May 07, 2008 4:29
> To: bacula-users AT lists.sourceforge DOT net
> Subject: [Bacula-users] RunBeforeJob get stuck .. sometimes ..
>
> hi,
>
> i use the following script to connect to some dmz-servers in
> runbeforejob.
> if i run the job manually it always works, but if the scheduler runs it
> at night, all the jobs with this script always get stuck until i kill
> the "[ssh-tunnel.sh] <defunct>" processes.
>
> after that the queued jobs for the same clients, with the same
> runbeforejob, run fine.
>
> while it waits the tunnel itself opens fine, but it looks like the
> script doesn't exit because ssh waits for something...
>
> any ideas how to fix that?
>
> ------------
>
> #!/bin/sh
>
> # variables
> USER=bacula
> CLIENT=$2
> LOCAL=bla.domain.tld
> SSH=/usr/bin/ssh
>
> case "$1" in
>   start)
>     # create ssh-tunnel
>     echo "Starting SSH-tunnel to $CLIENT..."
>     exec &>/dev/null
>     $SSH -fnN2 -o PreferredAuthentications=publickey -i /var/lib/bacula/.ssh/id_dsa -l $USER -R 9103:$LOCAL:9103 $CLIENT > /dev/null 2> /dev/null
>     exit $?
>     ;;
>
>   stop)
>     # remove tunnel
>     echo "Stopping SSH-tunnel to $CLIENT..."
>     # find PID killem
>     PID=`ps ax | grep "/usr/bin/ssh" | grep "/var/lib/bacula/.ssh/id_dsa" | grep "$CLIENT" | awk '{ print $1 }'`
>     kill $PID
>     exit 0;
>     ;;
>   *)
>     # usage:
>     echo " "
>     echo " Start SSH-tunnel to client-host"
>     echo " to bacula-director and storage-daemon"
>     echo " "
>     echo " USAGE:"
>     echo " ssh-tunnel.sh {start|stop} client.fqdn"
>     echo ""
>     exit 1
>     ;;
> esac
>
> ------------
>
> regards,
> michael
>
>
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
--
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de