Bacula-users

Re: [Bacula-users] bacula-dir 3.0.3 dies on second job run or manual reload.

2010-02-24 11:37:05
From: le dahut <le.dahut AT laposte DOT net>
To: Janusz Syrytczyk <jsyrytczyk AT uni.opole DOT pl>
Date: Wed, 24 Feb 2010 17:33:41 +0100
Hello,

Janusz Syrytczyk wrote :
On Monday 14 December 2009 09:27:40 Bruno Friedmann wrote:
On 12/09/2009 12:11 AM, Janusz Syrytczyk wrote:
Hi,

I upgraded from 3.0.2 to 3.0.3 a while ago and I'm facing serious
problems with bacula-dir stability.

Just after it starts, the Director can handle any request I make
(backup, restore, reload, etc.). But once the first task is done, the
Director stops responding to me: the second job does not start when
requested. Then bconsole hangs and I have to exit with Ctrl+C, but when
I start bconsole again and type "status dir", it reports that the backup
is running.

The problem is that the backup is not running. The Director stays almost
completely silent. When I try to reload through bconsole, the Director
becomes unresponsive, like a zombie, and I cannot connect. Debugging
gives only this:

atom-dir: bnet.c:670-0 who=client host=192.168.1.150 port=36131

Interestingly, when I leave the Director alone it works fine: it
schedules backups and performs them. I had previously suspected that
something was wrong with the scheduler, because before this
troubleshooting I couldn't even get the Director to schedule jobs, but
for the past few days it has worked correctly.

This is the same issue reported by the user here, but he never found the cause:

http://www.mail-archive.com/bacula-users AT lists.sourceforge DOT net/msg38279.html

I've just moved the backups and database, recompiled Bacula, recreated
the database and started backups, but the same thing happens. What could
this be, anyone?
I don't know if this applies to your case.

We have the same trouble here, with the Director hanging after running
the first job. I've restarted it with -d100 just to see what happens.
In the meantime, on the Bacula server (which has been upgraded from
 openSUSE 11.1 to 11.2) I found that postfix was throttling (missing
 relay.db file in /etc/postfix: run "postmap relay" and restart
 postfix). After that, all emails were working.

Since, in my Director config, the bsmtp message commands connect to the
 internal postfix, bsmtp was hanging! And perhaps bacula-dir too.

I have now run three scheduled jobs, and bacula-dir has done its job.

What I suspect is that bsmtp has no timeout: if it cannot connect it
 returns, but if it connects and nothing goes right in postfix (the
 throttling case), it waits indefinitely, and so does the Director.
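
To illustrate the suspected missing timeout, here is a minimal sketch of a
wrapper that bounds how long a mail command may run. It is not part of Bacula
or bsmtp; the function name run_with_timeout and the return value 124 on
timeout are assumptions chosen for this example.

```python
import subprocess


def run_with_timeout(argv, timeout_s, stdin_data=None):
    """Run argv and give up after timeout_s seconds.

    stdin_data (bytes or None) is fed to the child's standard input; with
    None the child inherits the caller's stdin, so a message body piped in
    by the caller still reaches the mail command.
    Returns the child's exit code, or 124 (an arbitrary choice for this
    sketch) if the child had to be killed because it ran too long.
    """
    try:
        # subprocess.run kills the child and reaps it when the timeout
        # expires, then raises TimeoutExpired.
        completed = subprocess.run(argv, input=stdin_data, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return 124
    return completed.returncode
```

A small script built around this function could be called in place of the
mail program, with the real mail program and its usual arguments passed as
argv, so that a stalled SMTP session can no longer block the caller forever.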

I will leave this configuration running for 2 to 3 days just to be sure
 that was it.

In the meantime, please check on your side whether you see trouble
 with bsmtp, to confirm or refute this.

True, I've verified this too.

bsmtp goes zombie and bacula-dir waits on it. The solution is to use another app for sending email, or to drop email notifications altogether.

I wonder whether this is a candidate for a bug report?

Did you report the bug?

I encounter the same problem with bacula-3.0.3: bsmtp goes zombie when it prints output (i.e. when it prints an error). I tested calling a bash script that simply does 'echo hello' and it goes zombie too, so the problem seems to be in bacula-dir, not bsmtp.

I've attached a script that disables any output from bsmtp. Call it with the same args as bsmtp. The -dt argument is not supported since it is not optparse compatible.
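
For readers without the attachment, the idea can be sketched roughly as
follows. This is not the attached sendmsg script: the path in BSMTP_PATH and
the names run_silently and main are assumptions made for illustration, and
this sketch forwards its arguments unchanged instead of parsing them.

```python
import subprocess
import sys

# Assumed install location of bsmtp; adjust to the local system.
BSMTP_PATH = "/usr/sbin/bsmtp"


def run_silently(argv, stdin_data=None):
    """Run argv with stdout and stderr discarded and return its exit code.

    With stdin_data=None the child inherits this process's stdin, so the
    message body piped in by the caller still reaches the mail program.
    """
    completed = subprocess.run(
        argv,
        input=stdin_data,
        stdout=subprocess.DEVNULL,  # nothing is printed back to the caller
        stderr=subprocess.DEVNULL,  # error messages are discarded as well
    )
    return completed.returncode


def main(args=None):
    """Forward the given arguments (default: sys.argv[1:]) to bsmtp."""
    if args is None:
        args = sys.argv[1:]
    return run_silently([BSMTP_PATH, *args])
```

A script using this would end with sys.exit(main()) and be called with the
same arguments as bsmtp. Discarding stderr also hides genuine delivery
errors, so the exit code is the only remaining failure signal.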


Attachment: sendmsg.tgz
Description: application/compressed
