Re: [Bacula-users] bacula-dir 3.0.3 dies on second job run or manual reload.

2009-12-18 06:51:37
Subject: Re: [Bacula-users] bacula-dir 3.0.3 dies on second job run or manual reload.
From: Bruno Friedmann <bruno AT ioda-net DOT ch>
Date: Fri, 18 Dec 2009 12:47:37 +0100
On 12/18/2009 09:20 AM, Janusz Syrytczyk wrote:
> On Monday 14 December 2009 09:27:40 Bruno Friedmann wrote:
>> On 12/09/2009 12:11 AM, Janusz Syrytczyk wrote:
>>> Hi,
>>>
>>> I upgraded from 3.0.2 to 3.0.3 a while ago and I'm facing serious
>>> problems with bacula-dir stability.
>>>
>>> Just after it starts, the Director handles any request I make
>>> (backup, restore, reload, etc.). But once the first task is
>>> done, the Director stops listening to me: the second job does not
>>> start when requested. Then bconsole hangs and I have to exit with
>>> Ctrl+C; after restarting bconsole, "status dir" reports that the
>>> backup is running.
>>>
>>> The problem is that the backup is not actually running. The Director
>>> stays almost completely silent. When I try to reload through bconsole,
>>> the Director becomes unresponsive and I cannot connect. Debugging
>>> gives only this:
>>>
>>> atom-dir: bnet.c:670-0 who=client host=192.168.1.150 port=36131
>>>
>>> Interestingly, when I leave the Director alone it works fine: it
>>> schedules backups and performs them. I had previously suspected that
>>> something was wrong with the scheduler, because before this
>>> troubleshooting I couldn't even get the Director to schedule jobs,
>>> but for the past few days that has worked correctly.
>>>
>>> This looks like the same issue reported here, which was never resolved:
>>>
>>> http://www.mail-archive.com/bacula-users@lists.sourceforge.net/msg38279.html
>>>
>>> I've just moved the backups and database, recompiled Bacula, recreated
>>> the database and started backups, but the same thing happens. What
>>> could this be, anyone?
>>
>> I don't know if this applies to your case.
>>
>> We have the same trouble here, with the Director hanging after running
>> the first job. I've restarted it with -d100 just to check what's
>> happening. In the meantime, on the Bacula server (which has been
>> upgraded from openSUSE 11.1 to 11.2) I found that postfix was
>> throttling because of a missing relay.db file in /etc/postfix. Running
>> "postmap relay" and restarting postfix fixed that, and all emails are
>> now working.
>>
>> Since the bsmtp commands in the Messages section of my Director config
>> connect to the internal postfix, bsmtp was hanging, and perhaps
>> bacula-dir too.
>>
>> I've now run three scheduled jobs, and bacula-dir has completed them all.
>>
>> What I suspect is that bsmtp has no timeout: if it cannot connect, it
>> returns, but if it connects and postfix then misbehaves (the
>> throttling case), it waits indefinitely, and so does the Director.
>>
>> I will leave this configuration running for 2 to 3 days just to be
>> sure that was the cause.
>>
>> In the meantime, could you check on your side whether you see any
>> trouble with bsmtp, to confirm or rule this out?
>>
> True, I've verified this too.
> 
> bsmtp hangs and bacula-dir waits on it. A workaround is to use another app
> for sending email, or to drop email notifications altogether.
> 
> I wonder whether this is a candidate for a bug report?
> 
> Thanks,
> JS
> 

I think you should file a bug report for it
(post the bug number here so I can subscribe to it).
Either the Director or bsmtp needs a timeout somewhere to handle this kind of trap.
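
To illustrate the kind of timeout I mean, here is a minimal Python sketch, assuming the mail command is launched as a child process that reads the message on stdin. The helper name run_mail_command and the 30 second default are hypothetical; this is not how bacula-dir or bsmtp is actually implemented, only the general idea of bounding the wait.

```python
import subprocess
import sys


def run_mail_command(cmd, message, timeout_s=30.0):
    """Run a mail command with `message` on stdin, giving up after timeout_s.

    Hypothetical helper for illustration only; not part of Bacula.
    Returns a tuple (ok, detail).
    """
    try:
        result = subprocess.run(
            cmd,
            input=message,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so the caller is not left blocked on a hung mailer.
        return False, f"timed out after {timeout_s} s"
    except OSError as exc:
        # The command could not be started at all (e.g. missing binary).
        return False, f"could not start: {exc}"
    if result.returncode != 0:
        return False, f"exit status {result.returncode}"
    return True, "sent"


if __name__ == "__main__":
    # Stand-in for a hung mailer: a child that sleeps far longer than the timeout.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_mail_command(hung, "test body\n", timeout_s=1.0))
```

With something like this, a stuck mail transfer would be reported as a failed notification after the timeout instead of blocking the caller indefinitely.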



-- 

     Bruno Friedmann


