Bacula-users

Re: [Bacula-users] [Bacula-devel] Bacula 3.0.3 deadlock : Job is waiting for execution

2010-01-08 18:05:21
From: Kern Sibbald <kern AT sibbald DOT com>
To: bacula-devel AT lists.sourceforge DOT net
Date: Sat, 9 Jan 2010 00:03:41 +0100
Hello Arno and Renaud,

I can believe that there might be a bug in the lock manager software, but I am 
very surprised that it is turned on. It should only be turned on for 
developers. So although this patch may be correct (I don't think so, but 
Eric can answer more definitively), it should never be needed in a production 
system, and it won't work in one because the lock manager is turned off 
there.

Can you explain why the lock manager code is turned on?

If this is a problem with a misconfigured mail daemon, then it is very likely 
that this problem has already shown up and has a very different solution.  
The problem I just mentioned is fixed in the current development version, and 
the workaround for version 3.0.x is to ensure that either email is turned off 
or that you point to a valid SMTP server.

Regards,

Kern

On Friday 08 January 2010 21:32:18 Arno Lehmann wrote:
> Hello,
>
> this is just forwarding your mail to bacula-devel, where it's more
> likely to be picked up, looked at, and perhaps integrated into the
> code base :-)
>
> Cheers, and thanks for not only analyzing the problem, but also
> providing a possible fix!
>
> Arno
>
> 07.01.2010 16:34, Renaud Marquet wrote:
> > Hi,
> >
> > I'm using bacula 3.0.3 and the director's job queue was stuck after
> > running the first job. The others were waiting indefinitely for
> > execution. If the director was restarted, I could run only one job, and
> > so on.
> >
> > Googling around I found these 2 posts without satisfying answers:
> > http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/upgrade-to-3-0-3-job-is-waiting-for-execution-102156/
> > http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/job-is-waiting-for-execuition-101508/
> >
> > I then looked at the code and found there is a deadlock happening in
> > message handling.
> >
> > The problem is located in the close_msg(JCR *) function in message.c. When
> > it encounters an error while sending an e-mail, it calls the macro Jmsg1
> > (line 485) to report it. This macro calls dispatch_message, which tries
> > to acquire fides_mutex (line 738). Unfortunately, this mutex was already
> > acquired in close_msg (line 431), so the second lock attempt from the
> > same thread results in a deadlock (as stated in the mutex documentation
> > for the PTHREAD_MUTEX_INITIALIZER kind).
> >
> > This problem was affecting me because the mail daemon was not properly
> > configured on my server.
> >
> > It could be interesting to review these parts of the code to avoid such
> > a situation.
> >
> > In the meantime, I wrote a quick patch for lockmgr.c which simply upgrades
> > mutexes to the PTHREAD_MUTEX_ERRORCHECK_NP kind and resolves this error.
> >
> > Hope this helps someone,
> > Renaud
> >
> > patch :
> >
> > diff -rupN bacula-3.0.3.vanilla/src/lib/lockmgr.c bacula-3.0.3.patched/src/lib/lockmgr.c
> > --- bacula-3.0.3.vanilla/src/lib/lockmgr.c  2009-10-18 11:10:16.000000000 +0200
> > +++ bacula-3.0.3.patched/src/lib/lockmgr.c  2009-12-31 18:05:59.000000000 +0100
> > @@ -616,6 +616,15 @@ void lmgr_cleanup_main()
> >   */
> >  int lmgr_mutex_lock(pthread_mutex_t *m, const char *file, int line)
> >  {
> > +   /* Patch to avoid deadlock if mutex is locked more than once */
> > +   /* There's some performance hit which makes it probably not acceptable */
> > +   /* for large system usage. */
> > +   if(*m == PTHREAD_MUTEX_INITIALIZER) {
> > +      pthread_mutexattr_t attr;
> > +      pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_ERRORCHECK_NP );
> > +      pthread_mutex_init( m, &attr );
> > +   }
> > +
> >     int ret;
> >     lmgr_thread_t *self = lmgr_get_thread_info();
> >     self->pre_P(m, file, line);
> >
> >
> >
> > ------------------------------------------------------------------------------
> > This SF.Net email is sponsored by the Verizon Developer Community
> > Take advantage of Verizon's best-in-class app development support
> > A streamlined, 14 day to market process makes app distribution fast and easy
> > Join now and get one step closer to millions of Verizon customers
> > http://p.sf.net/sfu/verizon-dev2dev
> > _______________________________________________
> > Bacula-users mailing list
> > Bacula-users AT lists.sourceforge DOT net
> > https://lists.sourceforge.net/lists/listinfo/bacula-users
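
The behaviour the quoted patch relies on can be illustrated outside Bacula. The following is a minimal sketch, not Bacula code: the helper names init_errorcheck_mutex() and relock_result() are hypothetical, and it uses the POSIX constant PTHREAD_MUTEX_ERRORCHECK rather than the _NP spelling from the patch. Unlike the quoted patch, it calls pthread_mutexattr_init() before setting the type, which POSIX requires. With an error-checking mutex, a second lock attempt by the owning thread returns EDEADLK instead of blocking; the caller still has to decide what to do with that error.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK on strict builds */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not part of Bacula): initialize *m as an
 * error-checking mutex. Returns 0 on success or an error number. */
static int init_errorcheck_mutex(pthread_mutex_t *m)
{
   pthread_mutexattr_t attr;
   int rc = pthread_mutexattr_init(&attr);   /* attr must be initialized first */
   if (rc != 0) {
      return rc;
   }
   rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
   if (rc == 0) {
      rc = pthread_mutex_init(m, &attr);
   }
   pthread_mutexattr_destroy(&attr);
   return rc;
}

/* Hypothetical helper: lock the same mutex twice from one thread, as
 * close_msg followed by dispatch_message does in the report above, and
 * return the result of the second attempt (-1 if setup failed). */
static int relock_result(void)
{
   pthread_mutex_t m;
   if (init_errorcheck_mutex(&m) != 0) {
      return -1;
   }
   int first = pthread_mutex_lock(&m);
   assert(first == 0);                     /* a fresh mutex must be lockable */
   int second = pthread_mutex_lock(&m);    /* returns an error, does not block */
   pthread_mutex_unlock(&m);               /* release the one lock we hold */
   pthread_mutex_destroy(&m);
   return second;
}
```

Calling relock_result() from a program built with thread support (for example with -pthread) returns EDEADLK on a POSIX-conforming system. With a default mutex the same sequence may block forever, which matches the stuck job queue described above.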


