Hello,
this is just forwarding your mail to bacula-devel, where it's more
likely to be picked up, looked at, and perhaps integrated into the
code base :-)
Cheers, and thanks for not only analyzing the problem, but also
providing a possible fix!
Arno
07.01.2010 16:34, Renaud Marquet wrote:
> Hi,
>
> I'm using Bacula 3.0.3, and the director's job queue got stuck after
> running the first job. The other jobs waited indefinitely for
> execution. After restarting the director, I could again run only one
> job before the queue got stuck, and so on.
>
> Googling around, I found these two posts, neither with a satisfying answer:
> http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/upgrade-to-3-0-3-job-is-waiting-for-execution-102156/
> http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/job-is-waiting-for-execuition-101508/
>
> I then looked at the code and found there is a deadlock happening in
> message handling.
>
> The problem is located in the close_msg(JCR *) function in message.c. When
> it encounters an error while sending an e-mail, it calls the macro Jmsg1
> (line 485) to report it. This macro calls dispatch_message, which tries
> to acquire fides_mutex (line 738). Unfortunately, this mutex was already
> acquired in close_msg (line 431), so the thread deadlocks on itself (as
> stated in the mutex documentation for the PTHREAD_MUTEX_INITIALIZER kind).
>
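The pattern described above can be reproduced in a few lines. This is only a minimal sketch, not the Bacula code: demo_mutex, report_error and close_messages are hypothetical stand-ins for fides_mutex, the Jmsg1/dispatch_message path and close_msg. It uses pthread_mutex_trylock() in the nested call so that the demo returns an error code instead of hanging the way a blocking pthread_mutex_lock() would.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the mutex shared by both code paths. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the reporting path: it needs the same mutex.
 * trylock is used so the demo returns instead of blocking. */
static int report_error(const char *msg)
{
    int rc = pthread_mutex_trylock(&demo_mutex);
    if (rc != 0) {
        /* A blocking pthread_mutex_lock() here would never return
         * if the calling thread already holds the mutex. */
        return rc;
    }
    fprintf(stderr, "%s\n", msg);
    pthread_mutex_unlock(&demo_mutex);
    return 0;
}

/* Stand-in for the closing path: it reports an error while still
 * holding the mutex, which is the problematic nesting. */
static int close_messages(void)
{
    pthread_mutex_lock(&demo_mutex);
    int rc = report_error("mail command failed");
    pthread_mutex_unlock(&demo_mutex);
    return rc;
}
```

Calling close_messages() returns EBUSY because the nested attempt finds the mutex already held by the same thread, while report_error() called on its own succeeds and returns 0.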
> This problem was affecting me because the mail daemon was not properly
> configured on my server.
>
> It could be worthwhile to review these parts of the code to avoid such
> a situation.
>
> However, I wrote a quick patch for lockmgr.c that simply switches
> mutexes to the PTHREAD_MUTEX_ERRORCHECK_NP kind, which resolves this problem.
>
> Hope this helps someone,
> Renaud
>
> patch :
>
> diff -rupN bacula-3.0.3.vanilla/src/lib/lockmgr.c bacula-3.0.3.patched/src/lib/lockmgr.c
> --- bacula-3.0.3.vanilla/src/lib/lockmgr.c 2009-10-18 11:10:16.000000000 +0200
> +++ bacula-3.0.3.patched/src/lib/lockmgr.c 2009-12-31 18:05:59.000000000 +0100
> @@ -616,6 +616,15 @@ void lmgr_cleanup_main()
> */
> int lmgr_mutex_lock(pthread_mutex_t *m, const char *file, int line)
> {
> + /* Patch to avoid deadlock if mutex is locked more than once */
> + /* There's some performance hit which makes it probably not acceptable */
> + /* for large system usage. */
> + if(*m == PTHREAD_MUTEX_INITIALIZER) {
> + pthread_mutexattr_t attr;
> + pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_ERRORCHECK_NP );
> + pthread_mutex_init( m, &attr );
> + }
> +
> int ret;
> lmgr_thread_t *self = lmgr_get_thread_info();
> self->pre_P(m, file, line);
>
>
>
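For anyone who wants to try the error-checking approach outside of Bacula, here is a small standalone sketch. It is not the patch above and not Bacula code: init_errorcheck_mutex and relock_result are hypothetical helper names. It differs from the patch in two ways that seem worth noting. It calls pthread_mutexattr_init() before pthread_mutexattr_settype(), since POSIX expects the attribute object to be initialised first, and it uses the portable name PTHREAD_MUTEX_ERRORCHECK rather than the _NP spelling. The feature-test macro at the top is an assumption about the build environment and may be unnecessary on your system.

```c
/* Assumption: requests the POSIX/XSI mutex types when compiling in a
 * strict standard mode; with default compiler settings it may not be needed. */
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise *m as an error-checking mutex.
 * Returns 0 on success or a pthread error number. */
static int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);  /* attr must be initialised first */
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) {
        rc = pthread_mutex_init(m, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Locks an error-checking mutex twice from the same thread and
 * returns the result of the second attempt (-1 if setup failed). */
static int relock_result(void)
{
    pthread_mutex_t m;
    if (init_errorcheck_mutex(&m) != 0) {
        return -1;
    }
    pthread_mutex_lock(&m);
    int second = pthread_mutex_lock(&m);  /* fails instead of blocking */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

With an error-checking mutex, the second lock attempt by the owning thread returns EDEADLK instead of blocking, so relock_result() returns EDEADLK. Note that the caller still has to handle that error; the nested call has not actually acquired the lock a second time.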
> ------------------------------------------------------------------------------
> This SF.Net email is sponsored by the Verizon Developer Community
> Take advantage of Verizon's best-in-class app development support
> A streamlined, 14 day to market process makes app distribution fast and easy
> Join now and get one step closer to millions of Verizon customers
> http://p.sf.net/sfu/verizon-dev2dev
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
--
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de