Re: [Bacula-users] [Bacula-devel] Bacula 3.0.3 deadlock : Job is waiting for execution

2010-01-12 10:31:16
Subject: Re: [Bacula-users] [Bacula-devel] Bacula 3.0.3 deadlock : Job is waiting for execution
From: Carlo Filippetto <carlo.filippetto AT gmail DOT com>
To: Kern Sibbald <kern AT sibbald DOT com>
Date: Tue, 12 Jan 2010 16:02:03 +0100
Thanks,
problem solved.



2010/1/12 Kern Sibbald <kern AT sibbald DOT com>
On Tuesday 12 January 2010 12:26:37 Carlo Filippetto wrote:
> Has anyone found any other definitive solution?
> At the moment I have had to disable the mail.

The failure does not occur if you have a valid SMTP server defined.

>
> Thanks
>
>
>
> 2010/1/9 Renaud Marquet <rmarquet AT gmail DOT com>
>
> > On Saturday 09 January 2010 at 21:25 +0100, Kern Sibbald wrote:
> > > Hello,
> > >
> > > On Saturday 09 January 2010 20:20:01 Renaud Marquet wrote:
> > > > Kern,
> > > >
> > > > although I searched for a possible workaround, I didn't find the ones
> > > > you talk about. But your statement is not correct, as pointing to a
> > > > valid smtp server is not a proper workaround. Actually, if for some
> > > > reason the *valid* smtp server is down, the problem will occur and I
> > > > bet users will not figure out the reason.
> > > >
> > > I never claimed that my suggestion was a "proper" workaround nor that
> > > it was a fix.  It is a workaround.
> >
> > Never mind then ;)
> >
> > > If you want, you can backport the fixes (applied 23 October 2009), but
> > > since we are close to release, and we have a workaround, we are not
> > > planning to backport them.
> >
> > No need to backport. This is not a 'blocker' problem, I just mailed here
> > in case someone else runs into the same problem, because there wasn't any
> > answer when googling. Bacula now runs perfectly fine on my system, so I
> > can wait for the upcoming release without any trouble.
> >
> > > > That's why I came up with this patch. It correctly fixes the problem,
> > > > but I recognize this could affect performance, so it should certainly
> > > > not be put in the trunk. It will probably even be useless since, as
> > > > you pointed out, it's already fixed in the development version.
> > > >
> > > Unfortunately your patch does not fix the problem -- it masks the
> > > problem.  I didn't look at your patch in detail, but I believe that it
> > > will make all locks recursive, which is not really what we want and may
> > > lead to some surprises.
> > >
> > > Bacula does have recursive locks, but we use them only in situations
> > > where they need to be used and they are portable.  I am not so much
> > > worried about the performance consequences of your patch, but your code
> > > is Linux only if I am not mistaken (i.e. not portable), and as I said,
> > > the lock manager is not production code.  It is development code that
> > > should only be turned on by developers for debugging.
> >
> > As I said in another mail, I didn't do anything to activate this lock
> > manager, so I guess it's not. I think the confusion comes from the fact
> > that mutexes are handled through some functions in lockmgr.c (through a
> > macro), I think even with the lock manager deactivated.
> >
> > > > That said, I didn't know the lock manager should be turned off in a
> > > > production environment. Moreover, I'm not sure I understand your
> > > > point because, although I didn't read all the code, it seems pretty
> > > > strange to me that a multithreaded application should not use any
> > > > mutexes in a production environment.
> > > >
> > > We use mutexes in production as in development.  The lock manager
> > > "watches" our lock usage and blows up Bacula if it detects a problem
> > > (deadlock, out of order locks, ...).  It is a debug tool and not meant
> > > or sufficiently tested for production use.  Use it at your own risk.
> > >
> > > That said, you were very clever to figure out the problem. Not many
> > > users could do so.
> >
> > Thank you,
> > Regards.
> >
> > > Regards,
> > >
> > > Kern
> > >
> > > > Regards,
> > > > Renaud
> > > >
> > > > On Saturday 09 January 2010 at 00:03 +0100, Kern Sibbald wrote:
> > > > > Hello Arno and Renaud,
> > > > >
> > > > > I can believe that there might be a bug in the lock manager
> > > > > software, but I am very surprised that it is turned on. It should
> > > > > only be turned on for developers, and thus though this patch may be
> > > > > correct (I don't think so, but Eric can answer more definitively),
> > > > > it should never be needed in a production system, and won't work in
> > > > > a production system because of the lock manager being turned off.
> > > > >
> > > > > Can you explain why the lock manager code is turned on?
> > > > >
> > > > > If this is a problem with a misconfigured mail daemon, then it is
> > > > > very likely that this problem has already shown up and has a very
> > > > > different solution. The problem I just mentioned is fixed in the
> > > > > current development version, and the workaround for version 3.0.x
> > > > > is to ensure that either email is turned off or you point to a
> > > > > valid smtp server.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Kern
> > > > >
> > > > > On Friday 08 January 2010 21:32:18 Arno Lehmann wrote:
> > > > > > Hello,
> > > > > >
> > > > > > this is just forwarding your mail to bacula-devel, where it's
> > > > > > more likely to be picked up, looked at, and perhaps integrated
> > > > > > into the code base :-)
> > > > > >
> > > > > > Cheers, and thanks for not only analyzing the problem, but also
> > > > > > providing a possible fix!
> > > > > >
> > > > > > Arno
> > > > > >
> > > > > > 07.01.2010 16:34, Renaud Marquet wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm using bacula 3.0.3 and the director's job queue was stuck
> > > > > > > after running the first job. The others were waiting
> > > > > > > indefinitely for execution. If the director was restarted, I
> > > > > > > could run only one job, and so on.
> > > > > > >
> > > > > > > Googling around I found these 2 posts without satisfying
> > > > > > > answers:
> > > > > > > http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/upgrade-to-3-0-3-job-is-waiting-for-execution-102156/
> > > > > > > http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/job-is-waiting-for-execuition-101508/
> > > > > > >
> > > > > > > I then looked at the code and found there is a deadlock
> > > > > > > happening in message handling.
> > > > > > >
> > > > > > > The problem is located in the close_msg(JCR *) function in
> > > > > > > message.c. When it encounters an error while sending an e-mail,
> > > > > > > it calls the macro Jmsg1 (line 485) to report it. This macro
> > > > > > > calls dispatch_message, which tries to acquire fides_mutex
> > > > > > > (line 738). Unfortunately, this mutex was already acquired in
> > > > > > > close_msg (line 431), thus resulting in a deadlock (as stated
> > > > > > > in the mutex documentation for the PTHREAD_MUTEX_INITIALIZER
> > > > > > > kind).
> > > > > > >
> > > > > > > This problem was affecting me because the mail daemon was not
> > > > > > > properly configured on my server.
> > > > > > >
> > > > > > > It could be interesting to review these parts of the code to
> > > > > > > avoid such a situation.
> > > > > > >
> > > > > > > However I wrote a quick patch for lockmgr.c which simply
> > > > > > > upgrades mutexes to the PTHREAD_MUTEX_ERRORCHECK_NP kind and
> > > > > > > resolves this error.
> > > > > > >
> > > > > > > Hope this would help someone,
> > > > > > > Renaud
> > > > > > >
> > > > > > > patch :
> > > > > > >
> > > > > > > diff -rupN bacula-3.0.3.vanilla/src/lib/lockmgr.c bacula-3.0.3.patched/src/lib/lockmgr.c
> > > > > > > --- bacula-3.0.3.vanilla/src/lib/lockmgr.c    2009-10-18 11:10:16.000000000 +0200
> > > > > > > +++ bacula-3.0.3.patched/src/lib/lockmgr.c    2009-12-31 18:05:59.000000000 +0100
> > > > > > > @@ -616,6 +616,15 @@ void lmgr_cleanup_main()
> > > > > > >   */
> > > > > > >  int lmgr_mutex_lock(pthread_mutex_t *m, const char *file, int line)
> > > > > > >  {
> > > > > > > +   /* Patch to avoid deadlock if mutex is locked more than once */
> > > > > > > +   /* There's some performance hit which makes it probably not acceptable */
> > > > > > > +   /* for large system usage. */
> > > > > > > +   if(*m == PTHREAD_MUTEX_INITIALIZER) {
> > > > > > > +      pthread_mutexattr_t attr;
> > > > > > > +      pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_ERRORCHECK_NP );
> > > > > > > +      pthread_mutex_init( m, &attr );
> > > > > > > +   }
> > > > > > > +
> > > > > > >     int ret;
> > > > > > >     lmgr_thread_t *self = lmgr_get_thread_info();
> > > > > > >     self->pre_P(m, file, line);
> > > > > > >
> > > > > > > -------------------------------------------------------------------------
> > > > > > > This SF.Net email is sponsored by the Verizon Developer Community
> > > > > > > Take advantage of Verizon's best-in-class app development support
> > > > > > > A streamlined, 14 day to market process makes app distribution fast and easy
> > > > > > > Join now and get one step closer to millions of Verizon customers
> > > > > > > http://p.sf.net/sfu/verizon-dev2dev
> > > > > > > _______________________________________________
> > > > > > > Bacula-users mailing list
> > > > > > > Bacula-users AT lists.sourceforge DOT net
> > > > > > > https://lists.sourceforge.net/lists/listinfo/bacula-users

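For readers who find this thread through a search: the self-deadlock Renaud describes (one thread locking fides_mutex a second time) can be reproduced outside Bacula. What follows is a minimal C sketch, not Bacula code. The names demo_mutex, init_errorcheck_mutex and relock_result are hypothetical, and the sketch uses the POSIX PTHREAD_MUTEX_ERRORCHECK type rather than the PTHREAD_MUTEX_ERRORCHECK_NP spelling in the quoted patch. Unlike the quoted patch, it also initialises the attribute object with pthread_mutexattr_init before setting the type. With an error-checking mutex, the second lock attempt by the same thread should return EDEADLK instead of blocking forever.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask <pthread.h> for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a mutex such as fides_mutex; not Bacula code. */
static pthread_mutex_t demo_mutex;

/* Initialise *m as an error-checking mutex. Returns 0 on success,
 * otherwise the error number reported by the pthread call that failed. */
static int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);   /* must precede settype */
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock the mutex twice from the same thread and return the result of the
 * second attempt. Returns -1 if the mutex could not be initialised. */
int relock_result(void)
{
    int first, second;

    if (init_errorcheck_mutex(&demo_mutex) != 0)
        return -1;

    first = pthread_mutex_lock(&demo_mutex);    /* outer lock, as in close_msg */
    second = pthread_mutex_lock(&demo_mutex);   /* relock, as in dispatch_message */

    if (first == 0)
        pthread_mutex_unlock(&demo_mutex);
    pthread_mutex_destroy(&demo_mutex);
    return second;
}
```

Note that an error-checking mutex only reports the relock. If the inner caller ignores the return value and later unlocks the mutex, the outer critical section loses its protection, which seems consistent with Kern's remark that the patch masks the problem rather than fixing it.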

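One general way to avoid the pattern in the original report, where an error is reported while holding the lock that the reporting path also needs, is to record the error under the lock and report it only after unlocking. The sketch below illustrates that idea only. It is not the fix applied in Bacula's development version (this thread does not describe that fix), and queue_mutex, report_error, send_mail and close_messages are hypothetical stand-ins for fides_mutex, Jmsg1/dispatch_message, the mail step and close_msg.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for fides_mutex: a default, non-recursive mutex. */
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many errors were reported, so the behaviour can be checked. */
int reports_made = 0;

/* Hypothetical reporting path that takes queue_mutex itself, the way
 * dispatch_message is described as doing in the report above. */
static void report_error(const char *msg)
{
    pthread_mutex_lock(&queue_mutex);
    reports_made++;
    fprintf(stderr, "error: %s\n", msg);
    pthread_mutex_unlock(&queue_mutex);
}

/* Hypothetical mail step that always fails, imitating a misconfigured
 * mail daemon. */
static int send_mail(void)
{
    return -1;
}

/* Do the work under the lock, but defer the error report until the lock
 * has been released, so report_error() never relocks a held mutex. */
int close_messages(void)
{
    char pending[128] = "";

    pthread_mutex_lock(&queue_mutex);
    if (send_mail() != 0) {
        /* Only remember the error here; do not report it yet. */
        snprintf(pending, sizeof pending, "mail program terminated in error");
    }
    pthread_mutex_unlock(&queue_mutex);

    if (pending[0] != '\0') {
        report_error(pending);   /* safe: queue_mutex is no longer held */
        return -1;
    }
    return 0;
}
```

Calling close_messages() here returns -1 and reports the failure once without blocking. Moving the report_error() call inside the locked region would reproduce the hang described above.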
