You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by Manuel Teira <mt...@tid.es> on 2008/06/05 19:31:22 UTC

Problems with Mutex initialization

Hello.

It seems that the dreaded Mutex initialization assert is not completely 
gone. After compiling the unit_test executable, it asserts while trying 
to run it, in qpid::sys::Mutex::Mutex(), trying to use a non initialized 
recursiveMutexAttr.

The abort stack indicates that it is happening in the 
qpid::sys::DeletionManager::AllThreadStatuses int constructor, I suppose 
that while creating the static allThreadStatuses object.

I suspect that it has something to do with static initialization order. 
So, I replaced the QPID_POSIX_ASSERT_THROW_IF wrapping the 
pthread_mutex_init call in qpid::sys::Mutex::Mutex(), to allow the 
program to continue, with:

  if (pthread_mutex_init(&mutex, recursiveMutexattr) != 0) {
    std::cout << "Error initializing mutex: " << &mutex
              << ". " << std::strerror(errno) << std::endl;
  }

I also added some logs in the initMutexattr function:
          std::cout << "initMutexattr[" << pthread_self()  << "](once: " 
<< &onceControl   << ", mutexattr: " << &mutexattr
                    << ")" << std::endl;
in the Mutex constructor and in the AllThreadStatuses constructor to 
verify that the mutex is being used before the recursiveMutexAttr gets 
initialized:

initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t 
at 0x0000000100400758
Error initializing mutex: 0xffffffff7e14e680. Error 0
AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680

[snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)


I suspect that the static nature of AllThreadStatuses could be involved. 
Any thought?

Best regards.
--
Manuel.




Re: Problems with Mutex initialization

Posted by Manuel Teira <mt...@tid.es>.
Alan Conway escribió:
> Manuel Teira wrote:
>   
>> Hello.
>>
>> It seems that the dreaded Mutex initialization assert is not completely
>> gone. After compiling the unit_test executable, it asserts while trying
>> to run it, in qpid::sys::Mutex::Mutex(), trying to use a non initialized
>> recursiveMutexAttr.
>>
>> The abort stack indicates that it is happening in the
>> qpid::sys::DeletionManager::AllThreadStatuses int constructor, I suppose
>> that while creating the static allThreadStatuses object.
>>     
>
> Can you post a stack trace and the perror message that went with the failure?
>
>   
Yes. Sorry for the delay, but I didn't save the core file previous to 
the change in the Mutex.h. And since this file is included from a lot of 
places I had to recompile again the whole project, that takes near 90 
minutes on the hardware I'm using. This is what I get now:

bash-3.00$ ./unit_test
initMutexattr[1](once: 0xffffffff7e14e100, mutexattr: 0xffffffff7e1533d0)
initMutexattr[1](once: 0xffffffff7e14e420, mutexattr: 0xffffffff7e1533e8)
Invalid argument
Assertion failed: 0, file ../../src/qpid/sys/posix/Mutex.h, line 138
Abort (core dumped)

bash-3.00$ pstack core | c++filt
core 'core' of 25632:   
/export/home/devel/ws/DSLAP/qpid/trunk/qpid/cpp/src/tests/.libs/unit_t
 ffffffff7cdca124 _lwp_kill (6, 0, ffffffff7cdad9d0, ffffffff7cee0000, 
ffffffffffffffff, 1800) + 8
 ffffffff7cd4620c abort (1, 1b8, 0, 199f00, 0, 0) + 118
 ffffffff7cd464b0 _assert (1002d51b1, 1002d51b3, 8a, 0, 199bb8, 0) + 74
 00000001000ee444 qpid::sys::Mutex::Mutex #Nvariant 1() 
(ffffffff7e14e480, ffffffff7e110938, 0, ffffffff7defe620, 
ffffffff7f604f40, 1) + bc
 ffffffff7df01c34 
qpid::sys::DeletionManager<qpid::sys::PollerHandle>::AllThreadsStatuses::AllThreadsStatuses 
#Nvariant 1(int) (ffffffff7e14e480, 0, ffffffff7d910300, 
8000000000000000, 1493e0, 80000000) + 24
 ffffffff7defe620 __SLIP.INIT_J (0, 0, 0, 0, 0, 0) + 28
 ffffffff7df01394 void __STATIC_CONSTRUCTOR() (0, ffffffff7ceeae40, 
199648, ffffffff7d3086b4, ffffffff7bb005c0, ffffffff7bb00600) + 4c
 ffffffff7df99958 _init (0, 0, ffffffff7f72cac0, ffffffff7f611144, 
11f368, ffffffff7bc02030) + 498
 ffffffff7f61114c call_init (ffffffff7f72f620, 1, ffffffff7df994c0, 
ffffffff7ef01198, ffdfffff, ffffffff7f72cac0) + 1b0
 ffffffff7f610614 setup (ffffffff7f72c2e8, ffffffff7f72fd90, 
ffffffff7f72cac0, 0, ffffffff7bd011b8, ffffffff7f72c1c0) + 1358
 ffffffff7f61ebb8 _setup (12e4b8, b00, 28e30, 100000040, 0, 
ffffffff7ffffbd8) + 3a4
 ffffffff7f604f10 _rt_boot (0, 0, 0, 0, 0, 0) + 88
 0000000000000000 ???????? (0, 0, 0, 0, 0, 0)

This is everything I was able to get, using the pstack utility on the 
generated core file, as dbx refuses to cooperate, just SIGSEGV itself 
trying to inspect this core file. I didn't try to load the dbx core 
inside dbx itself, fearing some relativity effect.

The error is an EINVAL, and trusting the previous logs, it's caused by 
the pthread_mutexattr_t not being initialized previously.

>> I suspect that it has something to do with static initialization order.
>> So, I replaced the QPID_POSIX_ASSERT_THROW_IF wrapping the
>> pthread_mutex_init call in qpid::sys::Mutex::Mutex(), to allow the
>> program to continue, with:
>>
>>  if (pthread_mutex_init(&mutex, recursiveMutexattr) != 0) {
>>    std::cout << "Error initializing mutex: " << &mutex
>>              << ". " << std::strerror(errno) << std::endl;
>>  }
>>
>> I also added some logs in the initMutexattr function:
>>          std::cout << "initMutexattr[" << pthread_self()  << "](once: "
>> << &onceControl   << ", mutexattr: " << &mutexattr
>>                    << ")" << std::endl;
>> in the Mutex constructor and in the AllThreadStatuses constructor to
>> verify that the mutex is being used before the recursiveMutexAttr gets
>> initialized:
>>
>> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
>> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
>> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t
>> at 0x0000000100400758
>> Error initializing mutex: 0xffffffff7e14e680. Error 0
>> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
>>
>> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
>> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
>>
>>
>> I suspect that the static nature of AllThreadStatuses could be involved.
>> Any thought?
>>
>>     
>
> Order of static initializers would be my favourite suspect also. I'll take a
> look when I have a stack trace.
> .
>
>   


Re: Problems with Mutex initialization

Posted by Alan Conway <ac...@redhat.com>.
Manuel Teira wrote:
> Hello.
> 
> It seems that the dreaded Mutex initialization assert is not completely 
> gone. After compiling the unit_test executable, it asserts while trying 
> to run it, in qpid::sys::Mutex::Mutex(), trying to use a non initialized 
> recursiveMutexAttr.
> 
> The abort stack indicates that it is happening in the 
> qpid::sys::DeletionManager::AllThreadStatuses int constructor, I suppose 
> that while creating the static allThreadStatuses object.

Can you post a stack trace and the perror message that went with the failure?

> I suspect that it has something to do with static initialization order. 
> So, I replaced the QPID_POSIX_ASSERT_THROW_IF wrapping the 
> pthread_mutex_init call in qpid::sys::Mutex::Mutex(), to allow the 
> program to continue, with:
> 
>  if (pthread_mutex_init(&mutex, recursiveMutexattr) != 0) {
>    std::cout << "Error initializing mutex: " << &mutex
>              << ". " << std::strerror(errno) << std::endl;
>  }
> 
> I also added some logs in the initMutexattr function:
>          std::cout << "initMutexattr[" << pthread_self()  << "](once: " 
> << &onceControl   << ", mutexattr: " << &mutexattr
>                    << ")" << std::endl;
> in the Mutex constructor and in the AllThreadStatuses constructor to 
> verify that the mutex is being used before the recursiveMutexAttr gets 
> initialized:
> 
> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t 
> at 0x0000000100400758
> Error initializing mutex: 0xffffffff7e14e680. Error 0
> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
> 
> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
> 
> 
> I suspect that the static nature of AllThreadStatuses could be involved. 
> Any thought?
> 

Order of static initializers would be my favourite suspect also. I'll take a 
look when I have a stack trace.

Re: Problems with Mutex initialization

Posted by Manuel Teira <mt...@tid.es>.
Andrew Stitcher escribió:
> On Thu, 2008-06-05 at 19:31 +0200, Manuel Teira wrote:
>   
>> ...
>> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
>> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
>> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t
>> at 0x0000000100400758
>> Error initializing mutex: 0xffffffff7e14e680. Error 0
>> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
>>
>> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
>> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
>>
>>
>> I suspect that the static nature of AllThreadStatuses could be involved.
>> Any thought?
>>     
>
> I'm pretty sure you are correct about the static initialisation order.
>
> One problem that your trace points out though is that we will get a
> _different_ initialisation for each file including Mutex.h due to the
> anonymous namespace used here - this wasn't the intention of this code.
> I guess this needs to be fixed by creating a Mutex.cpp to hold this
> anonymous namespace. However I'm not sure whether or not this is the
> cause of your problem,although it can't help.
>   
Yes. My first thought was that the initMutexattr object was intended as 
a singleton, hence shared by all the Mutex instances in the process and 
saving memory and CPU. But since it goes in the .h file, every 
compilation unit is inlining its own initMutexattr.

> Andrew
>
>
>
>   


Re: Problems with Mutex initialization

Posted by Manuel Teira <mt...@tid.es>.
Alan Conway escribió:
> Andrew Stitcher wrote:
>   
>> On Thu, 2008-06-05 at 19:31 +0200, Manuel Teira wrote:
>>     
>>> ...
>>> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
>>> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
>>> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t
>>> at 0x0000000100400758
>>> Error initializing mutex: 0xffffffff7e14e680. Error 0
>>> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
>>>
>>> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
>>> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
>>>
>>>
>>> I suspect that the static nature of AllThreadStatuses could be involved.
>>> Any thought?
>>>       
>> I'm pretty sure you are correct about the static initialisation order.
>>
>> One problem that your trace points out though is that we will get a
>> _different_ initialisation for each file including Mutex.h due to the
>> anonymous namespace used here - this wasn't the intention of this code.
>> I guess this needs to be fixed by creating a Mutex.cpp to hold this
>> anonymous namespace. However I'm not sure whether or not this is the
>> cause of your problem,although it can't help.
>>
>>     
>
> I think I may see the problem. Is all AllThreadsStatuses called (directly or
> indirectly) from static constructors in other compilation units?
>   
AllThreadStatuses is initialized in the static zone of the poller 
implementations, both EPollPoller and ECFPoller. I think this is the 
only place it's used, but I'm not completely sure.
> An object that is called from static constructors in other compilation units has
> to be written in a rather special way - it has to be a POD and be POD
> initialized - i.e. no constructors or destructors and no members/bases with
> ctors/dtors.
>   
> We have a PODMutex for exactly that situation but if AllThreadStatuses is indeed
> being called from other compilation units during static init then it needs a
> redesign - both the Mutex and std::vector members have constructors. It may need
> to be rewritten as a singleton.
>
> Andrew do you know the circumstances under which this is called? Manuel - can
> you post a stack trace? That should answer the question.
>
> Cheers,
> Alan.
> .
>
>   


Re: Problems with Mutex initialization

Posted by Alan Conway <ac...@redhat.com>.
On Fri, 2008-06-27 at 14:05 +0200, Manuel Teira wrote:
> Alan Conway escribió:
> > On Wed, 2008-06-25 at 11:12 +0200, Manuel Teira wrote:
> >   
> >> Hello. After a pair of weeks dealing with random, uninteresting stuff,
> >> I'm back to the solaris port effort. I would like to wake up this issue
> >> again, since it's a stopper for the solaris testing I'm trying to do.
> >>
> >> After all this discussion, what seems to be true, is:
> >> 1.-The static initialization order is not the desired one.
> >> 2.-Inlining the Mutex.h annonymous code in every Mutex usage, produces
> >> multiple instances of onceControl and mutexattr, whereas the intended
> >> behaviour was to have single instances, to be shared across the code.
> >> 3.-About the PODMutex discussion, it is not needed here, since the
> >> pthread_once_t usage is giving us the desired "one-shot" behaviour.
> >>
> >> It was also said, that fixing #2 could lead to get #1 fixed. Anyway,
> >> what is the recommended technique to "force" static initialization if
> >> having #2 fixed is not helping?
> >>
> >>     
> >
> > Just committed a fix that removes the need for a global object with a
> > constructor so should fix any init order problems. Also took this code
> > out of line so it's genuinely once, not once-per-compilation-unit. Shout
> > if it doesn't fix the problem or creates any new ones.
> >   
> Great, it got fixed!!
> 
> Most of the tests are passing now, but running the federation tests the 
> old dynamic_cast monster related with the Sun compiler bug arises again. 
> The problem was due to AsynchConnector being implemented in terms of 
> DispatchHandle, so, I had to change that inheritance to be public also.
> 
> I would like to show you how I've handled this workaround, to see if 
> this is an acceptable way to do it:
> 
> class AsynchConnector :
> //Bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6710638
> //Should be private inheritance
> #ifdef __SUNPRO_CC
>     public
> #else
>     private
> #endif
>     DispatchHandle {
> public:
> ...
> 
> The same for AsynchIO. This way, we will only use that inheritance when 
> using the Sun Studio compiler, at least while that bug doesn't get 
> fixed.  Perhaps we should vote for it to get fixed. ;-)
> 
> Also, I would like to know what is the state of the JIRA issues I 
> submitted for the solaris port (QPID-1132 and QPID-1133), I mean, did 
> you have time to look at them, are there any problem with them,...?
> 
> I have another bunch of changes, most related with inclusion 
> differences. Perhaps they could be added without wrapping them in a 
> #ifdef clause, but I'm not sure. My plan is to open a new JIRA for those 
> changes, and expose them to your consideration. What do you think? (I'm 
> talking about this kind of modifications) :
> 
> -bash-3.00$ svn diff ../../src/qpid/sys/posix/AsynchIO.cpp
> Index: ../../src/qpid/sys/posix/AsynchIO.cpp
> ===================================================================
> --- ../../src/qpid/sys/posix/AsynchIO.cpp       (revision 671815)
> +++ ../../src/qpid/sys/posix/AsynchIO.cpp       (working copy)
> @@ -19,6 +19,10 @@
>   *
>   */
>  
> +#ifdef HAVE_CONFIG_H
> +#include "config.h"
> +#endif
> +
>  #include "qpid/sys/AsynchIO.h"
>  #include "qpid/sys/Time.h"
>  
> @@ -31,6 +35,11 @@
>  #include <signal.h>
>  #include <errno.h>
>  
> +#ifdef SUNOS
> +//strerror, memmove prototypes
> +#include <string.h>
> +#endif
> +
>  #include <boost/bind.hpp>
>  
>  using namespace qpid::sys;
> 
> 
> Do you agree with this way to handle it, or should we just include 
> string.h directly? Perhaps somebody should test all these changes on a 
> linux box, before deciding to add them with or without the #ifdef wrapping.

I would suggest including them directly. There are many places where we
are picking up system headers we need "by accident" because they are
included in other #include files. Making all the required #includes
explicit is good for portability.

I can commit these changes Monday, if I do stumble across any headers
that are not available on linux that is a good thing as it may highlight
a point were we need some more abstractions in qpid::sys.




Re: Problems with Mutex initialization

Posted by Alan Conway <ac...@redhat.com>.
On Fri, 2008-06-27 at 11:28 -0400, Andrew Stitcher wrote:
> On Fri, 2008-06-27 at 14:05 +0200, Manuel Teira wrote:
> > ...
> > class AsynchConnector :
> > //Bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6710638
> > //Should be private inheritance
> > #ifdef __SUNPRO_CC
> >     public
> > #else
> >     private
> > #endif
> >     DispatchHandle {
> > public:
> > ...
> > 
> 
> I must say I find this terribly ugly, but I suppose it does point out
> the cause of the change.

It is ugly but I can't think of any less-ugly fix that preserves the
private inheritance on non-Sun platforms. 

> So take out the #include "config.h" stuff and just include "string.h"
> directly. At very worst it can't cause problems on Linux.
> 
> There are very few cases now where you need to do conditional includes
> along these lines.

And in those cases the conditional includes should be isolated in a
qpid/sys header file defining an abstraction that can be implemented
differently on different platforms.



Re: Problems with Mutex initialization

Posted by Andrew Stitcher <as...@redhat.com>.
On Fri, 2008-06-27 at 14:05 +0200, Manuel Teira wrote:
> ...
> class AsynchConnector :
> //Bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6710638
> //Should be private inheritance
> #ifdef __SUNPRO_CC
>     public
> #else
>     private
> #endif
>     DispatchHandle {
> public:
> ...
> 

I must say I find this terribly ugly, but I suppose it does point out
the cause of the change.

> I have another bunch of changes, most related with inclusion 
> differences. Perhaps they could be added without wrapping them in a 
> #ifdef clause, but I'm not sure. My plan is to open a new JIRA for those 
> changes, and expose them to your consideration. What do you think? (I'm 
> talking about this kind of modifications) :
> 
> -bash-3.00$ svn diff ../../src/qpid/sys/posix/AsynchIO.cpp
> Index: ../../src/qpid/sys/posix/AsynchIO.cpp
> ===================================================================
> --- ../../src/qpid/sys/posix/AsynchIO.cpp       (revision 671815)
> +++ ../../src/qpid/sys/posix/AsynchIO.cpp       (working copy)
> @@ -19,6 +19,10 @@
>   *
>   */
>  
> +#ifdef HAVE_CONFIG_H
> +#include "config.h"
> +#endif
> +
>  #include "qpid/sys/AsynchIO.h"
>  #include "qpid/sys/Time.h"
>  
> @@ -31,6 +35,11 @@
>  #include <signal.h>
>  #include <errno.h>
>  
> +#ifdef SUNOS
> +//strerror, memmove prototypes
> +#include <string.h>
> +#endif
> +
>  #include <boost/bind.hpp>

I really think this kind of change is unnecessary and misguided. There
is actually no platform portability issue here. Simply, the includes are
incorrect.

It is easy to do if you (as I often do myself) just add #include lines
in response to compile errors. However the correct thing to do (not
often done) is to add include lines when the standard says they are
necessary and not rely on some random include file you included to get
definitions for you.

So take out the #include "config.h" stuff and just include "string.h"
directly. At very worst it can't cause problems on Linux.

There are very few cases now where you need to do conditional includes
along these lines.

Andrew



Re: Problems with Mutex initialization

Posted by Manuel Teira <mt...@tid.es>.
Alan Conway escribió:
> On Wed, 2008-06-25 at 11:12 +0200, Manuel Teira wrote:
>   
>> Hello. After a pair of weeks dealing with random, uninteresting stuff,
>> I'm back to the solaris port effort. I would like to wake up this issue
>> again, since it's a stopper for the solaris testing I'm trying to do.
>>
>> After all this discussion, what seems to be true, is:
>> 1.-The static initialization order is not the desired one.
>> 2.-Inlining the Mutex.h annonymous code in every Mutex usage, produces
>> multiple instances of onceControl and mutexattr, whereas the intended
>> behaviour was to have single instances, to be shared across the code.
>> 3.-About the PODMutex discussion, it is not needed here, since the
>> pthread_once_t usage is giving us the desired "one-shot" behaviour.
>>
>> It was also said, that fixing #2 could lead to get #1 fixed. Anyway,
>> what is the recommended technique to "force" static initialization if
>> having #2 fixed is not helping?
>>
>>     
>
> Just committed a fix that removes the need for a global object with a
> constructor so should fix any init order problems. Also took this code
> out of line so it's genuinely once, not once-per-compilation-unit. Shout
> if it doesn't fix the problem or creates any new ones.
>   
Great, it got fixed!!

Most of the tests are passing now, but running the federation tests the 
old dynamic_cast monster related with the Sun compiler bug arises again. 
The problem was due to AsynchConnector being implemented in terms of 
DispatchHandle, so, I had to change that inheritance to be public also.

I would like to show you how I've handled this workaround, to see if 
this is an acceptable way to do it:

class AsynchConnector :
//Bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6710638
//Should be private inheritance
#ifdef __SUNPRO_CC
    public
#else
    private
#endif
    DispatchHandle {
public:
...

The same for AsynchIO. This way, we will only use that inheritance when 
using the Sun Studio compiler, at least while that bug doesn't get 
fixed.  Perhaps we should vote for it to get fixed. ;-)

Also, I would like to know what is the state of the JIRA issues I 
submitted for the solaris port (QPID-1132 and QPID-1133), I mean, did 
you have time to look at them, are there any problem with them,...?

I have another bunch of changes, most related with inclusion 
differences. Perhaps they could be added without wrapping them in a 
#ifdef clause, but I'm not sure. My plan is to open a new JIRA for those 
changes, and expose them to your consideration. What do you think? (I'm 
talking about this kind of modifications) :

-bash-3.00$ svn diff ../../src/qpid/sys/posix/AsynchIO.cpp
Index: ../../src/qpid/sys/posix/AsynchIO.cpp
===================================================================
--- ../../src/qpid/sys/posix/AsynchIO.cpp       (revision 671815)
+++ ../../src/qpid/sys/posix/AsynchIO.cpp       (working copy)
@@ -19,6 +19,10 @@
  *
  */
 
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
 #include "qpid/sys/AsynchIO.h"
 #include "qpid/sys/Time.h"
 
@@ -31,6 +35,11 @@
 #include <signal.h>
 #include <errno.h>
 
+#ifdef SUNOS
+//strerror, memmove prototypes
+#include <string.h>
+#endif
+
 #include <boost/bind.hpp>
 
 using namespace qpid::sys;


Do you agree with this way to handle it, or should we just include 
string.h directly? Perhaps somebody should test all these changes on a 
linux box, before deciding to add them with or without the #ifdef wrapping.

Best regards.
--
Manuel.


> Cheers,
> Alan.
>
>
>   


Re: Problems with Mutex initialization

Posted by Alan Conway <ac...@redhat.com>.
On Wed, 2008-06-25 at 11:12 +0200, Manuel Teira wrote:
> Hello. After a pair of weeks dealing with random, uninteresting stuff, 
> I'm back to the solaris port effort. I would like to wake up this issue 
> again, since it's a stopper for the solaris testing I'm trying to do.
> 
> After all this discussion, what seems to be true, is:
> 1.-The static initialization order is not the desired one.
> 2.-Inlining the Mutex.h annonymous code in every Mutex usage, produces 
> multiple instances of onceControl and mutexattr, whereas the intended 
> behaviour was to have single instances, to be shared across the code.
> 3.-About the PODMutex discussion, it is not needed here, since the 
> pthread_once_t usage is giving us the desired "one-shot" behaviour.
> 
> It was also said, that fixing #2 could lead to get #1 fixed. Anyway, 
> what is the recommended technique to "force" static initialization if 
> having #2 fixed is not helping?
> 

Just committed a fix that removes the need for a global object with a
constructor so should fix any init order problems. Also took this code
out of line so it's genuinely once, not once-per-compilation-unit. Shout
if it doesn't fix the problem or creates any new ones.

Cheers,
Alan.


Re: Problems with Mutex initialization

Posted by Manuel Teira <mt...@tid.es>.
Hello. After a pair of weeks dealing with random, uninteresting stuff, 
I'm back to the solaris port effort. I would like to wake up this issue 
again, since it's a stopper for the solaris testing I'm trying to do.

After all this discussion, what seems to be true, is:
1.-The static initialization order is not the desired one.
2.-Inlining the Mutex.h annonymous code in every Mutex usage, produces 
multiple instances of onceControl and mutexattr, whereas the intended 
behaviour was to have single instances, to be shared across the code.
3.-About the PODMutex discussion, it is not needed here, since the 
pthread_once_t usage is giving us the desired "one-shot" behaviour.

It was also said, that fixing #2 could lead to get #1 fixed. Anyway, 
what is the recommended technique to "force" static initialization if 
having #2 fixed is not helping?

Best regards.
--
Manuel.



Alan Conway escribió:
> Andrew Stitcher wrote:
>   
>> On Thu, 2008-06-05 at 14:39 -0400, Alan Conway wrote:
>>     
>>> Andrew Stitcher wrote:
>>>       
>>>> On Thu, 2008-06-05 at 19:31 +0200, Manuel Teira wrote:
>>>>         
>>>>> ...
>>>>> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
>>>>> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
>>>>> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t
>>>>> at 0x0000000100400758
>>>>> Error initializing mutex: 0xffffffff7e14e680. Error 0
>>>>> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
>>>>>
>>>>> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
>>>>> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
>>>>>
>>>>>
>>>>> I suspect that the static nature of AllThreadStatuses could be involved.
>>>>> Any thought?
>>>>>           
>>>> I'm pretty sure you are correct about the static initialisation order.
>>>>
>>>> One problem that your trace points out though is that we will get a
>>>> _different_ initialisation for each file including Mutex.h due to the
>>>> anonymous namespace used here - this wasn't the intention of this code.
>>>> I guess this needs to be fixed by creating a Mutex.cpp to hold this
>>>> anonymous namespace. However I'm not sure whether or not this is the
>>>> cause of your problem,although it can't help.
>>>>
>>>>         
>>> I think I may see the problem. Is all AllThreadsStatuses called (directly or
>>> indirectly) from static constructors in other compilation units?
>>>
>>> An object that is called from static constructors in other compilation units has
>>> to be written in a rather special way - it has to be a POD and be POD
>>> initialized - i.e. no constructors or destructors and no members/bases with
>>> ctors/dtors.
>>>       
>> It's the mutexattr static constructor that is the problem - that needs
>> to be written as a singleton and as I said before moved into an
>> implementation file.
>>
>>     
>>> We have a PODMutex for exactly that situation
>>>       
>> PODMutex is currently useless (and isn't used anywhere as far as I know)
>> because it can't create a recursive mutex which is what Mutex is.
>>
>>     
> Its not useless, and it doesn't need to be recursive. It is useful precisely for
> protecting the critical section of a singleton which may run in a static
> construtor. It should be used only for static mutex memebers or global mutex
> variables - e.g. if we need to put mutexatter in a singleton, we can protect the
> singleton with a PODMutex.
>
>   
>> A piece of work we still need to do is to create a RecursiveMutex and
>> revert Mutex to non recursive then only make mutexes recursive that have
>> to be - then PODMutex might be useful again.
>>
>>     
> POD mutex is not recursive and never will be. It's intended only for the
> restricted situation of a static initialized mutex.
>
>   
>>> Andrew do you know the circumstances under which this is called? Manuel - can
>>> you post a stack trace? That should answer the question.
>>>       
>> I think that fixing Mutex so that its contructor can be safely called
>> from anywhere in static initialisation is the way to go here. I think
>> that the other issues are consequences of this lack. The problem is
>> exacerbated because there are multiple initialisations of the
>> mutexInitAttr one for every implementation file which uses Mutex so it
>> almost bound to fail in this way - it's amazing we've not seen it
>> before.
>>     
>
> I agree - and PODMutex is the tool we can use to acomplish that. I can give this
> a shot but probably not today, I'll put it on my list.
>
> An even better solution would be to drop use of recursive mutexes, which are
> evil, but I don't know how big a job that is. Most of the code is written with
> the intent that it not require a recursive mutex but there may be some places
> that will deadlock.
>
>
> Cheers,
> Alan.
> .
>
>   


Re: Problems with Mutex initialization

Posted by Andrew Stitcher <as...@redhat.com>.
On Fri, 2008-06-06 at 09:12 -0400, Alan Conway wrote:
> ...
> Its not useless, and it doesn't need to be recursive. It is useful precisely for 
> protecting the critical section of a singleton which may run in a static 
> construtor. It should be used only for static mutex memebers or global mutex 
> variables - e.g. if we need to put mutexatter in a singleton, we can protect the 
> singleton with a PODMutex.

Ah I understand its intent now - this isn't needed in this case because
we use pthread_once to achieve the same end ie.it gives exclusion and
ensures that the initialisation is only called once.

> I agree - and PODMutex is the tool we can use to acomplish that. I can give this 
> a shot but probably not today, I'll put it on my list.

I don't think using any kind of mutex can help change the static
initialisation order! We don't need any further mutual exclusion of the
initialisation as above.

> 
> An even better solution would be to drop use of recursive mutexes, which are 
> evil, but I don't know how big a job that is. Most of the code is written with 
> the intent that it not require a recursive mutex but there may be some places 
> that will deadlock.

+1

As an interim measure only using recursive mutexes where necessary will
help to see what we have to rework.

Andrew



Re: Problems with Mutex initialization

Posted by Alan Conway <ac...@redhat.com>.
Andrew Stitcher wrote:
> On Thu, 2008-06-05 at 14:39 -0400, Alan Conway wrote:
>> Andrew Stitcher wrote:
>>> On Thu, 2008-06-05 at 19:31 +0200, Manuel Teira wrote:
>>>> ...
>>>> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
>>>> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
>>>> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t 
>>>> at 0x0000000100400758
>>>> Error initializing mutex: 0xffffffff7e14e680. Error 0
>>>> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
>>>>
>>>> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
>>>> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
>>>>
>>>>
>>>> I suspect that the static nature of AllThreadStatuses could be involved. 
>>>> Any thought?
>>> I'm pretty sure you are correct about the static initialisation order.
>>>
>>> One problem that your trace points out though is that we will get a
>>> _different_ initialisation for each file including Mutex.h due to the
>>> anonymous namespace used here - this wasn't the intention of this code.
>>> I guess this needs to be fixed by creating a Mutex.cpp to hold this
>>> anonymous namespace. However I'm not sure whether or not this is the
>>> cause of your problem,although it can't help.
>>>
>> I think I may see the problem. Is all AllThreadsStatuses called (directly or 
>> indirectly) from static constructors in other compilation units?
>>
>> An object that is called from static constructors in other compilation units has 
>> to be written in a rather special way - it has to be a POD and be POD 
>> initialized - i.e. no constructors or destructors and no members/bases with 
>> ctors/dtors.
> 
> It's the mutexattr static constructor that is the problem - that needs
> to be written as a singleton and as I said before moved into an
> implementation file.
> 
>> We have a PODMutex for exactly that situation
> 
> PODMutex is currently useless (and isn't used anywhere as far as I know)
> because it can't create a recursive mutex which is what Mutex is.
> 
Its not useless, and it doesn't need to be recursive. It is useful precisely for 
protecting the critical section of a singleton which may run in a static 
construtor. It should be used only for static mutex memebers or global mutex 
variables - e.g. if we need to put mutexatter in a singleton, we can protect the 
singleton with a PODMutex.

> A piece of work we still need to do is to create a RecursiveMutex and
> revert Mutex to non recursive then only make mutexes recursive that have
> to be - then PODMutex might be useful again.
> 
POD mutex is not recursive and never will be. It's intended only for the 
restricted situation of a static initialized mutex.

>> Andrew do you know the circumstances under which this is called? Manuel - can 
>> you post a stack trace? That should answer the question.
> 
> I think that fixing Mutex so that its contructor can be safely called
> from anywhere in static initialisation is the way to go here. I think
> that the other issues are consequences of this lack. The problem is
> exacerbated because there are multiple initialisations of the
> mutexInitAttr one for every implementation file which uses Mutex so it
> almost bound to fail in this way - it's amazing we've not seen it
> before.

I agree - and PODMutex is the tool we can use to acomplish that. I can give this 
a shot but probably not today, I'll put it on my list.

An even better solution would be to drop use of recursive mutexes, which are 
evil, but I don't know how big a job that is. Most of the code is written with 
the intent that it not require a recursive mutex but there may be some places 
that will deadlock.


Cheers,
Alan.

Re: Problems with Mutex initialization

Posted by Manuel Teira <mt...@tid.es>.
Andrew Stitcher escribió:
> On Thu, 2008-06-05 at 14:39 -0400, Alan Conway wrote:
>   
>> Andrew Stitcher wrote:
>>     
>>> On Thu, 2008-06-05 at 19:31 +0200, Manuel Teira wrote:
>>>       
>>>> ...
>>>> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
>>>> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
>>>> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t
>>>> at 0x0000000100400758
>>>> Error initializing mutex: 0xffffffff7e14e680. Error 0
>>>> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
>>>>
>>>> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
>>>> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
>>>>
>>>>
>>>> I suspect that the static nature of AllThreadStatuses could be involved.
>>>> Any thought?
>>>>         
>>> I'm pretty sure you are correct about the static initialisation order.
>>>
>>> One problem that your trace points out though is that we will get a
>>> _different_ initialisation for each file including Mutex.h due to the
>>> anonymous namespace used here - this wasn't the intention of this code.
>>> I guess this needs to be fixed by creating a Mutex.cpp to hold this
>>> anonymous namespace. However I'm not sure whether or not this is the
>>> cause of your problem,although it can't help.
>>>
>>>       
>> I think I may see the problem. Is all AllThreadsStatuses called (directly or
>> indirectly) from static constructors in other compilation units?
>>
>> An object that is called from static constructors in other compilation units has
>> to be written in a rather special way - it has to be a POD and be POD
>> initialized - i.e. no constructors or destructors and no members/bases with
>> ctors/dtors.
>>     
>
> It's the mutexattr static constructor that is the problem - that needs
> to be written as a singleton and as I said before moved into an
> implementation file.
>
>   
>> We have a PODMutex for exactly that situation
>>     
>
> PODMutex is currently useless (and isn't used anywhere as far as I know)
> because it can't create a recursive mutex which is what Mutex is.
>
> A piece of work we still need to do is to create a RecursiveMutex and
> revert Mutex to non recursive then only make mutexes recursive that have
> to be - then PODMutex might be useful again.
>
>   
>> Andrew do you know the circumstances under which this is called? Manuel - can
>> you post a stack trace? That should answer the question.
>>     
>
> I think that fixing Mutex so that its contructor can be safely called
> from anywhere in static initialisation is the way to go here. I think
> that the other issues are consequences of this lack. The problem is
> exacerbated because there are multiple initialisations of the
> mutexInitAttr one for every implementation file which uses Mutex so it
> almost bound to fail in this way - it's amazing we've not seen it
> before.
>   
I think it was for the same reason that Mutex and RWLock were able to 
share the pthread_once_t control without error in Linux. Non initialized 
pthread_mutexattr_t are probably valid for the pthread_mutex_init call 
in the Linux version (perhaps a zero inizialized struct is enough, and 
the compiler is supplying that), but I think that the locks must be 
failing randomly at least in the initial stage of the process.

Best regards.
--
Manuel.

> Andrew
>
>
>   
>> Cheers,
>> Alan.
>>     
>
> .
>
>   


Re: Problems with Mutex initialization

Posted by Andrew Stitcher <as...@redhat.com>.
On Thu, 2008-06-05 at 14:39 -0400, Alan Conway wrote:
> Andrew Stitcher wrote:
> > On Thu, 2008-06-05 at 19:31 +0200, Manuel Teira wrote:
> >> ...
> >> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
> >> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
> >> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t 
> >> at 0x0000000100400758
> >> Error initializing mutex: 0xffffffff7e14e680. Error 0
> >> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
> >>
> >> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
> >> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
> >>
> >>
> >> I suspect that the static nature of AllThreadStatuses could be involved. 
> >> Any thought?
> > 
> > I'm pretty sure you are correct about the static initialisation order.
> > 
> > One problem that your trace points out though is that we will get a
> > _different_ initialisation for each file including Mutex.h due to the
> > anonymous namespace used here - this wasn't the intention of this code.
> > I guess this needs to be fixed by creating a Mutex.cpp to hold this
> > anonymous namespace. However I'm not sure whether or not this is the
> > cause of your problem,although it can't help.
> > 
> 
> I think I may see the problem. Is all AllThreadsStatuses called (directly or 
> indirectly) from static constructors in other compilation units?
> 
> An object that is called from static constructors in other compilation units has 
> to be written in a rather special way - it has to be a POD and be POD 
> initialized - i.e. no constructors or destructors and no members/bases with 
> ctors/dtors.

It's the mutexattr static constructor that is the problem - that needs
to be written as a singleton and as I said before moved into an
implementation file.

> 
> We have a PODMutex for exactly that situation

PODMutex is currently useless (and isn't used anywhere as far as I know)
because it can't create a recursive mutex which is what Mutex is.

A piece of work we still need to do is to create a RecursiveMutex and
revert Mutex to non recursive then only make mutexes recursive that have
to be - then PODMutex might be useful again.

> 
> Andrew do you know the circumstances under which this is called? Manuel - can 
> you post a stack trace? That should answer the question.

I think that fixing Mutex so that its contructor can be safely called
from anywhere in static initialisation is the way to go here. I think
that the other issues are consequences of this lack. The problem is
exacerbated because there are multiple initialisations of the
mutexInitAttr one for every implementation file which uses Mutex so it
almost bound to fail in this way - it's amazing we've not seen it
before.

Andrew


> 
> Cheers,
> Alan.


Re: Problems with Mutex initialization

Posted by Alan Conway <ac...@redhat.com>.
Andrew Stitcher wrote:
> On Thu, 2008-06-05 at 19:31 +0200, Manuel Teira wrote:
>> ...
>> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
>> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
>> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t 
>> at 0x0000000100400758
>> Error initializing mutex: 0xffffffff7e14e680. Error 0
>> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
>>
>> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
>> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
>>
>>
>> I suspect that the static nature of AllThreadStatuses could be involved. 
>> Any thought?
> 
> I'm pretty sure you are correct about the static initialisation order.
> 
> One problem that your trace points out though is that we will get a
> _different_ initialisation for each file including Mutex.h due to the
> anonymous namespace used here - this wasn't the intention of this code.
> I guess this needs to be fixed by creating a Mutex.cpp to hold this
> anonymous namespace. However I'm not sure whether or not this is the
> cause of your problem,although it can't help.
> 

I think I may see the problem. Is all AllThreadsStatuses called (directly or 
indirectly) from static constructors in other compilation units?

An object that is called from static constructors in other compilation units has 
to be written in a rather special way - it has to be a POD and be POD 
initialized - i.e. no constructors or destructors and no members/bases with 
ctors/dtors.

We have a PODMutex for exactly that situation but if AllThreadStatuses is indeed 
being called from other compilation units during static init then it needs a 
redesign - both the Mutex and std::vector members have constructors. It may need 
to be rewritten as a singleton.

Andrew do you know the circumstances under which this is called? Manuel - can 
you post a stack trace? That should answer the question.

Cheers,
Alan.

Re: Problems with Mutex initialization

Posted by Andrew Stitcher <as...@redhat.com>.
On Thu, 2008-06-05 at 19:31 +0200, Manuel Teira wrote:
> ...
> initMutexattr[1](once: 0xffffffff7e14e300, mutexattr: 0xffffffff7e1535f8)
> initMutexattr[1](once: 0xffffffff7e14e620, mutexattr: 0xffffffff7e1535d8)
> Mutex::Mutex with mutex at 0xffffffff7e14e680 and pthread_mutex_attr_t 
> at 0x0000000100400758
> Error initializing mutex: 0xffffffff7e14e680. Error 0
> AllThreadsStatuses ctor. Mutex: 0xffffffff7e14e680
> 
> [snip lots of other initMutexattr and Mutex::Mutex nonrelated calls]
> initMutexattr[1](once: 0x00000001003f7730, mutexattr: 0x0000000100400758)
> 
> 
> I suspect that the static nature of AllThreadStatuses could be involved. 
> Any thought?

I'm pretty sure you are correct about the static initialisation order.

One problem that your trace points out though is that we will get a
_different_ initialisation for each file including Mutex.h due to the
anonymous namespace used here - this wasn't the intention of this code.
I guess this needs to be fixed by creating a Mutex.cpp to hold this
anonymous namespace. However I'm not sure whether or not this is the
cause of your problem,although it can't help.

Andrew