Posted to users@qpid.apache.org by Alexandre Trufanow <al...@gmail.com> on 2016/04/08 11:58:42 UTC

[QPID] Deadlock in unit tests on solaris

Hi,

I am trying to run QPID on Solaris using Sun Studio and have managed to get
the broker to compile with a few minor fixes. Unfortunately, many unit tests
hang.

The issue is a deadlock when SessionFixture is created. The main thread is
blocked in DispatchHandle::rewatchWrite, waiting to acquire a mutex, during
the call to newSession:

=>[5] qpid::sys::Mutex::lock(this = <value unavailable>) (optimized), at
0xfffffd7ffdb81a0e (line ~116) in "Mutex.h"
  [6] qpid::sys::ScopedLock<qpid::sys::Mutex>::ScopedLock(this = <value
unavailable>, l = CLASS) (optimized), at 0xfffffd7ffdb819df (line ~33) in
"Mutex.h"
  [7] qpid::sys::DispatchHandle::rewatchWrite(this = 0xb63558) (optimized),
at 0xfffffd7ffdbf4cc0 (line ~109) in "DispatchHandle.cpp"
  [8] qpid::sys::posix::AsynchIO::notifyPendingWrite(this = <value
unavailable>) (optimized), at 0xfffffd7ffdb62824 (line ~389) in
"AsynchIO.cpp"
  [9] qpid::client::TCPConnector::handle(this = 0xb60fe0, frame = CLASS)
(optimized), at 0xfffffd7ffdf6dc1d (line ~209) in "TCPConnector.cpp"
[... shortened output]
  [22] qpid::client::Connection::newSession(this = <value unavailable>,
name = CLASS, timeout = 0) (optimized), at 0xfffffd7ffdf05b15 (line ~141)
in "Connection.cpp"
  [23]
qpid::tests::SessionFixtureT<qpid::tests::LocalConnection,qpid::client::Session_0_10>::SessionFixtureT(this
= 0xfffffd7fffdfe3d0, opts = STRUCT) (optimized), at 0x5d95b5 (line ~141)
in "BrokerFixture.h"

The lock is also held by one of the two Poller threads, which is waiting in
poll:

=>[4] qpid::sys::PollerPrivate::EventStream::getEvent(this = 0xb60ee8,
targetTimeout = CLASS) (optimized), at 0xfffffd7ffdb875cf (line ~466) in
"PosixPoller.cpp"
  [5] qpid::sys::PollerPrivate::EventStream::next(this = 0xb60ee8, timeout
= CLASS) (optimized), at 0xfffffd7ffdb86127 (line ~354) in "PosixPoller.cpp"
  [6] qpid::sys::Poller::wait(this = 0xb467f0, timeout = CLASS)
(optimized), at 0xfffffd7ffdb847c6 (line ~729) in "PosixPoller.cpp"
  [7] qpid::sys::Poller::run(this = 0xb467f0) (optimized), at
0xfffffd7ffdb84540 (line ~690) in "PosixPoller.cpp"

I do not understand how the same lock can be held simultaneously on both
threads, but the deadlock occurs because the call to poll never wakes up.
I have noticed a suspicious comment in the code path of the main thread
which may be linked to this behavior. In TCPConnector::handle, the
following comment precedes the blocking call into AsynchIO:

    /*
      NOTE: Moving the following line into this mutex block
            is a workaround for BZ 570168, in which the test
            testConcurrentSenders causes a hang about 1.5%
            of the time.  ( To see the hang much more frequently
            leave this line out of the mutex block, and put a
            small usleep just before it.)

            TODO mgoulish - fix the underlying cause and then
                            move this call back outside the mutex.
    */
    if (notifyWrite && !closed) aio->notifyPendingWrite();

Do you have any hints as to what the underlying issue could be?
Thanks,

Alexandre Trufanow
www.murex.com

Re: [QPID] Deadlock in unit tests on solaris

Posted by Alexandre Trufanow <al...@gmail.com>.
What I describe as "minor" fixes are adding includes or tweaking #ifdefs so
that the posix branch is taken (defined __sun instead of __unix__). There
are no changes related to locking or threads; the pthreads API is used for
this in the same way as on Linux. From what I can see in the documentation,
there are no differences in the API for the calls which are made on both
systems. In particular, the mutex should be recursive in the same way as on
Linux.

I also created a dummy implementation of SystemInfo::getInterfaceAddresses
and SystemInfo::getInterfaceNames based on the posix implementation
(<ifaddrs.h> is not available on my platform). Do you think this could be
related to the issue? My analysis did not show any calls to this code in
the failing test.

For reference, the platform I am testing on is SunStudio 12.4 on Solaris
10u10. I am also using stlport4.

Thanks for your help

On Fri, Apr 8, 2016 at 4:49 PM, Andrew Stitcher <as...@redhat.com>
wrote:

> On Fri, 2016-04-08 at 11:58 +0200, Alexandre Trufanow wrote:
> > Hi,
> >
> > I am trying to run QPID on solaris using sun studio and have managed
> > to get
> > the broker to compile with a few minor fixes. Unfortunately many unit
> > tests
> > are blocking.
>
> Could you tell us what the "minor fixes" are? I suspect they may be
> relevant to the issue.
>
> Also it may be relevant to give a few more details about your
> environment: Which Solaris version on what architecture? With exactly
> which version of the compiler? In this case I doubt that will give a
> whole lot more info, but in general it helps.
>
> One potential problem that comes to mind that could cause this sort of
> issue: Did you implement Mutex yourself? For some irritating reasons,
> Mutex must be a recursive mutex not a regular mutex, and making it a
> regular mutex will cause some parts of the code base to deadlock.
>
> It has been a long time since I tried to compile qpid on Solaris so
> that's about the limit of my memory.
>
> Solaris is a very lightly tested platform unfortunately so you may be
> largely on your own.
>
> Good luck,
>
> Andrew
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: [QPID] Deadlock in unit tests on solaris

Posted by Andrew Stitcher <as...@redhat.com>.
On Fri, 2016-04-08 at 11:58 +0200, Alexandre Trufanow wrote:
> Hi,
> 
> I am trying to run QPID on solaris using sun studio and have managed
> to get
> the broker to compile with a few minor fixes. Unfortunately many unit
> tests
> are blocking.

Could you tell us what the "minor fixes" are? I suspect they may be
relevant to the issue.

Also it may be relevant to give a few more details about your
environment: Which Solaris version on what architecture? With exactly
which version of the compiler? In this case I doubt that will give a
whole lot more info, but in general it helps.

One potential problem that comes to mind that could cause this sort of
issue: Did you implement Mutex yourself? For some irritating reasons,
Mutex must be a recursive mutex not a regular mutex, and making it a
regular mutex will cause some parts of the code base to deadlock.

It has been a long time since I tried to compile qpid on Solaris so
that's about the limit of my memory.

Solaris is a very lightly tested platform unfortunately so you may be
largely on your own.

Good luck,

Andrew

