You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by Steve Huston <sh...@riverace.com> on 2012/11/15 04:05:28 UTC

Review Request: Fix for Thread object's operator bool()

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8072/
-----------------------------------------------------------

Review request for qpid, Andrew Stitcher, Chug Rolke, and Cliff Jansen.


Description
-------

The assert in QPID-4424 was a check for a Thread object not set. This change resolves that problem, but could it really be that easy? Why doesn't the Linux code fail the same way?


This addresses bug QPID-4424.
    https://issues.apache.org/jira/browse/QPID-4424


Diffs
-----

  http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/Thread.cpp 1409628 

Diff: https://reviews.apache.org/r/8072/diff/


Testing
-------

Original reproducing test case in QPID-4424 (run broker quiet for 15 seconds). I set a breakpoint at the assert and stepped across it without error.


Thanks,

Steve Huston


Re: Review Request: Fix for Thread object's operator bool()

Posted by Andrew Stitcher <as...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8072/#review13460
-----------------------------------------------------------


On the face of it I also can see no reason why bool(impl) should return a different result to bool(impl.get != 0) so I'm a little suspicious that this is really fixing anything! I guess it depends on the precise behaviour of bool(boost::shared_ptr<T*>) but that should have precisely the same behaviour as extracting the T* and comparing with the null pointer.

- Andrew Stitcher


On Nov. 15, 2012, 3:05 a.m., Steve Huston wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8072/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2012, 3:05 a.m.)
> 
> 
> Review request for qpid, Andrew Stitcher, Chug Rolke, and Cliff Jansen.
> 
> 
> Description
> -------
> 
> The assert in QPID-4424 was a check for a Thread object not set. This change resolves that problem, but could it really be that easy? Why doesn't the Linux code fail the same way?
> 
> 
> This addresses bug QPID-4424.
>     https://issues.apache.org/jira/browse/QPID-4424
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/Thread.cpp 1409628 
> 
> Diff: https://reviews.apache.org/r/8072/diff/
> 
> 
> Testing
> -------
> 
> Original reproducing test case in QPID-4424 (run broker quiet for 15 seconds). I set a breakpoint at the assert and stepped across it without error.
> 
> 
> Thanks,
> 
> Steve Huston
> 
>


Re: Review Request: Fix for Thread object's operator bool()

Posted by Steve Huston <sh...@riverace.com>.

> On Nov. 15, 2012, 9:17 p.m., Cliff Jansen wrote:
> > I am going to disagree (with the proposed patch, and agree with Andrew).  I managed to reproduce (took closer to 15 minutes on my laptop) and get a similar stack trace.
> > 
> > I see a shared_ptr with non-null value and use_count of 3.  impl->get() would also return true and trigger the assert.
> > 
> > On a separate thread, I see the same PollableQueue instance in dispatch()/process(), waiting to reacquire the lock after line 152.
> > 
> > I think the explanation is more likely that differences in the poller implementations expose a different scheduling opportunity for the bug on Windows compared to Linux.
> 
> Cliff Jansen wrote:
>     Speculating further: on Linux, the PollableCondition has a pipe fd plugged into the poller, and the poller can't see a second event to dispatch until the pipe is re-enabled after the first callback completes.
>     
>     On Windows, perhaps there are two async poke's, allowing two dispatches to occur and enabling the window of opportunity.  That's just a wild guess, but I think the main clue is that there are two dispatches running on the same PollableQueue in separate threads.

Thank you for working on this, Cliff! Your analysis made me look at the PollableQueue code (awake this time :-) and it's easy to see how multiple threads can get into dispatch. The code assumes that once dispatch() is entered, no other thread will enter it and call process() until the first thread is done. I'll look further at how to resolve this in PollableQueue.


- Steve


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8072/#review13485
-----------------------------------------------------------


On Nov. 15, 2012, 3:05 a.m., Steve Huston wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8072/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2012, 3:05 a.m.)
> 
> 
> Review request for qpid, Andrew Stitcher, Chug Rolke, and Cliff Jansen.
> 
> 
> Description
> -------
> 
> The assert in QPID-4424 was a check for a Thread object not set. This change resolves that problem, but could it really be that easy? Why doesn't the Linux code fail the same way?
> 
> 
> This addresses bug QPID-4424.
>     https://issues.apache.org/jira/browse/QPID-4424
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/Thread.cpp 1409628 
> 
> Diff: https://reviews.apache.org/r/8072/diff/
> 
> 
> Testing
> -------
> 
> Original reproducing test case in QPID-4424 (run broker quiet for 15 seconds). I set a breakpoint at the assert and stepped across it without error.
> 
> 
> Thanks,
> 
> Steve Huston
> 
>


Re: Review Request: Fix for Thread object's operator bool()

Posted by Cliff Jansen <cl...@gmail.com>.

> On Nov. 15, 2012, 9:17 p.m., Cliff Jansen wrote:
> > I am going to disagree (with the proposed patch, and agree with Andrew).  I managed to reproduce (took closer to 15 minutes on my laptop) and get a similar stack trace.
> > 
> > I see a shared_ptr with non-null value and use_count of 3.  impl->get() would also return true and trigger the assert.
> > 
> > On a separate thread, I see the same PollableQueue instance in dispatch()/process(), waiting to reacquire the lock after line 152.
> > 
> > I think the explanation is more likely that differences in the poller implementations expose a different scheduling opportunity for the bug on Windows compared to Linux.

Speculating further: on Linux, the PollableCondition has a pipe fd plugged into the poller, and the poller can't see a second event to dispatch until the pipe is re-enabled after the first callback completes.

On Windows, perhaps there are two async poke's, allowing two dispatches to occur and enabling the window of opportunity.  That's just a wild guess, but I think the main clue is that there are two dispatches running on the same PollableQueue in separate threads.


- Cliff


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8072/#review13485
-----------------------------------------------------------


On Nov. 15, 2012, 3:05 a.m., Steve Huston wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8072/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2012, 3:05 a.m.)
> 
> 
> Review request for qpid, Andrew Stitcher, Chug Rolke, and Cliff Jansen.
> 
> 
> Description
> -------
> 
> The assert in QPID-4424 was a check for a Thread object not set. This change resolves that problem, but could it really be that easy? Why doesn't the Linux code fail the same way?
> 
> 
> This addresses bug QPID-4424.
>     https://issues.apache.org/jira/browse/QPID-4424
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/Thread.cpp 1409628 
> 
> Diff: https://reviews.apache.org/r/8072/diff/
> 
> 
> Testing
> -------
> 
> Original reproducing test case in QPID-4424 (run broker quiet for 15 seconds). I set a breakpoint at the assert and stepped across it without error.
> 
> 
> Thanks,
> 
> Steve Huston
> 
>


Re: Review Request: Fix for Thread object's operator bool()

Posted by Cliff Jansen <cl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8072/#review13485
-----------------------------------------------------------


I am going to disagree (with the proposed patch, and agree with Andrew).  I managed to reproduce (took closer to 15 minutes on my laptop) and get a similar stack trace.

I see a shared_ptr with non-null value and use_count of 3.  impl->get() would also return true and trigger the assert.

On a separate thread, I see the same PollableQueue instance in dispatch()/process(), waiting to reacquire the lock after line 152.

I think the explanation is more likely that differences in the poller implementations expose a different scheduling opportunity for the bug on Windows compared to Linux.

- Cliff Jansen


On Nov. 15, 2012, 3:05 a.m., Steve Huston wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8072/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2012, 3:05 a.m.)
> 
> 
> Review request for qpid, Andrew Stitcher, Chug Rolke, and Cliff Jansen.
> 
> 
> Description
> -------
> 
> The assert in QPID-4424 was a check for a Thread object not set. This change resolves that problem, but could it really be that easy? Why doesn't the Linux code fail the same way?
> 
> 
> This addresses bug QPID-4424.
>     https://issues.apache.org/jira/browse/QPID-4424
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/Thread.cpp 1409628 
> 
> Diff: https://reviews.apache.org/r/8072/diff/
> 
> 
> Testing
> -------
> 
> Original reproducing test case in QPID-4424 (run broker quiet for 15 seconds). I set a breakpoint at the assert and stepped across it without error.
> 
> 
> Thanks,
> 
> Steve Huston
> 
>


Re: Review Request: Fix for Thread object's operator bool()

Posted by Andrew Stitcher <as...@apache.org>.

> On Nov. 15, 2012, 6:30 p.m., Chug Rolke wrote:
> > Both instances of code kick off a function call into boost::shared_ptr<qpid::sys::ThreadPrivate>. Before the patch the function is ::operator <type>::* and after the patch the function is ::get, unsurprisingly. The results are then processed slightly differently before returning the result. Gcc produces a function call and post-call processing nearly identical to windows. So why do gcc versions work? Probably something to do with boost and ThreadPrivate on the two platforms.

Thanks to chuck for looking into this further. I still don't understand why it works with gcc/pthreads but not with VC/boost::threads however if it works now it'll remain a mystery.

I do suggest that putting a comment to say that we don't understand why this fixes the problem might be sensible though just in case someone is minded to change it back as it shouldn't change the behaviour! 


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8072/#review13479
-----------------------------------------------------------


On Nov. 15, 2012, 3:05 a.m., Steve Huston wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8072/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2012, 3:05 a.m.)
> 
> 
> Review request for qpid, Andrew Stitcher, Chug Rolke, and Cliff Jansen.
> 
> 
> Description
> -------
> 
> The assert in QPID-4424 was a check for a Thread object not set. This change resolves that problem, but could it really be that easy? Why doesn't the Linux code fail the same way?
> 
> 
> This addresses bug QPID-4424.
>     https://issues.apache.org/jira/browse/QPID-4424
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/Thread.cpp 1409628 
> 
> Diff: https://reviews.apache.org/r/8072/diff/
> 
> 
> Testing
> -------
> 
> Original reproducing test case in QPID-4424 (run broker quiet for 15 seconds). I set a breakpoint at the assert and stepped across it without error.
> 
> 
> Thanks,
> 
> Steve Huston
> 
>


Re: Review Request: Fix for Thread object's operator bool()

Posted by Chug Rolke <cr...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8072/#review13479
-----------------------------------------------------------

Ship it!


Both instances of code kick off a function call into boost::shared_ptr<qpid::sys::ThreadPrivate>. Before the patch the function is ::operator <type>::* and after the patch the function is ::get, unsurprisingly. The results are then processed slightly differently before returning the result. Gcc produces a function call and post-call processing nearly identical to windows. So why do gcc versions work? Probably something to do with boost and ThreadPrivate on the two platforms.

- Chug Rolke


On Nov. 15, 2012, 3:05 a.m., Steve Huston wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8072/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2012, 3:05 a.m.)
> 
> 
> Review request for qpid, Andrew Stitcher, Chug Rolke, and Cliff Jansen.
> 
> 
> Description
> -------
> 
> The assert in QPID-4424 was a check for a Thread object not set. This change resolves that problem, but could it really be that easy? Why doesn't the Linux code fail the same way?
> 
> 
> This addresses bug QPID-4424.
>     https://issues.apache.org/jira/browse/QPID-4424
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/Thread.cpp 1409628 
> 
> Diff: https://reviews.apache.org/r/8072/diff/
> 
> 
> Testing
> -------
> 
> Original reproducing test case in QPID-4424 (run broker quiet for 15 seconds). I set a breakpoint at the assert and stepped across it without error.
> 
> 
> Thanks,
> 
> Steve Huston
> 
>