Posted to dev@qpid.apache.org by "Gordon Sim (JIRA)" <qp...@incubator.apache.org> on 2009/08/13 09:47:14 UTC

[jira] Created: (QPID-2048) Client can hang on close() if broker is simultaneously killed

Client can hang on close() if broker is simultaneously killed
-------------------------------------------------------------

                 Key: QPID-2048
                 URL: https://issues.apache.org/jira/browse/QPID-2048
             Project: Qpid
          Issue Type: Bug
          Components: C++ Client
    Affects Versions: 0.5
            Reporter: Gordon Sim
            Assignee: Gordon Sim
             Fix For: 0.6


There is a race between ConnectionHandler::close() and ConnectionHandler::failed(). If the closing thread is between checking for the OPEN state and setting the CLOSING state (lines 149 and 150 as of r803787) when the failing thread sets the state to FAILED (line 181 as of r803787), then the FAILED state is overwritten and the closing thread hangs in StateManager::waitFor().

For example, from a run of ais_check with the store loaded:

Thread 1 (Thread 0xb7fb3720 (LWP 11644)):
#0  0x00975410 in __kernel_vsyscall ()
#1  0x0032d595 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x00a53b3d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x001b973e in qpid::client::StateManager::waitFor ()
#4  0x00166e52 in qpid::client::ConnectionHandler::close ()
#5  0x0016f2fb in qpid::client::ConnectionImpl::close ()
#6  0x0015dde4 in qpid::client::Connection::close ()
#7  0x0808d5da in ClusterFixture::killWithSilencer ()
#8  0x0806dab4 in testConnectionKnownHosts ()
#9  0x0807cacc in boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke ()
#10 0x007ad48d in ?? () from /usr/lib/libboost_unit_test_framework.so.2
#11 0x0079df35 in boost::execution_monitor::catch_signals ()
#12 0x0079e2c6 in boost::execution_monitor::execute ()
#13 0x007ad599 in boost::unit_test::unit_test_monitor_t::execute_and_translate
#14 0x007a1194 in boost::unit_test::framework_impl::visit ()
#15 0x007b3ef7 in boost::unit_test::traverse_test_tree ()
#16 0x007b46a0 in boost::unit_test::traverse_test_tree ()
#17 0x007b44d8 in boost::unit_test::traverse_test_tree ()
#18 0x007b46d5 in boost::unit_test::traverse_test_tree ()
#19 0x007b44d8 in boost::unit_test::traverse_test_tree ()
#20 0x007b46d5 in boost::unit_test::traverse_test_tree ()
#21 0x007a0169 in boost::unit_test::framework::run ()
#22 0x007ad249 in main () from /usr/lib/libboost_unit_test_framework.so.2
#23 0x0098be8c in __libc_start_main () from /lib/libc.so.6
#24 0x080568b1 in _start ()
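The lost update in the description above is a check-then-act race. As a minimal sketch (not Qpid code), the function below replays that interleaving deterministically on a single thread. The `State` enum and `replayBuggyInterleaving()` are hypothetical simplifications, and the assumption that close() then waits for CLOSED or FAILED is ours, inferred from the `StateManager::waitFor` frame in the trace.

```cpp
#include <cassert>

// Simplified connection states. The names follow the report, but this enum is
// an illustrative assumption, not Qpid's actual definition.
enum class State { OPEN, CLOSING, CLOSED, FAILED };

// Replays, on one thread, the interleaving described in the report so the
// outcome is deterministic. Returns the state left behind.
State replayBuggyInterleaving() {
    State state = State::OPEN;

    // Closing thread, step 1: checks that the connection is OPEN.
    bool wasOpen = (state == State::OPEN);

    // Failing thread runs inside the window and records the failure.
    state = State::FAILED;
    assert(state == State::FAILED);  // the failure really was recorded

    // Closing thread, step 2: acts on its stale check and overwrites FAILED.
    if (wasOpen) {
        state = State::CLOSING;
    }

    // The failure is now lost. Nothing will move CLOSING on to CLOSED, so a
    // close() that waits for CLOSED (or FAILED) would block forever.
    return state;
}
```

Calling `replayBuggyInterleaving()` returns `State::CLOSING`: the FAILED state recorded by the failing thread has been overwritten, which leaves the closing thread waiting for a transition that never comes.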

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


[jira] Commented: (QPID-2048) Client can hang on close() if broker is simultaneously killed

Posted by "Gordon Sim (JIRA)" <qp...@incubator.apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742753#action_12742753 ] 

Gordon Sim commented on QPID-2048:
----------------------------------

I've checked in a fix that checks that the state is still OPEN before setting it to CLOSING. If it is not, the connection is assumed to have been closed or to have failed concurrently. The threading and locking for this class should be revisited, as the design is not coherent on that point.
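A minimal sketch of the kind of fix described in this comment, assuming the state is guarded by a mutex and exposes a compare-and-set helper. `StateHolder`, `setStateIf()`, `closeConnection()`, `connectionFailed()` and `raceCloseAndFail()` are hypothetical names for illustration, not the actual Qpid `ConnectionHandler`/`StateManager` API; we also assume close() waits for either CLOSED or FAILED.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Illustrative states (an assumption modelled on the names in the report).
enum class State { OPEN, CLOSING, CLOSED, FAILED };

// Hypothetical thread-safe state holder; not Qpid's StateManager.
class StateHolder {
public:
    // Atomically moves from `expected` to `desired`. Returns false and leaves
    // the state untouched if another thread changed it first.
    bool setStateIf(State expected, State desired) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (state_ != expected) return false;
        state_ = desired;
        cond_.notify_all();
        return true;
    }

    // Unconditionally sets the state and wakes any waiters.
    void setState(State s) {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = s;
        cond_.notify_all();
    }

    // Blocks until the state is `a` or `b`; returns the state observed.
    State waitForEither(State a, State b) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [&] { return state_ == a || state_ == b; });
        return state_;
    }

    State get() {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    State state_ = State::OPEN;
};

// close(): only waits if it actually won the OPEN -> CLOSING transition.
State closeConnection(StateHolder& h) {
    if (!h.setStateIf(State::OPEN, State::CLOSING)) {
        // Closed or failed concurrently: nothing left to wait for.
        return h.get();
    }
    // A real client would send its close request to the broker here.
    return h.waitForEither(State::CLOSED, State::FAILED);
}

// failed(): called from the I/O thread when the broker disappears.
void connectionFailed(StateHolder& h) { h.setState(State::FAILED); }

// Runs close() and failed() concurrently; returns the state close() ends with.
State raceCloseAndFail() {
    StateHolder h;
    std::thread io([&] { connectionFailed(h); });
    State result = closeConnection(h);
    io.join();
    assert(h.get() == State::FAILED);  // the failure is never lost
    return result;
}
```

Because the check and the update happen under the same lock, failed() can no longer slip in between them: either it runs first and close() returns without waiting, or it runs afterwards and its notification wakes the waiting thread.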



[jira] Resolved: (QPID-2048) Client can hang on close() if broker is simultaneously killed

Posted by "Gordon Sim (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gordon Sim resolved QPID-2048.
------------------------------

    Resolution: Fixed



[jira] Updated: (QPID-2048) Client can hang on close() if broker is simultaneously killed

Posted by "Gordon Sim (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gordon Sim updated QPID-2048:
-----------------------------

    Status: Ready To Review  (was: In Progress)



[jira] Assigned: (QPID-2048) Client can hang on close() if broker is simultaneously killed

Posted by "Gordon Sim (JIRA)" <qp...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gordon Sim reassigned QPID-2048:
--------------------------------

    Assignee: Alan Conway  (was: Gordon Sim)
