Posted to dev@qpid.apache.org by "Gordon Sim (JIRA)" <qp...@incubator.apache.org> on 2009/08/13 09:47:14 UTC
[jira] Created: (QPID-2048) Client can hang on close() if broker is
simultaneously killed
Client can hang on close() if broker is simultaneously killed
-------------------------------------------------------------
Key: QPID-2048
URL: https://issues.apache.org/jira/browse/QPID-2048
Project: Qpid
Issue Type: Bug
Components: C++ Client
Affects Versions: 0.5
Reporter: Gordon Sim
Assignee: Gordon Sim
Fix For: 0.6
There is a race between ConnectionHandler::close() and ConnectionHandler::failed(). If the failing thread sets the state to FAILED (line 181 as of r803787) after the closing thread has checked for the OPEN state but before it has set the CLOSING state (lines 149 and 150 as of r803787), then the FAILED state is overwritten and the closing thread hangs waiting for a state change that never comes.
E.g., from a run of ais_check with store loaded:
Thread 1 (Thread 0xb7fb3720 (LWP 11644)):
#0 0x00975410 in __kernel_vsyscall ()
#1 0x0032d595 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x00a53b3d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3 0x001b973e in qpid::client::StateManager::waitFor ()
#4 0x00166e52 in qpid::client::ConnectionHandler::close ()
#5 0x0016f2fb in qpid::client::ConnectionImpl::close ()
#6 0x0015dde4 in qpid::client::Connection::close ()
#7 0x0808d5da in ClusterFixture::killWithSilencer ()
#8 0x0806dab4 in testConnectionKnownHosts ()
#9 0x0807cacc in boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke ()
#10 0x007ad48d in ?? () from /usr/lib/libboost_unit_test_framework.so.2
#11 0x0079df35 in boost::execution_monitor::catch_signals ()
#12 0x0079e2c6 in boost::execution_monitor::execute ()
#13 0x007ad599 in boost::unit_test::unit_test_monitor_t::execute_and_translate
#14 0x007a1194 in boost::unit_test::framework_impl::visit ()
#15 0x007b3ef7 in boost::unit_test::traverse_test_tree ()
#16 0x007b46a0 in boost::unit_test::traverse_test_tree ()
#17 0x007b44d8 in boost::unit_test::traverse_test_tree ()
#18 0x007b46d5 in boost::unit_test::traverse_test_tree ()
#19 0x007b44d8 in boost::unit_test::traverse_test_tree ()
#20 0x007b46d5 in boost::unit_test::traverse_test_tree ()
#21 0x007a0169 in boost::unit_test::framework::run ()
#22 0x007ad249 in main () from /usr/lib/libboost_unit_test_framework.so.2
#23 0x0098be8c in __libc_start_main () from /lib/libc.so.6
#24 0x080568b1 in _start ()
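The interleaving described above can be replayed deterministically on a single thread. This is only a minimal sketch: StateHolder and replayRace() are hypothetical stand-ins written for illustration, not the actual qpid::client::StateManager or ConnectionHandler code, and the only assumption carried over from the report is that the check for OPEN and the set to CLOSING are two separate steps.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the connection states named in the report.
enum class State { OPEN, CLOSING, CLOSED, FAILED };

// Simplified state holder (an assumption, not the real StateManager):
// each call takes the lock, but a get() followed by a set() is not
// atomic as a pair.
class StateHolder {
    mutable std::mutex lock;
    State state;
public:
    StateHolder() : state(State::OPEN) {}
    State get() const { std::lock_guard<std::mutex> l(lock); return state; }
    void set(State s) { std::lock_guard<std::mutex> l(lock); state = s; }
};

// Replays the problematic ordering step by step and returns the final state.
State replayRace() {
    StateHolder h;
    assert(h.get() == State::OPEN);           // connection starts out open
    bool sawOpen = (h.get() == State::OPEN);  // closing thread: check for OPEN
    h.set(State::FAILED);                     // failing thread: broker was killed
    if (sawOpen) h.set(State::CLOSING);       // closing thread: set CLOSING, FAILED is lost
    return h.get();
}
```

In this sketch replayRace() ends in CLOSING rather than FAILED. A closing thread that then waits for the state to leave CLOSING would block indefinitely, which matches the waitFor() frame at the top of the trace.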
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org
[jira] Commented: (QPID-2048) Client can hang on close() if broker
is simultaneously killed
Posted by "Gordon Sim (JIRA)" <qp...@incubator.apache.org>.
[ https://issues.apache.org/jira/browse/QPID-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742753#action_12742753 ]
Gordon Sim commented on QPID-2048:
----------------------------------
I've checked in a fix that checks the state is still OPEN before setting it to CLOSING. If it's not, then the connection is assumed to have closed or failed concurrently. The threading and locking for this class should be revisited, as the design is not coherent on that point.
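A rough sketch of that kind of fix, assuming the check and the transition are done together under one lock. The names StateHolder, setIf() and beginClose() are hypothetical and chosen for illustration; they are not taken from the checked-in change.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the connection states named in the report.
enum class State { OPEN, CLOSING, CLOSED, FAILED };

// Simplified state holder (an assumption, not the real StateManager).
class StateHolder {
    mutable std::mutex lock;
    State state;
public:
    StateHolder() : state(State::OPEN) {}
    State get() const { std::lock_guard<std::mutex> l(lock); return state; }
    void set(State s) { std::lock_guard<std::mutex> l(lock); state = s; }

    // Moves from 'expected' to 'desired' while holding the lock. Returns
    // false and leaves the state untouched if it is no longer 'expected'.
    bool setIf(State expected, State desired) {
        std::lock_guard<std::mutex> l(lock);
        if (state != expected) return false;
        state = desired;
        return true;
    }
};

// Returns true if the caller should go on to wait for the close to complete,
// false if the connection has already closed or failed concurrently.
bool beginClose(StateHolder& h) {
    return h.setIf(State::OPEN, State::CLOSING);
}

// Replays the ordering from the report with the conditional transition.
State replayWithFix() {
    StateHolder h;
    assert(h.get() == State::OPEN);  // connection starts out open
    h.set(State::FAILED);            // failing thread: broker was killed
    bool mustWait = beginClose(h);   // closing thread: transition only if still OPEN
    assert(!mustWait);               // nothing to wait for, so no hang
    return h.get();
}
```

Here replayWithFix() ends in FAILED: the failure is preserved and the closing thread can return straight away instead of waiting.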
[jira] Resolved: (QPID-2048) Client can hang on close() if broker
is simultaneously killed
Posted by "Gordon Sim (JIRA)" <qp...@incubator.apache.org>.
[ https://issues.apache.org/jira/browse/QPID-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gordon Sim resolved QPID-2048.
------------------------------
Resolution: Fixed
[jira] Updated: (QPID-2048) Client can hang on close() if broker is
simultaneously killed
Posted by "Gordon Sim (JIRA)" <qp...@incubator.apache.org>.
[ https://issues.apache.org/jira/browse/QPID-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gordon Sim updated QPID-2048:
-----------------------------
Status: Ready To Review (was: In Progress)
[jira] Assigned: (QPID-2048) Client can hang on close() if broker
is simultaneously killed
Posted by "Gordon Sim (JIRA)" <qp...@incubator.apache.org>.
[ https://issues.apache.org/jira/browse/QPID-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gordon Sim reassigned QPID-2048:
--------------------------------
Assignee: Alan Conway (was: Gordon Sim)