Posted to users@qpid.apache.org by cwan <ch...@cloudistics.com> on 2018/07/20 17:34:27 UTC

Open connection to broker waits forever

Greetings.

I have a problem with qpid clients getting stuck in the open connection call
while reconnecting to the broker.

Our system uses the qpid-cpp 1.38 C++ client and broker with AMQP 0-10 on RHEL 7.5.
Once in a while, the network connections between the clients and the broker
break, and when the clients reconnect, some of them are blocked because the
open connection call never returns.   

Based on the stack trace (see below), it looks like the qpid client is
waiting for a connection to open, but the connection is never established.  

Stack Trace:
Thread 1 (Thread 0x7f8979af28c0 (LWP 36968)):
#0  0x00007f8977c7f995 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f89785be613 in qpid::sys::Condition::wait(qpid::sys::Mutex&) ()
from /lib64/libqpidclient.so.2
#2  0x00007f89785ea683 in qpid::client::StateManager::waitFor(std::set<int,
std::less<int>, std::allocator<int> >) () from /lib64/libqpidclient.so.2
#3  0x00007f89785c226f in qpid::client::ConnectionHandler::waitForOpen() ()
from /lib64/libqpidclient.so.2
#4  0x00007f89785c808a in qpid::client::ConnectionImpl::open() () from
/lib64/libqpidclient.so.2
#5  0x00007f89785bfb68 in
qpid::client::Connection::open(qpid::client::ConnectionSettings const&) ()
from /lib64/libqpidclient.so.2
#6  0x00007f89785c01ed in qpid::client::Connection::open(qpid::Url const&,
qpid::client::ConnectionSettings const&) () from /lib64/libqpidclient.so.2
#7  0x00007f89796a119f in
qpid::client::amqp0_10::ConnectionImpl::tryConnect() () from
/lib64/libqpidmessaging.so.2
#8  0x00007f89796a28f4 in
qpid::client::amqp0_10::ConnectionImpl::connect(qpid::sys::AbsTime const&)
() from /lib64/libqpidmessaging.so.2
#9  0x00007f89796a3c93 in qpid::client::amqp0_10::ConnectionImpl::open() ()
from /lib64/libqpidmessaging.so.2
#10 0x00007f89796c44e4 in qpid::messaging::Connection::open() () from
/lib64/libqpidmessaging.so.2

I have seen two scenarios in which the open connection call gets stuck:
* SSL forcehandshake never completes
* After epoll error like this:
2018-07-19 15:36:35 [System] error Caught exception in state: 1 with event:
4: No such file or directory
(/builddir/build/BUILD/qpid-cpp-1.38.0/src/qpid/sys/epoll/EpollPoller.cpp:357)
2018-07-19 15:36:35 [Security] warning Connect failed: Connection refused
Failed to connect (reconnect disabled)

It is a rare event; it sometimes takes weeks or months to occur.  But when it
does, we have to manually restart the client process in order to
re-establish the broker connection.

I am seeking guidance to address this problem.  
I have two ideas so far:
1.  Instead of calling qpid::client::StateManager::waitFor(std::set<int>
desired), call qpid::client::StateManager::waitFor(std::set<int> desired,
qpid::sys::Duration timeout).  If I understand it correctly, a timeout
would ensure the open connection call eventually returns.  But I am not sure
whether this would break other functionality.
2. Build a monitor in my code, and after some time if the qpid open
connection call doesn't return, forcibly kill the connection threads and
reconnect again... (this seems like a less desirable option)
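
Idea 1 can be illustrated outside of qpid-cpp with a small, self-contained
sketch. Everything below is hypothetical: StateWaiter, the State enum and
waitForOpen are stand-ins written with the C++ standard library, not the
qpid-cpp classes, and std::chrono::milliseconds is used in place of
qpid::sys::Duration. It only shows the difference between an unbounded
condition wait and a bounded one that reports a timeout to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>

// Hypothetical connection states; not the values used by qpid-cpp.
enum State { NEGOTIATING, OPEN, CLOSED, FAILED };

// Hypothetical stand-in for a state manager guarded by a mutex/condition pair.
class StateWaiter {
public:
    explicit StateWaiter(int initial) : state_(initial) {}

    void setState(int s) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            state_ = s;
        }
        changed_.notify_all();
    }

    int getState() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

    // Unbounded wait: never returns if no desired state is ever reached.
    void waitFor(const std::set<int>& desired) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [&] { return desired.count(state_) > 0; });
    }

    // Bounded wait: returns false if the timeout expires first.
    bool waitFor(const std::set<int>& desired, std::chrono::milliseconds timeout) {
        assert(timeout.count() >= 0);
        std::unique_lock<std::mutex> lock(mutex_);
        return changed_.wait_for(lock, timeout,
                                 [&] { return desired.count(state_) > 0; });
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable changed_;
    int state_;
};

// Waits for any terminal state; on timeout, marks the attempt as FAILED so
// the caller sees an error instead of blocking forever.
bool waitForOpen(StateWaiter& waiter, std::chrono::milliseconds timeout) {
    const std::set<int> established = {OPEN, CLOSED, FAILED};
    if (!waiter.waitFor(established, timeout)) {
        waiter.setState(FAILED);
    }
    return waiter.getState() == OPEN;
}
```

With this shape, a caller whose peer never answers gets a false return after
the timeout and can run its own cleanup and retry logic. Whether the real
client can safely be moved to a failed state at that point is exactly the
open question raised in idea 1.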

Btw, we don't use the qpid client's auto-reconnect because some custom cleanup
is required after a disconnect.  This is the setting used:
{"transport":"ssl","heartbeat":10,"reconnect":false,"tcp_nodelay":true}
The software workflow is like this:
1. On disconnect, destroy the connection object
2. Create a new connection
3. Call Connection::open
4. Create session, sender and receiver
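
The four steps above can be sketched as a small retry loop. This is only an
illustration under stated assumptions: FakeConnection is a hypothetical
stand-in (it is not qpid::messaging::Connection), setup() stands for
"create session, sender and receiver", and the attempt count and backoff
are made-up parameters. Note that the loop only helps when open() returns
or throws; if open() blocks forever, as in the stack trace above, step 3
never completes and the loop never gets a chance to retry.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a messaging connection, used for illustration.
struct FakeConnection {
    explicit FakeConnection(bool failOpen) : failOpen_(failOpen) {}

    void open() {
        if (failOpen_) throw std::runtime_error("Connect failed");
        opened = true;
    }

    // Stands for step 4: create session, sender and receiver.
    void setup() { ready = true; }

    bool opened = false;
    bool ready = false;

private:
    bool failOpen_;
};

// Runs the manual reconnect workflow; returns nullptr if every attempt fails.
template <typename Conn, typename Factory>
std::unique_ptr<Conn> reconnect(std::unique_ptr<Conn> old,
                                Factory makeConnection,
                                int maxAttempts,
                                std::chrono::milliseconds backoff) {
    assert(maxAttempts > 0);
    old.reset();  // Step 1: destroy the old connection object.
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        std::unique_ptr<Conn> fresh = makeConnection();  // Step 2
        try {
            fresh->open();   // Step 3: the call that can hang in practice.
            fresh->setup();  // Step 4
            return fresh;
        } catch (const std::exception& e) {
            std::cerr << "attempt " << attempt << " failed: " << e.what() << "\n";
        }
        std::this_thread::sleep_for(backoff);
    }
    return nullptr;
}
```

Destroying the failed object on every attempt keeps the custom cleanup in
one place, which matches the reason given for not using the built-in
auto-reconnect.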

Regards,

Chen Wan



--
Sent from: http://qpid.2158936.n2.nabble.com/Apache-Qpid-users-f2158936.html

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: [qpid-cpp client] Open connection to broker waits forever - patch attached

Posted by cwan <ch...@cloudistics.com>.
Hi Gordon,

You're right. My apologies for messing up the patch.
I have created https://issues.apache.org/jira/browse/QPID-8221, and attached
the correct patch to that issue.

Thanks,

Chen





Re: [qpid-cpp client] Open connection to broker waits forever - patch attached

Posted by Gordon Sim <gs...@redhat.com>.
On 24/07/18 15:27, cwan wrote:
> qpid_client_connect_timeout.patch
> <http://qpid.2158936.n2.nabble.com/file/t396381/qpid_client_connect_timeout.patch>
> 
> Attached is a patch to add connect timeout to qpid-cpp client's open
> connection call.
> A new option called "connect-timeout" can be used to specify how long to
> wait (in seconds) for the connection::open call.  When the "connect-timeout"
> is set, qpid-cpp client calls waitFor with a timeout.

Your patch seems to be reversed. The best thing is to open a JIRA for it 
and attach the patch to that. It looks ok to me though.



Re: [qpid-cpp client] Open connection to broker waits forever - patch attached

Posted by cwan <ch...@cloudistics.com>.
qpid_client_connect_timeout.patch
<http://qpid.2158936.n2.nabble.com/file/t396381/qpid_client_connect_timeout.patch>  

Attached is a patch to add connect timeout to qpid-cpp client's open
connection call.
A new option called "connect-timeout" can be used to specify how long to
wait (in seconds) for the connection::open call.  When the "connect-timeout"
is set, qpid-cpp client calls waitFor with a timeout.

I have tested it in my environment, and it seems to time out properly when
the open call gets stuck.

The main part of the patch is this:
diff --git a/src/qpid/client/ConnectionHandler.cpp b/src/qpid/client/ConnectionHandler.cpp
index 4f044c2f3..77d43f191 100644
--- a/src/qpid/client/ConnectionHandler.cpp
+++ b/src/qpid/client/ConnectionHandler.cpp
@@ -148,16 +148,7 @@ void ConnectionHandler::outgoing(AMQFrame& frame)
 
 void ConnectionHandler::waitForOpen()
 {
-    if (ConnectionSettings::connectTimeout) {
-        if (!waitFor(ESTABLISHED, qpid::sys::Duration(ConnectionSettings::connectTimeout * qpid::sys::TIME_SEC))) {
-            errorText = "Connection open timed out";
-            QPID_LOG(warning, errorText);
-            setState(FAILED);
-        }
-    } else {
-        waitFor(ESTABLISHED);//ESTABLISHED = OPEN, CLOSED or FAILED
-    }
-



