You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@qpid.apache.org by Toralf Lund <to...@pgs.com> on 2019/12/18 10:42:15 UTC
Re: [External] Re: "Session detached by peer"; extra connections on
broker, lock-up on send
On 18/12/2019 10:40, Gordon Sim wrote:
> On 18/12/2019 8:45 am, Toralf Lund wrote:
>> On one of our installations of one of our AMQP 0-10 applications
>> using the QPid messaging API (in C++), we keep getting exceptions
>> with the message
>>
>> session-busy: Session detached by peer
>>
>> - usually while waiting for incoming messages, but in some cases also
>> when sending them. After resolving some issues with "dangling"
>> session pointers (with help from the list), I thought this wasn't
>> really a problem, but I just found that the broker has several extra
>> connections for the application; qpid-stat reports the following:
>>
>> $ qpid-stat -c
>> Connections
>> connection cproc cpid
>> mech auth connected idle msgIn msgOut
>> ===========================================================================================================================================
>>
>> qpid.192.168.0.215:5672-192.168.0.61:58532 our-program 134233
>> ANONYMOUS anonymous 4d 19h 42m 25s 4d 18h 37m 44s 47.3k 531k
>> qpid.192.168.0.215:5672-192.168.0.61:58564 our-program 134233
>> ANONYMOUS anonymous 4d 5h 27m 9s 4d 4h 32m 4s 42.3k 488k
>> qpid.192.168.0.215:5672-192.168.0.61:58566 our-program 134233
>> ANONYMOUS anonymous 4d 4h 32m 12s 4d 3h 33m 34s 44.6k 513k
>> qpid.192.168.0.215:5672-192.168.0.61:58580 our-program 134233
>> ANONYMOUS anonymous 1d 17h 37m 49s 1d 16h 41m 54s 40.6k 515k
>> qpid.192.168.0.215:5672-192.168.0.61:58582 our-program 134233
>> ANONYMOUS anonymous 1d 16h 42m 1s 1d 15h 46m 44s 41.3k 497k
>> qpid.192.168.0.215:5672-192.168.0.61:58584 our-program 134233
>> ANONYMOUS anonymous 1d 15h 46m 46s 1d 14h 43m 24s 46.8k 535k
>> qpid.192.168.0.215:5672-192.168.0.61:58618 our-program 134233
>> ANONYMOUS anonymous 22h 55m 50s 21h 48m 4s 44.5k 496k
>> qpid.192.168.0.215:5672-192.168.0.61:58846 our-program 134233
>> ANONYMOUS anonymous 18h 21m 22s 0s 0 0
>>
>> (Names and addresses have been edited.)
>>
>> The implied times seem to match instances of the session-busy
>> exception, but a new connection doesn't appear every time. The
>> exception has occurred 27 times during the last 4 days, and as you
>> can see, there are a not that many connections.
>
> It is possible some connections are actually closed. The session-busy
> can be a race, e.g. if the client times out a connection before the
> server (but the original connection and session is then eventually
> closed, just too late for the reattach attempt to succeed with the
> same session name).
OK.
>
> Your heartbeat interval is set quite low. I don't know if that was
> deliberately so, but a slightly less sensitive setting might avoid
> timing out in some cases.
I must admit I don't remember why the interval was set the way it was.
We're using 5 or 10 in some other cases. I considered trying a slightly
larger value, but got a bit worried that it would just mask the problem
rather than really solving it.
>
> That said you still clearly also still have multiple connections open,
> which I'm assuming is not expected? I.e. you would expect only one
> connection from that process?
Yes, exactly.
>
>> I don't see any signs of connection errors or explicit reopens but
>> automatic reconnect with a limit has been enabled; the following
>> connect options are used
>>
>> connection.setOption("reconnect", true);
>> connection.setOption("reconnect_limit", 0);
>>
>> (I believe this means trying reconnect once, as "reconnect_limit"
>> specifies retries after an initial attempt.)
>
> I believe a reconnect_limit of 0 will reconnect without any limit. If
> you want to only reconnect once, you would specify a value of 1.
> However that would I believe be reset every time it does successfully
> reconnect.
I've assumed the following constructs in
ConnectionImpl::ConnectionImpl() mean that a value of 1 actually gives 2
attempts:
for (double i = minReconnectInterval; !tryConnect(); i =
std::min(i*2, maxReconnectInterval)) {
[ ... ]
if (limit >= 0 && retries++ >= limit) {
throw qpid::messaging::TransportFailure("Failed to connect
within reconnect limit");
}
[ ... ]
}
QPID_LOG(debug, "Connection successful, urls=" << asString(urls));
retries = 0;
(Notice how there is one tryConnect() before anything is checked, and
that limit check has >= and *post" increment of "retries".) But perhaps
I read that the wrong way?
>
>>
>> An additional, more serious issue is that the system has also locked
>> up a couple of times following an exception during Sender::send().
>> Slightly simplified stack trace from one of the cases:
>>
>> Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
>> #0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>> #1 0x00007fb93f9c7677 in
>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>> from /usr/lib64/libqpidcommon.so.2
>> #2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>> from /usr/lib64/libqpidcommon.so.2
>> #3 0x00007fb93f9b8e4a in ?? ()
>> from /usr/lib64/libqpidcommon.so.2
>> #4 0x00007fb941663dd5 in start_thread () from
>> /usr/lib64/libpthread.so.0
>> #5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>
>> Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
>> #0 0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>> from /usr/lib64/libpthread.so.0
>> #1 0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
>> from /usr/lib64/libqpidcommon.so.2
>> #2 0x00007fb93f9b8e4a in ?? ()
>> from /usr/lib64/libqpidcommon.so.2
>> #3 0x00007fb941663dd5 in start_thread () from
>> /usr/lib64/libpthread.so.0
>> #4 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>
>> Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
>> [ ... ]
>
> Thread 3's trace didn't make it through. Do you still have that?
>> Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
>> #0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>> #1 0x00007fb93f9c7677 in
>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>> from /usr/lib64/libqpidcommon.so.2
>> #2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>> from /usr/lib64/libqpidcommon.so.2
>> #3 0x00007fb93f9b8e4a in ?? ()
>> from /usr/lib64/libqpidcommon.so.2
>> #4 0x00007fb941663dd5 in start_thread () from
>> /usr/lib64/libpthread.so.0
>> #5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>
>> Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
>> #0 0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
>> #1 0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
>> #2 0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
>> #3 0x00007fb94039add7 in _Unwind_Resume () from
>> /usr/lib64/libgcc_s.so.1
>> #4 0x00007fb93fd8b927 in
>> qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody const&,
>> qpid::framing::MethodContent const*) ()
>> from /usr/lib64/libqpidclient.so.2
>> #5 0x00007fb93fd8b97b in
>> qpid::client::SessionImpl::send(qpid::framing::AMQBody const&) ()
>> from /usr/lib64/libqpidclient.so.2
>> #6 0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
>> from /usr/lib64/libqpidclient.so.2
>> #7 0x00007fb9422403f8 in ?? ()
>> from /usr/lib64/libqpidmessaging.so.2
>> #8 0x00007fb942241e6a in ?? ()
>> from /usr/lib64/libqpidmessaging.so.2
>> #9 0x00000000004f2dc9 in ?? ()
>
> It looks like the send is stuck trying to unwind after an exception
> (possibly on the checkOpen()), due to not being able to unlock the
> sendLock semaphore. Is something in thread 3 holding any locks?
It may be holding locks, but not anything related to QPid. Specifically,
it's probably waiting for a condition via pthread_cond_wait(), which
implies that it's also holding a mutex associated with the condition
variable.
>
>> The application became responsive again after running a debug session
>> to get the above info. There is a chance that it did another
>> Sender::send() on the same sender (without trying to reset the
>> session or whatever) following the exception.
>>
>> Any idea what's going on here? Is there anything I can do to avoid
>> the session-busy errors? Can I prevent extra connections from
>> appearing if it should occur after all? Also, is there a way to
>> prevent the apparent lock-ups after exception on send()? I think I've
>> assumed that it's "safe" to try other operations as they will just
>> fail directly, unless the session is actually able to recover, but
>> perhaps that's a bad idea?
>>
>> Note that besides what's been mentioned already, the following
>> connection options are used:
>>
>> connection.setOption("tcp_nodelay", true);
>> connection.setOption("heartbeat", 2);
>>
>> Thanks,
>>
>> - Toralf
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org
Re: [External] Re: "Session detached by peer"; extra connections on
broker, lock-up on send
Posted by Toralf Lund <to...@pgs.com>.
On 18/12/2019 12:32, Gordon Sim wrote:
> On 18/12/2019 10:42 am, Toralf Lund wrote:
>> On 18/12/2019 10:40, Gordon Sim wrote:
>>> On 18/12/2019 8:45 am, Toralf Lund wrote:
>>>> I don't see any signs of connection errors or explicit reopens but
>>>> automatic reconnect with a limit has been enabled; the following
>>>> connect options are used
>>>>
>>>> connection.setOption("reconnect", true);
>>>> connection.setOption("reconnect_limit", 0);
>>>>
>>>> (I believe this means trying reconnect once, as "reconnect_limit"
>>>> specifies retries after an initial attempt.)
>>>
>>> I believe a reconnect_limit of 0 will reconnect without any limit.
>>> If you want to only reconnect once, you would specify a value of 1.
>>> However that would I believe be reset every time it does
>>> successfully reconnect.
>>
>> I've assumed the following constructs in
>> ConnectionImpl::ConnectionImpl() mean that a value of 1 actually
>> gives 2 attempts:
>>
>> for (double i = minReconnectInterval; !tryConnect(); i =
>> std::min(i*2, maxReconnectInterval)) {
>>
>> [ ... ]
>>
>> if (limit >= 0 && retries++ >= limit) {
>> throw qpid::messaging::TransportFailure("Failed to
>> connect within reconnect limit");
>> }
>>
>> [ ... ]
>>
>> }
>> QPID_LOG(debug, "Connection successful, urls=" << asString(urls));
>> retries = 0;
>>
>> (Notice how there is one tryConnect() before anything is checked, and
>> that limit check has >= and *post" increment of "retries".) But
>> perhaps I read that the wrong way?
>
> No, you are right. What level of logging do you have on the client
> side? Info level would have 'trying to connect' and 'connected to'
> messages.
I currently only have "error+" logging. It might be possible to restart
with a different level.
>
> The heartbeat firing logs only at debug level unfortunately from what
> I can see.
>
> Is there anything in your code that could be opening connections?
It actually has
if(!connection.isOpen()) {
Util::debug(1, "Open AMQP connection...");
try {
connection.open();
[ ... ]
in a function that's executed periodically, but I see only one instance
of the "Open AMQP connection..." message in my logs. (Util::debug() is
an in-house logging function, which is in this case expected to write
the given strings to a certain log file.)
I can't think of anything else that might open connections.
- Toralf
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org
Re: [External] Re: "Session detached by peer"; extra connections on
broker, lock-up on send
Posted by Gordon Sim <gs...@redhat.com>.
On 18/12/2019 10:42 am, Toralf Lund wrote:
> On 18/12/2019 10:40, Gordon Sim wrote:
>> On 18/12/2019 8:45 am, Toralf Lund wrote:
>>> I don't see any signs of connection errors or explicit reopens but
>>> automatic reconnect with a limit has been enabled; the following
>>> connect options are used
>>>
>>> connection.setOption("reconnect", true);
>>> connection.setOption("reconnect_limit", 0);
>>>
>>> (I believe this means trying reconnect once, as "reconnect_limit"
>>> specifies retries after an initial attempt.)
>>
>> I believe a reconnect_limit of 0 will reconnect without any limit. If
>> you want to only reconnect once, you would specify a value of 1.
>> However that would I believe be reset every time it does successfully
>> reconnect.
>
> I've assumed the following constructs in
> ConnectionImpl::ConnectionImpl() mean that a value of 1 actually gives 2
> attempts:
>
> for (double i = minReconnectInterval; !tryConnect(); i =
> std::min(i*2, maxReconnectInterval)) {
>
> [ ... ]
>
> if (limit >= 0 && retries++ >= limit) {
> throw qpid::messaging::TransportFailure("Failed to connect
> within reconnect limit");
> }
>
> [ ... ]
>
> }
> QPID_LOG(debug, "Connection successful, urls=" << asString(urls));
> retries = 0;
>
> (Notice how there is one tryConnect() before anything is checked, and
> that limit check has >= and *post" increment of "retries".) But perhaps
> I read that the wrong way?
No, you are right. What level of logging do you have on the client side?
Info level would have 'trying to connect' and 'connected to' messages.
The heartbeat firing logs only at debug level unfortunately from what I
can see.
Is there anything in your code that could be opening connections?
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org
Re: [External] Re: "Session detached by peer"; extra connections on
broker, lock-up on send
Posted by Gordon Sim <gs...@redhat.com>.
On 18/12/2019 12:33 pm, Toralf Lund wrote:
> I get the exact behaviour I'm seeing now only in a version that was
> deployed about a week ago, and since then, between 60 and 70
> session-busy exceptions have been registered. Most of these came from
> Session::nextReceiver() or Receiver::fetch(), but exactly 2 were raised
> by Sender::send(). In both these cases, the application locked up more
> or less directly - the session error was at least the last item logged
> when the process was in the locked state. It has not got stuck in any
> other situations.
>
> Due to the way exception handling is implemented, there is a chance that
> there is another attempt at sending data after an exception from
> Sender::send(), without resetting the session or anything. On the other
> hand, a receive/nextReceiver() exception is in practice always followed
> directly by
>
> if(session.isValid() && session.hasError()) {
> Util::debug(1, "Reset AMQP session with error...");
> session.close();
> session=qpid::messaging::Session();
> }
>
> The same code is also executed eventually in the send case, but there
> may be other intervening Sender::send() calls, and also operations like
> Session::getSender() or even (at least in theory) Session::createSender().
Do multiple threads use the session concurrently (wondering if it is
possible that the session is deleted while a send call is in progress)?
I think for the 'locking up' some pstack output would be useful. For the
extra connections, some more logging if possible.
I can't see anything that would obviously cause a lock up in the send,
and if it is a heartbeat timeout that causes the reconnect I would
expect the socket to be closed.
Sorry we could not be more help at this point.
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org
Re: [External] Re: "Session detached by peer"; extra connections on
broker, lock-up on send
Posted by Toralf Lund <to...@pgs.com>.
On 18/12/2019 12:35, Gordon Sim wrote:
> On 18/12/2019 10:42 am, Toralf Lund wrote:
>> On 18/12/2019 10:40, Gordon Sim wrote:
>>> On 18/12/2019 8:45 am, Toralf Lund wrote:
>>>> An additional, more serious issue is that the system has also
>>>> locked up a couple of times following an exception during
>>>> Sender::send(). Slightly simplified stack trace from one of the cases:
>>>>
>>>> Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
>>>> #0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>>>> #1 0x00007fb93f9c7677 in
>>>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>>> from /usr/lib64/libqpidcommon.so.2
>>>> #2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>>> from /usr/lib64/libqpidcommon.so.2
>>>> #3 0x00007fb93f9b8e4a in ?? ()
>>>> from /usr/lib64/libqpidcommon.so.2
>>>> #4 0x00007fb941663dd5 in start_thread () from
>>>> /usr/lib64/libpthread.so.0
>>>> #5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>>
>>>> Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
>>>> #0 0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>> from /usr/lib64/libpthread.so.0
>>>> #1 0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
>>>> from /usr/lib64/libqpidcommon.so.2
>>>> #2 0x00007fb93f9b8e4a in ?? ()
>>>> from /usr/lib64/libqpidcommon.so.2
>>>> #3 0x00007fb941663dd5 in start_thread () from
>>>> /usr/lib64/libpthread.so.0
>>>> #4 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>>
>>>> Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
>>>> [ ... ]
>>>
>>> Thread 3's trace didn't make it through. Do you still have that?
>>>> Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
>>>> #0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>>>> #1 0x00007fb93f9c7677 in
>>>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>>> from /usr/lib64/libqpidcommon.so.2
>>>> #2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>>> from /usr/lib64/libqpidcommon.so.2
>>>> #3 0x00007fb93f9b8e4a in ?? ()
>>>> from /usr/lib64/libqpidcommon.so.2
>>>> #4 0x00007fb941663dd5 in start_thread () from
>>>> /usr/lib64/libpthread.so.0
>>>> #5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>>
>>>> Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
>>>> #0 0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
>>>> #1 0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
>>>> #2 0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
>>>> #3 0x00007fb94039add7 in _Unwind_Resume () from
>>>> /usr/lib64/libgcc_s.so.1
>>>> #4 0x00007fb93fd8b927 in
>>>> qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody
>>>> const&, qpid::framing::MethodContent const*) ()
>>>> from /usr/lib64/libqpidclient.so.2
>>>> #5 0x00007fb93fd8b97b in
>>>> qpid::client::SessionImpl::send(qpid::framing::AMQBody const&) ()
>>>> from /usr/lib64/libqpidclient.so.2
>>>> #6 0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
>>>> from /usr/lib64/libqpidclient.so.2
>>>> #7 0x00007fb9422403f8 in ?? ()
>>>> from /usr/lib64/libqpidmessaging.so.2
>>>> #8 0x00007fb942241e6a in ?? ()
>>>> from /usr/lib64/libqpidmessaging.so.2
>>>> #9 0x00000000004f2dc9 in ?? ()
>>>
>>> It looks like the send is stuck trying to unwind after an exception
>>> (possibly on the checkOpen()), due to not being able to unlock the
>>> sendLock semaphore. Is something in thread 3 holding any locks?
>>
>> It may be holding locks, but not anything related to QPid.
>> Specifically, it's probably waiting for a condition via
>> pthread_cond_wait(), which implies that it's also holding a mutex
>> associated with the condition variable.
>
> Ok, thread 1 may of course not be stuck at all, and it was just
> captured at that point. Getting pstack output separated by a few
> seconds would shed more light.
>
> Does the frequency of the 'locking up' match the frequency of the
> session-busy exceptions?
I get the exact behaviour I'm seeing now only in a version that was
deployed about a week ago, and since then, between 60 and 70
session-busy exceptions have been registered. Most of these came from
Session::nextReceiver() or Receiver::fetch(), but exactly 2 were raised
by Sender::send(). In both these cases, the application locked up more
or less directly - the session error was at least the last item logged
when the process was in the locked state. It has not got stuck in any
other situations.
Due to the way exception handling is implemented, there is a chance that
there is another attempt at sending data after an exception from
Sender::send(), without resetting the session or anything. On the other
hand, a receive/nextReceiver() exception is in practice always followed
directly by
if(session.isValid() && session.hasError()) {
Util::debug(1, "Reset AMQP session with error...");
session.close();
session=qpid::messaging::Session();
}
The same code is also executed eventually in the send case, but there
may be other intervening Sender::send() calls, and also operations like
Session::getSender() or even (at least in theory) Session::createSender().
- Toralf
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org
Re: [External] Re: "Session detached by peer"; extra connections on
broker, lock-up on send
Posted by Gordon Sim <gs...@redhat.com>.
On 18/12/2019 10:42 am, Toralf Lund wrote:
> On 18/12/2019 10:40, Gordon Sim wrote:
>> On 18/12/2019 8:45 am, Toralf Lund wrote:
>>> An additional, more serious issue is that the system has also locked
>>> up a couple of times following an exception during Sender::send().
>>> Slightly simplified stack trace from one of the cases:
>>>
>>> Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
>>> #0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>>> #1 0x00007fb93f9c7677 in
>>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>> from /usr/lib64/libqpidcommon.so.2
>>> #2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>> from /usr/lib64/libqpidcommon.so.2
>>> #3 0x00007fb93f9b8e4a in ?? ()
>>> from /usr/lib64/libqpidcommon.so.2
>>> #4 0x00007fb941663dd5 in start_thread () from
>>> /usr/lib64/libpthread.so.0
>>> #5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>
>>> Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
>>> #0 0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>> from /usr/lib64/libpthread.so.0
>>> #1 0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
>>> from /usr/lib64/libqpidcommon.so.2
>>> #2 0x00007fb93f9b8e4a in ?? ()
>>> from /usr/lib64/libqpidcommon.so.2
>>> #3 0x00007fb941663dd5 in start_thread () from
>>> /usr/lib64/libpthread.so.0
>>> #4 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>
>>> Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
>>> [ ... ]
>>
>> Thread 3's trace didn't make it through. Do you still have that?
>>> Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
>>> #0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>>> #1 0x00007fb93f9c7677 in
>>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>> from /usr/lib64/libqpidcommon.so.2
>>> #2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>> from /usr/lib64/libqpidcommon.so.2
>>> #3 0x00007fb93f9b8e4a in ?? ()
>>> from /usr/lib64/libqpidcommon.so.2
>>> #4 0x00007fb941663dd5 in start_thread () from
>>> /usr/lib64/libpthread.so.0
>>> #5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>
>>> Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
>>> #0 0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
>>> #1 0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
>>> #2 0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
>>> #3 0x00007fb94039add7 in _Unwind_Resume () from
>>> /usr/lib64/libgcc_s.so.1
>>> #4 0x00007fb93fd8b927 in
>>> qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody const&,
>>> qpid::framing::MethodContent const*) ()
>>> from /usr/lib64/libqpidclient.so.2
>>> #5 0x00007fb93fd8b97b in
>>> qpid::client::SessionImpl::send(qpid::framing::AMQBody const&) ()
>>> from /usr/lib64/libqpidclient.so.2
>>> #6 0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
>>> from /usr/lib64/libqpidclient.so.2
>>> #7 0x00007fb9422403f8 in ?? ()
>>> from /usr/lib64/libqpidmessaging.so.2
>>> #8 0x00007fb942241e6a in ?? ()
>>> from /usr/lib64/libqpidmessaging.so.2
>>> #9 0x00000000004f2dc9 in ?? ()
>>
>> It looks like the send is stuck trying to unwind after an exception
>> (possibly on the checkOpen()), due to not being able to unlock the
>> sendLock semaphore. Is something in thread 3 holding any locks?
>
> It may be holding locks, but not anything related to QPid. Specifically,
> it's probably waiting for a condition via pthread_cond_wait(), which
> implies that it's also holding a mutex associated with the condition
> variable.
Ok, thread 1 may of course not be stuck at all, and it was just captured
at that point. Getting pstack output separated by a few seconds would
shed more light.
Does the frequency of the 'locking up' match the frequency of the
session-busy exceptions?
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org