You are viewing a plain text version of this content. The canonical link for it is here.

Posted to users@qpid.apache.org by Toralf Lund <to...@pgs.com> on 2019/12/18 10:42:15 UTC

Re: [External] Re: "Session detached by peer"; extra connections on broker, lock-up on send

On 18/12/2019 10:40, Gordon Sim wrote:
> On 18/12/2019 8:45 am, Toralf Lund wrote:
>> On one of our installations of one of our AMQP 0-10 applications 
>> using the QPid messaging API (in C++), we keep getting exceptions 
>> with the message
>>
>> session-busy: Session detached by peer
>>
>> - usually while waiting for incoming messages, but in some cases also 
>> when sending them. After resolving some issues with "dangling" 
>> session pointers (with help from the list), I thought this wasn't 
>> really a problem, but I just found that the broker has several extra 
>> connections for the application; qpid-stat reports the following:
>>
>> $ qpid-stat -c
>> Connections
>>    connection                                  cproc cpid    
>> mech       auth       connected        idle msgIn  msgOut
>> =========================================================================================================================================== 
>>
>>    qpid.192.168.0.215:5672-192.168.0.61:58532  our-program 134233  
>> ANONYMOUS  anonymous  4d 19h 42m 25s   4d 18h 37m 44s 47.3k   531k
>>    qpid.192.168.0.215:5672-192.168.0.61:58564  our-program 134233  
>> ANONYMOUS  anonymous  4d 5h 27m 9s     4d 4h 32m 4s 42.3k   488k
>>    qpid.192.168.0.215:5672-192.168.0.61:58566  our-program 134233  
>> ANONYMOUS  anonymous  4d 4h 32m 12s    4d 3h 33m 34s 44.6k   513k
>>    qpid.192.168.0.215:5672-192.168.0.61:58580  our-program 134233  
>> ANONYMOUS  anonymous  1d 17h 37m 49s   1d 16h 41m 54s 40.6k   515k
>>    qpid.192.168.0.215:5672-192.168.0.61:58582  our-program 134233  
>> ANONYMOUS  anonymous  1d 16h 42m 1s    1d 15h 46m 44s 41.3k   497k
>>    qpid.192.168.0.215:5672-192.168.0.61:58584  our-program 134233  
>> ANONYMOUS  anonymous  1d 15h 46m 46s   1d 14h 43m 24s 46.8k   535k
>>    qpid.192.168.0.215:5672-192.168.0.61:58618  our-program 134233  
>> ANONYMOUS  anonymous  22h 55m 50s      21h 48m 4s 44.5k   496k
>>    qpid.192.168.0.215:5672-192.168.0.61:58846  our-program 134233  
>> ANONYMOUS  anonymous  18h 21m 22s 0s                 0      0
>>
>> (Names and addresses have been edited.)
>>
>> The implied times seem to match instances of the session-busy 
>> exception, but a new connection doesn't appear every time. The 
>> exception has occurred 27 times during the last 4 days, and as you 
>> can see, there are a not that many connections.
>
> It is possible some connections are actually closed. The session-busy 
> can be a race, e.g. if the client times out a connection before the 
> server (but the original connection and session is then eventually 
> closed, just too late for the reattach attempt to succeed with the 
> same session name).
OK.
>
> Your heartbeat interval is set quite low. I don't know if that was 
> deliberately so, but a slightly less sensitive setting might avoid 
> timing out in some cases.
I must admit I don't remember why the interval was set the way it was. 
We're using 5 or 10 in some other cases. I considered trying a slightly 
larger value, but got a bit worried that it would just mask the problem 
rather than really solving it.
>
> That said you still clearly also still have multiple connections open, 
> which I'm assuming is not expected? I.e. you would expect only one 
> connection from that process?
Yes, exactly.
>
>> I don't see any signs of connection errors or explicit reopens but 
>> automatic reconnect with a limit has been enabled; the following 
>> connect options are used
>>
>>    connection.setOption("reconnect", true);
>>    connection.setOption("reconnect_limit", 0);
>>
>> (I believe this means trying reconnect once, as "reconnect_limit" 
>> specifies retries after an initial attempt.)
>
> I believe a reconnect_limit of 0 will reconnect without any limit. If 
> you want to only reconnect once, you would specify a value of 1. 
> However that would I believe be reset every time it does successfully 
> reconnect.

I've assumed the following constructs in 
ConnectionImpl::ConnectionImpl() mean that a value of 1 actually gives 2 
attempts:

     for (double i = minReconnectInterval; !tryConnect(); i = 
std::min(i*2, maxReconnectInterval)) {

[ ... ]

         if (limit >= 0 && retries++ >= limit) {
             throw qpid::messaging::TransportFailure("Failed to connect 
within reconnect limit");
         }

[ ... ]

     }
     QPID_LOG(debug, "Connection successful, urls=" << asString(urls));
     retries = 0;

(Notice how there is one tryConnect() before anything is checked, and 
that limit check has >= and *post" increment of "retries".) But perhaps 
I read that the wrong way?

>
>>
>> An additional, more serious issue is that the system has also locked 
>> up a couple of times following an exception during Sender::send(). 
>> Slightly simplified stack trace from one of the cases:
>>
>> Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
>> #0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>> #1  0x00007fb93f9c7677 in 
>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>     from /usr/lib64/libqpidcommon.so.2
>> #2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>     from /usr/lib64/libqpidcommon.so.2
>> #3  0x00007fb93f9b8e4a in ?? ()
>>     from /usr/lib64/libqpidcommon.so.2
>> #4  0x00007fb941663dd5 in start_thread () from 
>> /usr/lib64/libpthread.so.0
>> #5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>
>> Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
>> #0  0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>     from /usr/lib64/libpthread.so.0
>> #1  0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
>>     from /usr/lib64/libqpidcommon.so.2
>> #2  0x00007fb93f9b8e4a in ?? ()
>>     from /usr/lib64/libqpidcommon.so.2
>> #3  0x00007fb941663dd5 in start_thread () from 
>> /usr/lib64/libpthread.so.0
>> #4  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>
>> Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
>> [ ... ]
>
> Thread 3's trace didn't make it through. Do you still have that?
>> Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
>> #0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>> #1  0x00007fb93f9c7677 in 
>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>     from /usr/lib64/libqpidcommon.so.2
>> #2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>     from /usr/lib64/libqpidcommon.so.2
>> #3  0x00007fb93f9b8e4a in ?? ()
>>     from /usr/lib64/libqpidcommon.so.2
>> #4  0x00007fb941663dd5 in start_thread () from 
>> /usr/lib64/libpthread.so.0
>> #5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>
>> Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
>> #0  0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
>> #1  0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
>> #2  0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
>> #3  0x00007fb94039add7 in _Unwind_Resume () from 
>> /usr/lib64/libgcc_s.so.1
>> #4  0x00007fb93fd8b927 in 
>> qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody const&, 
>> qpid::framing::MethodContent const*) ()
>>     from /usr/lib64/libqpidclient.so.2
>> #5  0x00007fb93fd8b97b in 
>> qpid::client::SessionImpl::send(qpid::framing::AMQBody const&) () 
>> from /usr/lib64/libqpidclient.so.2
>> #6  0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
>>     from /usr/lib64/libqpidclient.so.2
>> #7  0x00007fb9422403f8 in ?? ()
>>     from /usr/lib64/libqpidmessaging.so.2
>> #8  0x00007fb942241e6a in ?? ()
>>     from /usr/lib64/libqpidmessaging.so.2
>> #9  0x00000000004f2dc9 in ?? ()
>
> It looks like the send is stuck trying to unwind after an exception 
> (possibly on the checkOpen()), due to not being able to unlock the 
> sendLock semaphore. Is something in thread 3 holding any locks?

It may be holding locks, but not anything related to QPid. Specifically, 
it's probably waiting for a condition via pthread_cond_wait(), which 
implies that it's also holding a mutex associated with the condition 
variable.

>
>> The application became responsive again after running a debug session 
>> to get the above info. There is a chance that it did another 
>> Sender::send() on the same sender (without trying to reset the 
>> session or whatever) following the exception.
>>
>> Any idea what's going on here? Is there anything I can do to avoid 
>> the session-busy errors? Can I prevent extra connections from 
>> appearing if it should occur after all? Also, is there a way to 
>> prevent the apparent lock-ups after exception on send()? I think I've 
>> assumed that it's "safe" to try other operations as they will just 
>> fail directly, unless the session is actually able to recover, but 
>> perhaps that's a bad idea?
>>
>> Note that besides what's been mentioned already, the following 
>> connection options are used:
>>
>> connection.setOption("tcp_nodelay", true);
>> connection.setOption("heartbeat", 2);
>>
>> Thanks,
>>
>> - Toralf
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: [External] Re: "Session detached by peer"; extra connections on broker, lock-up on send

Posted by Toralf Lund <to...@pgs.com>.

On 18/12/2019 12:32, Gordon Sim wrote:
> On 18/12/2019 10:42 am, Toralf Lund wrote:
>> On 18/12/2019 10:40, Gordon Sim wrote:
>>> On 18/12/2019 8:45 am, Toralf Lund wrote:
>>>> I don't see any signs of connection errors or explicit reopens but 
>>>> automatic reconnect with a limit has been enabled; the following 
>>>> connect options are used
>>>>
>>>>    connection.setOption("reconnect", true);
>>>>    connection.setOption("reconnect_limit", 0);
>>>>
>>>> (I believe this means trying reconnect once, as "reconnect_limit" 
>>>> specifies retries after an initial attempt.)
>>>
>>> I believe a reconnect_limit of 0 will reconnect without any limit. 
>>> If you want to only reconnect once, you would specify a value of 1. 
>>> However that would I believe be reset every time it does 
>>> successfully reconnect.
>>
>> I've assumed the following constructs in 
>> ConnectionImpl::ConnectionImpl() mean that a value of 1 actually 
>> gives 2 attempts:
>>
>>      for (double i = minReconnectInterval; !tryConnect(); i = 
>> std::min(i*2, maxReconnectInterval)) {
>>
>> [ ... ]
>>
>>          if (limit >= 0 && retries++ >= limit) {
>>              throw qpid::messaging::TransportFailure("Failed to 
>> connect within reconnect limit");
>>          }
>>
>> [ ... ]
>>
>>      }
>>      QPID_LOG(debug, "Connection successful, urls=" << asString(urls));
>>      retries = 0;
>>
>> (Notice how there is one tryConnect() before anything is checked, and 
>> that limit check has >= and *post" increment of "retries".) But 
>> perhaps I read that the wrong way?
>
> No, you are right. What level of logging do you have on the client 
> side? Info level would have 'trying to connect' and 'connected to' 
> messages.

I currently only have "error+" logging. It might be possible to restart 
with a different level.

>
> The heartbeat firing logs only at debug level unfortunately from what 
> I can see.
>
> Is there anything in your code that could be opening connections?

It actually has

   if(!connection.isOpen()) {
     Util::debug(1, "Open AMQP connection...");

     try {
       connection.open();

[ ... ]

in a function that's executed periodically, but I see only one instance 
of the "Open AMQP connection..." message in my logs. (Util::debug() is 
an in-house logging function, which is in this case expected to write 
the given strings to a certain log file.)

I can't think of anything else that might open connections.

- Toralf

>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: [External] Re: "Session detached by peer"; extra connections on broker, lock-up on send

Posted by Gordon Sim <gs...@redhat.com>.

On 18/12/2019 10:42 am, Toralf Lund wrote:
> On 18/12/2019 10:40, Gordon Sim wrote:
>> On 18/12/2019 8:45 am, Toralf Lund wrote:
>>> I don't see any signs of connection errors or explicit reopens but 
>>> automatic reconnect with a limit has been enabled; the following 
>>> connect options are used
>>>
>>>    connection.setOption("reconnect", true);
>>>    connection.setOption("reconnect_limit", 0);
>>>
>>> (I believe this means trying reconnect once, as "reconnect_limit" 
>>> specifies retries after an initial attempt.)
>>
>> I believe a reconnect_limit of 0 will reconnect without any limit. If 
>> you want to only reconnect once, you would specify a value of 1. 
>> However that would I believe be reset every time it does successfully 
>> reconnect.
> 
> I've assumed the following constructs in 
> ConnectionImpl::ConnectionImpl() mean that a value of 1 actually gives 2 
> attempts:
> 
>      for (double i = minReconnectInterval; !tryConnect(); i = 
> std::min(i*2, maxReconnectInterval)) {
> 
> [ ... ]
> 
>          if (limit >= 0 && retries++ >= limit) {
>              throw qpid::messaging::TransportFailure("Failed to connect 
> within reconnect limit");
>          }
> 
> [ ... ]
> 
>      }
>      QPID_LOG(debug, "Connection successful, urls=" << asString(urls));
>      retries = 0;
> 
> (Notice how there is one tryConnect() before anything is checked, and 
> that limit check has >= and *post" increment of "retries".) But perhaps 
> I read that the wrong way?

No, you are right. What level of logging do you have on the client side? 
Info level would have 'trying to connect' and 'connected to' messages.

The heartbeat firing logs only at debug level unfortunately from what I 
can see.

Is there anything in your code that could be opening connections?


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: [External] Re: "Session detached by peer"; extra connections on broker, lock-up on send

Posted by Gordon Sim <gs...@redhat.com>.

On 18/12/2019 12:33 pm, Toralf Lund wrote:
> I get the exact behaviour I'm seeing now only in a version that was 
> deployed about a week ago, and since then, between 60 and 70 
> session-busy exceptions have been registered. Most of these came from 
> Session::nextReceiver() or Receiver::fetch(), but exactly 2 were raised 
> by Sender::send(). In both these cases, the application locked up more 
> or less directly - the session error was at least the last item logged 
> when the process was in the locked state. It has not got stuck in any 
> other situations.
> 
> Due to the way exception handling is implemented, there is a chance that 
> there is another attempt at sending data after an exception from 
> Sender::send(), without resetting the session or anything. On the other 
> hand, a receive/nextReceiver() exception is in practice always followed 
> directly by
> 
>    if(session.isValid() && session.hasError()) {
>      Util::debug(1, "Reset AMQP session with error...");
>      session.close();
>      session=qpid::messaging::Session();
>    }
> 
> The same code is also executed eventually in the send case, but there 
> may be other intervening Sender::send() calls, and also operations like 
> Session::getSender() or even (at least in theory) Session::createSender().

Do multiple threads use the session concurrently (wondering if it is 
possible that the session is deleted while a send call is in progress)?

I think for the 'locking up' some pstack output would be useful. For the 
extra connections, some more logging if possible.

I can't see anything that would obviously cause a lock up in the send, 
and if it is a heartbeat timeout that causes the reconnect I would 
expect the socket to be closed.

Sorry we could not be more help at this point.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: [External] Re: "Session detached by peer"; extra connections on broker, lock-up on send

Posted by Toralf Lund <to...@pgs.com>.

On 18/12/2019 12:35, Gordon Sim wrote:
> On 18/12/2019 10:42 am, Toralf Lund wrote:
>> On 18/12/2019 10:40, Gordon Sim wrote:
>>> On 18/12/2019 8:45 am, Toralf Lund wrote:
>>>> An additional, more serious issue is that the system has also 
>>>> locked up a couple of times following an exception during 
>>>> Sender::send(). Slightly simplified stack trace from one of the cases:
>>>>
>>>> Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
>>>> #0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>>>> #1  0x00007fb93f9c7677 in 
>>>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>>>     from /usr/lib64/libqpidcommon.so.2
>>>> #2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>>>     from /usr/lib64/libqpidcommon.so.2
>>>> #3  0x00007fb93f9b8e4a in ?? ()
>>>>     from /usr/lib64/libqpidcommon.so.2
>>>> #4  0x00007fb941663dd5 in start_thread () from 
>>>> /usr/lib64/libpthread.so.0
>>>> #5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>>
>>>> Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
>>>> #0  0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>>     from /usr/lib64/libpthread.so.0
>>>> #1  0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
>>>>     from /usr/lib64/libqpidcommon.so.2
>>>> #2  0x00007fb93f9b8e4a in ?? ()
>>>>     from /usr/lib64/libqpidcommon.so.2
>>>> #3  0x00007fb941663dd5 in start_thread () from 
>>>> /usr/lib64/libpthread.so.0
>>>> #4  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>>
>>>> Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
>>>> [ ... ]
>>>
>>> Thread 3's trace didn't make it through. Do you still have that?
>>>> Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
>>>> #0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>>>> #1  0x00007fb93f9c7677 in 
>>>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>>>     from /usr/lib64/libqpidcommon.so.2
>>>> #2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>>>     from /usr/lib64/libqpidcommon.so.2
>>>> #3  0x00007fb93f9b8e4a in ?? ()
>>>>     from /usr/lib64/libqpidcommon.so.2
>>>> #4  0x00007fb941663dd5 in start_thread () from 
>>>> /usr/lib64/libpthread.so.0
>>>> #5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>>
>>>> Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
>>>> #0  0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
>>>> #1  0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
>>>> #2  0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
>>>> #3  0x00007fb94039add7 in _Unwind_Resume () from 
>>>> /usr/lib64/libgcc_s.so.1
>>>> #4  0x00007fb93fd8b927 in 
>>>> qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody 
>>>> const&, qpid::framing::MethodContent const*) ()
>>>>     from /usr/lib64/libqpidclient.so.2
>>>> #5  0x00007fb93fd8b97b in 
>>>> qpid::client::SessionImpl::send(qpid::framing::AMQBody const&) () 
>>>> from /usr/lib64/libqpidclient.so.2
>>>> #6  0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
>>>>     from /usr/lib64/libqpidclient.so.2
>>>> #7  0x00007fb9422403f8 in ?? ()
>>>>     from /usr/lib64/libqpidmessaging.so.2
>>>> #8  0x00007fb942241e6a in ?? ()
>>>>     from /usr/lib64/libqpidmessaging.so.2
>>>> #9  0x00000000004f2dc9 in ?? ()
>>>
>>> It looks like the send is stuck trying to unwind after an exception 
>>> (possibly on the checkOpen()), due to not being able to unlock the 
>>> sendLock semaphore. Is something in thread 3 holding any locks?
>>
>> It may be holding locks, but not anything related to QPid. 
>> Specifically, it's probably waiting for a condition via 
>> pthread_cond_wait(), which implies that it's also holding a mutex 
>> associated with the condition variable.
>
> Ok, thread 1 may of course not be stuck at all, and it was just 
> captured at that point. Getting pstack output separated by a few 
> seconds would shed more light.
>
> Does the frequency of the 'locking up' match the frequency of the 
> session-busy exceptions?

I get the exact behaviour I'm seeing now only in a version that was 
deployed about a week ago, and since then, between 60 and 70 
session-busy exceptions have been registered. Most of these came from 
Session::nextReceiver() or Receiver::fetch(), but exactly 2 were raised 
by Sender::send(). In both these cases, the application locked up more 
or less directly - the session error was at least the last item logged 
when the process was in the locked state. It has not got stuck in any 
other situations.

Due to the way exception handling is implemented, there is a chance that 
there is another attempt at sending data after an exception from 
Sender::send(), without resetting the session or anything. On the other 
hand, a receive/nextReceiver() exception is in practice always followed 
directly by

   if(session.isValid() && session.hasError()) {
     Util::debug(1, "Reset AMQP session with error...");
     session.close();
     session=qpid::messaging::Session();
   }

The same code is also executed eventually in the send case, but there 
may be other intervening Sender::send() calls, and also operations like 
Session::getSender() or even (at least in theory) Session::createSender().

- Toralf


>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: [External] Re: "Session detached by peer"; extra connections on broker, lock-up on send

Posted by Gordon Sim <gs...@redhat.com>.

On 18/12/2019 10:42 am, Toralf Lund wrote:
> On 18/12/2019 10:40, Gordon Sim wrote:
>> On 18/12/2019 8:45 am, Toralf Lund wrote:
>>> An additional, more serious issue is that the system has also locked 
>>> up a couple of times following an exception during Sender::send(). 
>>> Slightly simplified stack trace from one of the cases:
>>>
>>> Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
>>> #0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>>> #1  0x00007fb93f9c7677 in 
>>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>>     from /usr/lib64/libqpidcommon.so.2
>>> #2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>>     from /usr/lib64/libqpidcommon.so.2
>>> #3  0x00007fb93f9b8e4a in ?? ()
>>>     from /usr/lib64/libqpidcommon.so.2
>>> #4  0x00007fb941663dd5 in start_thread () from 
>>> /usr/lib64/libpthread.so.0
>>> #5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>
>>> Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
>>> #0  0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>     from /usr/lib64/libpthread.so.0
>>> #1  0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
>>>     from /usr/lib64/libqpidcommon.so.2
>>> #2  0x00007fb93f9b8e4a in ?? ()
>>>     from /usr/lib64/libqpidcommon.so.2
>>> #3  0x00007fb941663dd5 in start_thread () from 
>>> /usr/lib64/libpthread.so.0
>>> #4  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>
>>> Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
>>> [ ... ]
>>
>> Thread 3's trace didn't make it through. Do you still have that?
>>> Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
>>> #0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
>>> #1  0x00007fb93f9c7677 in 
>>> qpid::sys::Poller::wait(qpid::sys::Duration) ()
>>>     from /usr/lib64/libqpidcommon.so.2
>>> #2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
>>>     from /usr/lib64/libqpidcommon.so.2
>>> #3  0x00007fb93f9b8e4a in ?? ()
>>>     from /usr/lib64/libqpidcommon.so.2
>>> #4  0x00007fb941663dd5 in start_thread () from 
>>> /usr/lib64/libpthread.so.0
>>> #5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
>>>
>>> Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
>>> #0  0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
>>> #1  0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
>>> #2  0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
>>> #3  0x00007fb94039add7 in _Unwind_Resume () from 
>>> /usr/lib64/libgcc_s.so.1
>>> #4  0x00007fb93fd8b927 in 
>>> qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody const&, 
>>> qpid::framing::MethodContent const*) ()
>>>     from /usr/lib64/libqpidclient.so.2
>>> #5  0x00007fb93fd8b97b in 
>>> qpid::client::SessionImpl::send(qpid::framing::AMQBody const&) () 
>>> from /usr/lib64/libqpidclient.so.2
>>> #6  0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
>>>     from /usr/lib64/libqpidclient.so.2
>>> #7  0x00007fb9422403f8 in ?? ()
>>>     from /usr/lib64/libqpidmessaging.so.2
>>> #8  0x00007fb942241e6a in ?? ()
>>>     from /usr/lib64/libqpidmessaging.so.2
>>> #9  0x00000000004f2dc9 in ?? ()
>>
>> It looks like the send is stuck trying to unwind after an exception 
>> (possibly on the checkOpen()), due to not being able to unlock the 
>> sendLock semaphore. Is something in thread 3 holding any locks?
> 
> It may be holding locks, but not anything related to QPid. Specifically, 
> it's probably waiting for a condition via pthread_cond_wait(), which 
> implies that it's also holding a mutex associated with the condition 
> variable.

Ok, thread 1 may of course not be stuck at all, and it was just captured 
at that point. Getting pstack output separated by a few seconds would 
shed more light.

Does the frequency of the 'locking up' match the frequency of the 
session-busy exceptions?


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org