Posted to users@servicemix.apache.org by bgreer <br...@amd.com> on 2008/07/09 17:45:15 UTC

sendSync deadlock on downed endpoint

We seem to be experiencing an issue where ServiceMix hangs on shutdown when
there are in-flight messages on a channel leading to a downed endpoint. This
appears to be caused by sendSync in a component (JMS in this case) waiting for
a message from a downstream endpoint that has already been shut down. Many
threads are open during the hang, but here are the relevant stack traces:

"ServiceMix ShutdownHook" prio=1 tid=0x08f0aa20 nid=0x39f0 in Object.wait() [0xa15b3000..0xa15b4020]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb31e4b48> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:474)
        at org.springframework.jms.listener.DefaultMessageListenerContainer.doShutdown(DefaultMessageListenerContainer.java:768)
        - locked <0xb31e4b48> (a java.lang.Object)
        at org.springframework.jms.listener.AbstractJmsListeningContainer.shutdown(AbstractJmsListeningContainer.java:294)
        at org.apache.servicemix.jms.endpoints.JmsConsumerEndpoint.stop(JmsConsumerEndpoint.java:398)
        - locked <0xb31b2198> (a org.apache.servicemix.jms.endpoints.JmsConsumerEndpoint)
        at org.apache.servicemix.common.endpoints.SimpleEndpoint.deactivate(SimpleEndpoint.java:59)
        - locked <0xb31b2198> (a org.apache.servicemix.jms.endpoints.JmsConsumerEndpoint)
        at org.apache.servicemix.common.ServiceUnit.stop(ServiceUnit.java:76)
...


"DefaultMessageListenerContainer-1" prio=1 tid=0xa60c20b8 nid=0x38a7 in Object.wait() [0xa24fc000..0xa24fcf20]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb4942fa8> (a org.apache.servicemix.jbi.messaging.InOnlyImpl)
        at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.waitForExchange(DeliveryChannelImpl.java:699)
        at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(DeliveryChannelImpl.java:472)
        - locked <0xb4942fa8> (a org.apache.servicemix.jbi.messaging.InOnlyImpl)
        at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(DeliveryChannelImpl.java:442)
        at org.apache.servicemix.common.EndpointDeliveryChannel.sendSync(EndpointDeliveryChannel.java:95)
        at org.apache.servicemix.common.endpoints.SimpleEndpoint.sendSync(SimpleEndpoint.java:71)
        at org.apache.servicemix.jms.endpoints.AbstractConsumerEndpoint.onMessage(AbstractConsumerEndpoint.java:422)
        at org.apache.servicemix.jms.endpoints.JmsConsumerEndpoint$1.onMessage(JmsConsumerEndpoint.java:387)
...

ServiceMix version: 3.2.1 (with some patches from the trunk)
JVM: 1.5

Looking at the 3.2 code, it seems the appropriate fix is to close any relevant
channels on an endpoint when the endpoint is stopped. Since the context owns
the channel, the context seems like the appropriate place to close it, but in
EndpointComponentContext "deactivateEndpoint" appears to be an unsupported
operation. If this method is not supposed to be used, then perhaps the channel
can be closed by the endpoint itself (SimpleEndpoint.deactivate in this case),
since it also holds a reference to the channel. In either case I would propose:
if(channel != null)
   channel.close();
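
As a rough illustration of the idea only (a toy model, not ServiceMix code:
ToyChannel, ToyEndpoint and their methods are hypothetical names, and the real
DeliveryChannelImpl may behave differently on close), closing the channel on
deactivate would wake a thread that is blocked in sendSync:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ChannelCloseSketch {

    /** Hypothetical stand-in for a delivery channel; not the ServiceMix class. */
    static final class ToyChannel {
        private final Object lock = new Object();
        private boolean closed;
        private boolean answered; // never set here: the downstream endpoint is down

        /** Blocks until an answer arrives or the channel is closed; true if answered. */
        boolean sendSync() throws InterruptedException {
            synchronized (lock) {
                while (!answered && !closed) {
                    lock.wait();
                }
                return answered;
            }
        }

        /** Marks the channel closed and wakes every thread blocked in sendSync. */
        void close() {
            synchronized (lock) {
                closed = true;
                lock.notifyAll();
            }
        }
    }

    /** Hypothetical endpoint that closes its channel when it is deactivated. */
    static final class ToyEndpoint {
        ToyChannel channel = new ToyChannel();

        void deactivate() {
            if (channel != null) {
                channel.close();
            }
        }
    }

    /** Returns true if a sender blocked in sendSync is released by deactivate(). */
    static boolean senderReleasedOnDeactivate() {
        final ToyEndpoint endpoint = new ToyEndpoint();
        final CountDownLatch released = new CountDownLatch(1);
        Thread sender = new Thread(() -> {
            try {
                endpoint.channel.sendSync();
                released.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.setDaemon(true);
        sender.start();
        endpoint.deactivate();
        try {
            // Bounded wait so the sketch itself can never hang.
            return released.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("sender released: " + senderReleasedOnDeactivate());
    }
}
```

Without the close() call in deactivate(), the sender in this model would wait
indefinitely, which is the shape of the hang in the second stack trace.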

Let me know if this sounds reasonable, and I can create a Jira ticket with
the appropriate patch; otherwise, let me know if there's a better way to
handle this situation, or if we should just wait for the next release since
it looks like quite a bit has changed in the trunk.

Thanks,
Brian
-- 
View this message in context: http://www.nabble.com/sendSync-deadlock-on-downed-endpoint-tp18363507p18363507.html
Sent from the ServiceMix - User mailing list archive at Nabble.com.

Re: sendSync deadlock on downed endpoint

Posted by Guillaume Nodet <gn...@gmail.com>.

This is now fixed in the component's tree.

On Thu, Jul 10, 2008 at 11:26 PM, bgreer <br...@amd.com> wrote:
>
> Created SM-1454 for this issue.
>



-- 
Cheers,
Guillaume Nodet
------------------------
Blog: http://gnodet.blogspot.com/

Re: sendSync deadlock on downed endpoint

Posted by bgreer <br...@amd.com>.

Created SM-1454 for this issue.



gnodet wrote:
> 
> Yes, please raise a JIRA and I'll backport the fix.
> 


Re: sendSync deadlock on downed endpoint

Posted by Guillaume Nodet <gn...@gmail.com>.

Yes, please raise a JIRA and I'll backport the fix.

On Thu, Jul 10, 2008 at 11:16 PM, bgreer <br...@amd.com> wrote:
>
> Let me know if it makes sense to apply this to the trunk as well, and I can
> create a ticket for this issue.
>
> Thanks,
> Brian




Re: sendSync deadlock on downed endpoint

Posted by bgreer <br...@amd.com>.

Thanks for the note on this. It looks like my problem may have been specific
to the JMS component, however. I downloaded the fuse release (3.3.1.3-fuse)
and was not able to reproduce the hang. I applied a few selective patches
based on the diffs for that release and it looks like the following ticket
fixed our hang issue: http://open.iona.com/issues/browse/ESB-95

What's interesting is that, in the trunk, the 'onMessage' method of
AbstractConsumerEndpoint does not seem to have the exception-handling fix
that's in the IONA release. Here is the specific portion of the patch from
3.3.1.3-fuse to 3.2.1 for reference:

--- deployables/bindingcomponents/servicemix-jms/src/main/java/org/apache/servicemix/jms/endpoints/AbstractConsumerEndpoint.java (revision 3799)
+++ deployables/bindingcomponents/servicemix-jms/src/main/java/org/apache/servicemix/jms/endpoints/AbstractConsumerEndpoint.java (working copy)
@@ -413,9 +418,13 @@
         if (logger.isTraceEnabled()) {
             logger.trace("Received: " + jmsMessage);
         }
+
+        JmsContext context = null;
+        MessageExchange exchange = null;
+
         try {
-            JmsContext context = marshaler.createContext(jmsMessage);
-            MessageExchange exchange = marshaler.createExchange(context, getContext());
+            context = marshaler.createContext(jmsMessage);
+            exchange = marshaler.createExchange(context, getContext());
             configureExchangeTarget(exchange);
             if (synchronous) {
                 try {
@@ -444,10 +453,12 @@
                     }
                 }
             }
-        } catch (JMSException e) {
-            throw e;
         } catch (Exception e) {
-            throw (JMSException) new JMSException("Error sending JBI exchange").initCause(e);
+            try {
+                handleException(exchange, e, session, context);
+            } catch (Exception e1) {
+                throw (JMSException) new JMSException("Error sending JBI exchange").initCause(e);
+            }
         }
     }

Let me know if it makes sense to apply this to the trunk as well, and I can
create a ticket for this issue.
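
For anyone skimming the diff, the structural change is that the context and
exchange are declared before the try block so that the catch block can hand
them to an error handler. A minimal stand-alone sketch of that pattern follows.
It is an illustration only: ToyExchange and this handleException are
hypothetical stand-ins, and what the real handleException in the IONA release
does with the exchange is not shown in the patch above.

```java
final class CatchScopeSketch {

    /** Hypothetical stand-in for a message exchange. */
    static final class ToyExchange {
        String status = "ACTIVE";
    }

    /** Hypothetical handler: marks the exchange as failed if one was created. */
    static String handleException(ToyExchange exchange, Exception cause) {
        if (exchange == null) {
            return "no exchange: " + cause.getMessage();
        }
        exchange.status = "ERROR";
        return "exchange marked " + exchange.status + ": " + cause.getMessage();
    }

    /** Simulates a failure either before or after the exchange has been created. */
    static String onMessage(boolean failBeforeExchange) {
        // Declared outside the try block so the catch block can still see it.
        ToyExchange exchange = null;
        try {
            if (failBeforeExchange) {
                throw new IllegalStateException("marshal failed");
            }
            exchange = new ToyExchange();
            throw new IllegalStateException("send failed");
        } catch (Exception e) {
            return handleException(exchange, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(onMessage(true));  // prints "no exchange: marshal failed"
        System.out.println(onMessage(false)); // prints "exchange marked ERROR: send failed"
    }
}
```

Because the handler may be called before the exchange exists, it has to
tolerate a null exchange, which the sketch makes explicit.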

Thanks,
Brian


gnodet wrote:
> 
> Well, I'm currently working on a related issue.  The first thing is to
> change the order in which services are shut down in
> JBIContainer#shutDown.
> In particular, the broker / registry order is wrong (even though there is a
> comment saying otherwise).  The reason is that the JMS listener waits for
> all of its threads to finish before actually shutting down, and this can't
> really happen if the registry is shut down first, because the exchanges
> won't be processed correctly.


Re: sendSync deadlock on downed endpoint

Posted by Guillaume Nodet <gn...@gmail.com>.

Well, I'm currently working on a related issue.  The first thing is to
change the order in which services are shut down in
JBIContainer#shutDown.
In particular, the broker / registry order is wrong (even though there is a
comment saying otherwise).  The reason is that the JMS listener waits for
all of its threads to finish before actually shutting down, and this can't
really happen if the registry is shut down first, because the exchanges
won't be processed correctly.
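
To make the ordering point concrete, here is a small toy model (not the
JBIContainer code: ToyRegistry, drains and the 300 ms deadline are all
hypothetical choices for the sketch). A listener thread can only finish its
in-flight work while the registry is still open, so draining the listener
succeeds only when the registry is shut down afterwards:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ShutdownOrderSketch {

    /** Hypothetical registry: exchanges can only complete while it is open. */
    static final class ToyRegistry {
        private volatile boolean open = true;

        boolean isOpen() {
            return open;
        }

        void shutDown() {
            open = false;
        }
    }

    /** Returns true if the listener's in-flight work drained before its deadline. */
    static boolean drains(boolean registryFirst) {
        final ToyRegistry registry = new ToyRegistry();
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch proceed = new CountDownLatch(1);
        ExecutorService listener = Executors.newSingleThreadExecutor();
        listener.execute(() -> {
            started.countDown();
            try {
                proceed.await();
                // The in-flight exchange is stuck for as long as the registry is closed.
                while (!registry.isOpen()) {
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            started.await();
            if (registryFirst) {
                registry.shutDown(); // wrong order: registry goes away first
            }
            proceed.countDown();
            listener.shutdown();     // the listener waits for its threads to finish
            boolean drained = listener.awaitTermination(300, TimeUnit.MILLISECONDS);
            if (!registryFirst) {
                registry.shutDown(); // right order: registry goes away last
            }
            return drained;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            listener.shutdownNow();  // interrupt a stuck worker so the sketch exits
        }
    }

    public static void main(String[] args) {
        System.out.println("listener first: drained=" + drains(false));
        System.out.println("registry first: drained=" + drains(true));
    }
}
```

The sketch uses a bounded wait so that it terminates; a listener that waits
without a timeout would hang in the registry-first case.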

On Wed, Jul 9, 2008 at 5:45 PM, bgreer <br...@amd.com> wrote:
>
> We seem to be experiencing an issue where ServiceMix hangs on shutdown with
> inflight messages on a channel leading to a downed endpoint. This appears to
> be caused by sendSync in a component (JMS in this case) waiting for a
> message from a downstream endpoint that has already been shutdown. [...]


