You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by "yang.yang.zz" <ya...@outlook.com> on 2016/02/29 20:35:28 UTC

5.13.1 message blocked

Hi:

I'm running into an massive message blocking with ActiveMQ 5.13.1. This
issue is found during our reliance testing. This testing is set due to a
similar issue found in the field, where our customer is using 5.11.1 which
have encountered similar disconnection issues.

Here's our topology:

                  Consumer0
                        |
                    broker0

          /            |              \
         /      (network)          \
        /              |                 \
  broker1        broker2      ...   broker17          

  producer1     producer2   ...  producer17                


The issue we found here is, if, at broker0 side, if we set the firewall to
drop all the incoming messages from broker1 to broker17 for longer than 5
mins. Then remove the firewall setting. Then even we observed the connection
between broker0 and broker1-17 gets re-established. But they're no longer
communicating to each other. We have to restart the broker0 to get the
communication back.

Here're the abstracted configurations:

*broker0:*

        <transportConnectors>
            <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
        </transportConnectors>


*broker1-17:*

       <networkConnectors>
                  <networkConnector name="broker0"
uri="static:(tcp://broker0:61616)" duplex="true" />
        </networkConnectors>
      <transportConnectors>
            <transportConnector name="openwire"
uri="tcp://0.0.0.0:61616maximumConnections=100&amp;wireFormat.maxFrameSize=104857600"/>
        </transportConnectors>


Btw, this issue is not seen with fewer producers. say less than 5 producers.
But this issue is 100% reproducible with 17 producers.

I spent hours and tried different transportConnector and networkConnector
configuration options but it didn't help. Is this a configuration mistake or
a bug with 5.13.1? 

Many thanks,
-Yang







--
View this message in context: http://activemq.2283324.n4.nabble.com/5-13-1-message-blocked-tp4708592.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: 5.13.1 message blocked

Posted by "yang.yang.zz" <ya...@outlook.com>.
I downloaded the source code and tracked the code a little. I found the issue
could be here:


DemandForwardingBridgeSupport.java (line: 774)

            // in a cyclic network there can be multiple bridges per broker
that can propagate
            // a network subscription so there is a need to synchronize on a
shared entity
            synchronized (brokerService.getVmConnectorURI()) {
                addConsumerInfo(info);
            }

This synchronized block holds an java.net.URI object. And within the block,
it's sending a ConsumerAdvisory message, which is a blocking operation. It
will wait until the response get back until it quits the synchronized block.

However, in our test, we dropped responded message and the default
sendTimeout is 0 which is infinite. Therefore, this synchronized block will
never quit and in the meanwhile. since it holds the connector URI, it's
actually blocks all the sending message to this URI.

It seems configuring connection.sendTimeout should prevent this. But since
here we have a big block on URI level, should it have a non-0 default
sentTimeout?





--
View this message in context: http://activemq.2283324.n4.nabble.com/5-13-1-message-blocked-tp4708592p4708819.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: 5.13.1 message blocked

Posted by "yang.yang.zz" <ya...@outlook.com>.
*## Update *

Unfortunately, even with NIO, I found the same blocking issue after adding
more data traffic...


"ActiveMQ NIO Worker 64" #842 prio=9 os_prio=0 tid=0x00007f1680010800
nid=0x3381 waiting for monitor entry [0x00007f16d06d1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at
org.apache.activemq.network.DemandForwardingBridgeSupport.serviceRemoteConsumerAdvisory(DemandForwardingBridgeSupport.java:775)
    - waiting to lock <0x000000068a566aa0> (a java.net.URI)
    at
org.apache.activemq.network.DemandForwardingBridgeSupport.serviceRemoteCommand(DemandForwardingBridgeSupport.java:623)
    at
org.apache.activemq.network.DemandForwardingBridgeSupport$3.onCommand(DemandForwardingBridgeSupport.java:225)
    at
org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
    at
org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
    at
org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:125)
    at
org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:300)
    at
org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
    at
org.apache.activemq.transport.nio.NIOTransport.serviceRead(NIOTransport.java:156)
    at
org.apache.activemq.transport.nio.NIOTransport$1.onSelect(NIOTransport.java:80)
    at
org.apache.activemq.transport.nio.SelectorSelection.onSelect(SelectorSelection.java:98)
    at
org.apache.activemq.transport.nio.SelectorWorker$1.run(SelectorWorker.java:118)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)




--
View this message in context: http://activemq.2283324.n4.nabble.com/5-13-1-message-blocked-tp4708592p4708807.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: 5.13.1 message blocked

Posted by "yang.yang.zz" <ya...@outlook.com>.
*## Update*

Changing the transport protocol at consumer side seem to solve the issue.

*broker0*

        <transportConnectors>
            <transportConnector name=&quot;openwire&quot;
uri=&quot;&lt;b>nio*://0.0.0.0:61616"/>
        </transportConnectors>

It looks like a bug with *tcp* protocol. 



--
View this message in context: http://activemq.2283324.n4.nabble.com/5-13-1-message-blocked-tp4708592p4708796.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: 5.13.1 message blocked

Posted by "yang.yang.zz" <ya...@outlook.com>.
The question is "what is 138.42.247.41" is waiting for? Even the response is
never coming back, why it's waiting forever and block other transport
threads. Is there a way to timeout this forever wait?



--
View this message in context: http://activemq.2283324.n4.nabble.com/5-13-1-message-blocked-tp4708592p4708788.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: 5.13.1 message blocked

Posted by "yang.yang.zz" <ya...@outlook.com>.
More findings in the consumer broker0. Why some connections a blocked.

<http://activemq.2283324.n4.nabble.com/file/n4708787/blocked_connection.png> 



<http://activemq.2283324.n4.nabble.com/file/n4708787/waiting_connection.png> 



Here just give an example of a good connection for comparison
<http://activemq.2283324.n4.nabble.com/file/n4708787/good_connection.png> 



--
View this message in context: http://activemq.2283324.n4.nabble.com/5-13-1-message-blocked-tp4708592p4708787.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: 5.13.1 message blocked

Posted by "yang.yang.zz" <ya...@outlook.com>.
Hi Tim

I collected 3 broker logs during the network blocking events. Which are
respectively: consumer broker log (broker0), and a good producer broker
(still connected after the network resumes) and a bad producer broker
(disconnected after network resumes)

consumer_broker.txt
<http://activemq.2283324.n4.nabble.com/file/n4708717/consumer_broker.txt>  
producer_bad_broker.txt
<http://activemq.2283324.n4.nabble.com/file/n4708717/producer_bad_broker.txt>  
producer_good_broker.txt
<http://activemq.2283324.n4.nabble.com/file/n4708717/producer_good_broker.txt>  

Thanks,



--
View this message in context: http://activemq.2283324.n4.nabble.com/5-13-1-message-blocked-tp4708592p4708717.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: 5.13.1 message blocked

Posted by "yang.yang.zz" <ya...@outlook.com>.
Thanks Tim for responding..

I just run the same test again and happen to have some connected brokers and
disconnected brokers after the test. By looking at the thread comparison, I
see the difference between connected broker & disconnected (broken) brokers
is


<http://activemq.2283324.n4.nabble.com/file/n4708714/broker.png> 





--
View this message in context: http://activemq.2283324.n4.nabble.com/5-13-1-message-blocked-tp4708592p4708714.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: 5.13.1 message blocked

Posted by Tim Bain <tb...@alumni.duke.edu>.
Tell us more about "they're no longer
communicating to each other."  In a JMX viewer, for a particular queue
(pick one, doesn't matter which) on each broker, which consumers are
present on that queue before and after the restart?

Also, what's in the logs of the brokers immediately before, during, and
after the firewall shutdown?

Tim
On Feb 29, 2016 2:56 PM, "yang.yang.zz" <ya...@outlook.com> wrote:

> Hi:
>
> I'm running into an massive message blocking with ActiveMQ 5.13.1. This
> issue is found during our reliance testing. This testing is set due to a
> similar issue found in the field, where our customer is using 5.11.1 which
> have encountered similar disconnection issues.
>
> Here's our topology:
>
>                   Consumer0
>                         |
>                     broker0
>
>           /            |              \
>          /      (network)          \
>         /              |                 \
>   broker1        broker2      ...   broker17
>
>   producer1     producer2   ...  producer17
>
>
> The issue we found here is, if, at broker0 side, if we set the firewall to
> drop all the incoming messages from broker1 to broker17 for longer than 5
> mins. Then remove the firewall setting. Then even we observed the
> connection
> between broker0 and broker1-17 gets re-established. But they're no longer
> communicating to each other. We have to restart the broker0 to get the
> communication back.
>
> Here're the abstracted configurations:
>
> *broker0:*
>
>         <transportConnectors>
>             <transportConnector name="openwire" uri="tcp://0.0.0.0:61616
> "/>
>         </transportConnectors>
>
>
> *broker1-17:*
>
>        <networkConnectors>
>                   <networkConnector name="broker0"
> uri="static:(tcp://broker0:61616)" duplex="true" />
>         </networkConnectors>
>       <transportConnectors>
>             <transportConnector name="openwire"
> uri="tcp://0.0.0.0:61616
> maximumConnections=100&amp;wireFormat.maxFrameSize=104857600"/>
>         </transportConnectors>
>
>
> Btw, this issue is not seen with fewer producers. say less than 5
> producers.
> But this issue is 100% reproducible with 17 producers.
>
> I spent hours and tried different transportConnector and networkConnector
> configuration options but it didn't help. Is this a configuration mistake
> or
> a bug with 5.13.1?
>
> Many thanks,
> -Yang
>
>
>
>
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/5-13-1-message-blocked-tp4708592.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>