Posted to issues@activemq.apache.org by "Alexander Kosarev (JIRA)" <ji...@apache.org> on 2018/11/23 10:13:00 UTC

[jira] [Commented] (ARTEMIS-1864) On-Demand Message Redistribution Can Spontaneously Start Failing in Single Direction

    [ https://issues.apache.org/jira/browse/ARTEMIS-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696622#comment-16696622 ] 

Alexander Kosarev commented on ARTEMIS-1864:
--------------------------------------------

We have the same issue with both Apache Artemis 2.6.3 and JBoss AMQ 7 (build 2.6.3.redhat-00004).

Cluster configuration: two nodes, each configured as follows:

{code:java}
<configuration>

<connectors>
    <connector name="node01-connector">tcp://192.167.1.10:61616</connector>
</connectors>

<cluster-user>admin</cluster-user>
<cluster-password>admin</cluster-password>

<broadcast-groups>
    <broadcast-group name="my-broadcast-group">
        <group-address>${udp-address:231.7.7.7}</group-address>
        <group-port>9876</group-port>
        <broadcast-period>100</broadcast-period>
        <connector-ref>node02-connector</connector-ref>
    </broadcast-group>
</broadcast-groups>

<discovery-groups>
    <discovery-group name="my-discovery-group">
        <group-address>${udp-address:231.7.7.7}</group-address>
        <group-port>9876</group-port>
        <refresh-timeout>10000</refresh-timeout>
    </discovery-group>
</discovery-groups>

<cluster-connections>
    <cluster-connection name="sandbox-cluster">
        <connector-ref>node02-connector</connector-ref>
        <use-duplicate-detection>true</use-duplicate-detection>
        <max-hops>1</max-hops>
        <discovery-group-ref discovery-group-name="my-discovery-group"/>
    </cluster-connection>
</cluster-connections>

<address-settings>
    <address-setting match="#">
        <redistribution-delay>0</redistribution-delay>
    </address-setting>
</address-settings>

</configuration>

{code}
There are multiple ActiveMQ 5.15.6 JMS clients configured with the failover transport:

{code:java}
failover://(tcp://host1:port,tcp://host2:port)
{code}
All clients can both consume and produce messages.
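
For reference, a minimal sketch of such a client (hypothetical host names, port and queue name; standard ActiveMQ 5.x JMS API):
{code:java}
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.Queue;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class FailoverClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical broker addresses; the failover transport retries across both cluster nodes.
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("failover://(tcp://host1:61616,tcp://host2:61616)");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("sandbox.test"); // hypothetical queue name

        // Every client both produces and consumes, as described above.
        session.createProducer(queue).send(session.createTextMessage("ping"));
        Message received = session.createConsumer(queue).receive(5000);
        System.out.println("received: " + received);

        connection.close();
    }
}
{code}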

At some point messages get stuck in queue *$.artemis.internal.sf.sandbox-cluster.CLUSTER_NAME.NODE_NAME* in the "Delivering" state, with the following exception:

{code:java}
2018-11-23 14:36:25,274 WARN [org.apache.activemq.artemis.core.server] AMQ222151: removing consumer which did not handle a message, consumer=ClusterConnectionBridge@299a7489 [name=$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e, queue=QueueImpl
[name=$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=b42fe4db-e636-11e8-b335-6aabda98944e], temp=false]@1584523b targetConnector=ServerLocatorImpl (identity=(Cluster-connection-b
ridge::ClusterConnectionBridge@299a7489 [name=$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e, queue=QueueImpl[name=$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e, postOffice=PostOfficeImpl [server=ActiveMQServerImpl:
:serverUUID=b42fe4db-e636-11e8-b335-6aabda98944e], temp=false]@1584523b targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=node02-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61716&host
=10-145-13-120], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@638169719[nodeUUID=b42fe4db-e636-11e8-b335-6aabda98944e, connector=TransportConfiguration(name=node01-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)
?port=61616&host=10-145-13-120, address=, server=ActiveMQServerImpl::serverUUID=b42fe4db-e636-11e8-b335-6aabda98944e])) [initialConnectors=[TransportConfiguration(name=node02-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?p
ort=61716&host=10-145-13-120], discoveryGroupConfiguration=null]], message=Reference[2151226563]:NON-RELIABLE:CoreMessage[messageID=2151226563,durable=false,userID=null,priority=0, timestamp=0,expiration=0, durable=false, address=ActiveMQ.Advisory.TempQueue,size=1077,prop
erties=TypedProperties[__HDR_BROKER_IN_TIME=1542965785270,_AMQ_ROUTING_TYPE=0,__HDR_GROUP_SEQUENCE=0,__HDR_COMMAND_ID=0,__HDR_DATASTRUCTURE=[0000 0062 0800 0000 0000 0178 0100 2449 443A 616B 6F73 6172 6576 2D33 3933 ... 3535 2D62 6236 352D 3966 3165 6361 3033 3861 3766
0100 0000 0000 0000 0000),_AMQ_DUPL_ID=ID:akosarev-46097-1542964149858-1:1:0:0:21605,__HDR_MESSAGE_ID=[0000 004A 6E00 017B 0100 2349 443A 616B 6F73 6172 6576 2D34 3630 3937 2D31 ... 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 5465 0000 0000 0000 0000),__HDR_DROPPA
BLE=false,__HDR_ARRIVAL=0,_AMQ_ROUTE_TO$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e=[0000 0000 0035 C0C3),bytesAsLongs(3522755],__HDR_PRODUCER_ID=[0000 0037 7B01 0023 4944 3A61 6B6F 7361 7265 762D 3436 3039 372D 3135 3432 3936 3431 3439 3835
382D 313A 3100 0000 0000 0000 0000 0000 0000 0000 00),JMSType=Advisory]]@1531727971: java.lang.IndexOutOfBoundsException: writerIndex: 4 (expected: readerIndex(0) <= writerIndex <= capacity(0))
 at io.netty.buffer.AbstractByteBuf.writerIndex(AbstractByteBuf.java:118) [netty-all-4.1.25.Final-redhat-00003.jar:4.1.25.Final-redhat-00003]
 at io.netty.buffer.WrappedByteBuf.writerIndex(WrappedByteBuf.java:129) [netty-all-4.1.25.Final-redhat-00003.jar:4.1.25.Final-redhat-00003]
 at org.apache.activemq.artemis.core.buffers.impl.ResetLimitWrappedActiveMQBuffer.writerIndex(ResetLimitWrappedActiveMQBuffer.java:128) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.buffers.impl.ResetLimitWrappedActiveMQBuffer.<init>(ResetLimitWrappedActiveMQBuffer.java:60) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.message.impl.CoreMessage.internalWritableBuffer(CoreMessage.java:367) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.message.impl.CoreMessage.getBodyBuffer(CoreMessage.java:360) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:241) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:128) [artemis-core-client-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.deliverStandardMessage(BridgeImpl.java:743) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.handle(BridgeImpl.java:619) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:2983) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2334) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$2000(QueueImpl.java:107) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:3209) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
 at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]

2018-11-23 14:36:25,280 WARN [org.apache.activemq.artemis.core.server.impl.QueueImpl] null: java.util.NoSuchElementException
 at org.apache.activemq.artemis.utils.collections.PriorityLinkedListImpl$PriorityLinkedListIterator.repeat(PriorityLinkedListImpl.java:172) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2353) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$2000(QueueImpl.java:107) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:3209) [artemis-server-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
 at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.6.3.redhat-00004.jar:2.6.3.redhat-00004]
{code}
After that, message redistribution works in only one direction. For example, if the cluster has two nodes, *nodeA* and *nodeB*, and the problem appears on *nodeA*, redistribution will still transfer messages from *nodeB* to *nodeA*, but not in the reverse direction, because the messages remain stuck in queue *$.artemis.internal.sf.sandbox-cluster.CLUSTER_NAME.NODE_NAME* on *nodeA*.

Restarting the cluster node that holds the stuck messages resumes message redistribution.
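
Before restarting, the stuck state can also be observed by reading the store-and-forward queue's counters over JMX. A minimal sketch, assuming remote JMX access to the broker is enabled and using the node-specific SF queue name from the warning above (the MBean domain and key names follow Artemis 2.x naming and may need adjusting for the actual deployment):
{code:java}
import java.util.Set;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SfQueueCheckSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint of the node holding the stuck messages.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://nodeA:1099/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Store-and-forward queue name taken from the warning above; substitute the current node UUID.
            String sfQueue = "$.artemis.internal.sf.sandbox-cluster.e0702bcf-e636-11e8-bca1-6aabda98944e";

            // Pattern match so the broker name, address and routing type do not have to be guessed.
            ObjectName pattern = new ObjectName(
                    "org.apache.activemq.artemis:queue=\"" + sfQueue + "\",*");
            Set<ObjectName> names = mbs.queryNames(pattern, null);
            for (ObjectName name : names) {
                System.out.println(name);
                // A DeliveringCount that stays high while ConsumerCount is non-zero matches the stuck state described here.
                System.out.println("  MessageCount    = " + mbs.getAttribute(name, "MessageCount"));
                System.out.println("  DeliveringCount = " + mbs.getAttribute(name, "DeliveringCount"));
                System.out.println("  ConsumerCount   = " + mbs.getAttribute(name, "ConsumerCount"));
            }
        }
    }
}
{code}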

The issue seems to depend on the message sending frequency: when we produce 10 messages per second, it appears within a minute after a full restart of the cluster and the clients.
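
For illustration, a load at roughly that rate can be generated with a trivial producer loop (a sketch only, reusing the hypothetical endpoints and queue name from the client example above):
{code:java}
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class SteadyLoadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoints and queue; roughly 10 messages per second.
        Connection connection = new ActiveMQConnectionFactory(
                "failover://(tcp://host1:61616,tcp://host2:61616)").createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("sandbox.test"));
        for (int i = 0; ; i++) {
            producer.send(session.createTextMessage("msg-" + i));
            Thread.sleep(100); // ~10 msg/s
        }
    }
}
{code}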

 

Tested on:

OS: CentOS 7, Ubuntu 18.04.1 LTS (both with libaio installed)
JDK: Oracle JDK 8, OpenJDK 8

> On-Demand Message Redistribution Can Spontaneously Start Failing in Single Direction
> ------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-1864
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1864
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.5.0
>         Environment: RHEL 6.2
>            Reporter: Ilkka Virolainen
>            Priority: Major
>
> It's possible that the message redistribution of an Artemis cluster can spontaneously fail after running for a while. I've witnessed this several times using a two-node colocated replicating cluster with a basic configuration:
> {code:java}
> <cluster-connections>
>    <cluster-connection name="my-cluster">
>       <connector-ref>netty-connector</connector-ref>
>       <retry-interval>500</retry-interval>
>       <reconnect-attempts>5</reconnect-attempts>
>       <use-duplicate-detection>true</use-duplicate-detection>
>       <message-load-balancing>ON_DEMAND</message-load-balancing>
>       <max-hops>1</max-hops>
>       <discovery-group-ref discovery-group-name="my-discovery-group"/>
>    </cluster-connection>
> </cluster-connections>{code}
> After running for a while (approx. two weeks), one of the nodes (node a) will stop consuming messages from the other node's (node b) internal store-and-forward queue. This results in message redistribution not working from node b -> node a, while it still works from node a -> node b. The cause is unknown: nothing of note is logged on either broker, and JMX shows that the cluster topology and the broker cluster bridge connection are intact. This causes significant problems, mainly:
> 1. Client communication will only work as expected if the clients happen to connect to the right brokers.
> 2. Unconsumed messages will end up piling up in the internal store-and-forward queue and consuming unnecessary resources. It's also possible (but not verified) that when messages in the internal queue expire, they leak memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)