Posted to issues@activemq.apache.org by "Bob Maloney (Jira)" <ji...@apache.org> on 2022/08/23 17:24:00 UTC
[jira] [Commented] (ARTEMIS-3831) Scale-down fails when using same discovery-group used by Broker cluster connection

    [ https://issues.apache.org/jira/browse/ARTEMIS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583767#comment-17583767 ] 

Bob Maloney commented on ARTEMIS-3831:
--------------------------------------

I'm receiving the same error. Are there example config files for the workaround suggested in the description? Apart from scale-down, clustering works for me in Kubernetes.

Note that the error can be reproduced with a single cluster-enabled broker. For the workaround, I've essentially duplicated the existing configs. However, nothing in them makes it clear that the broker will use one JGroups channel and scale-down will use the other. There are no errors on startup, but I still get AMQ222181 on shutdown.
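For reference: in the description, the channel that scale-down uses is chosen by the discovery-group-ref inside <ha-policy>/<live-only>/<scale-down>, not by a cluster-connection. None of the blocks below show that element. Here is a minimal, untested sketch, assuming the added jgroups-discovery-group is the one meant for scale-down:
{code:xml}
      <ha-policy>
         <live-only>
            <scale-down>
               <enabled>true</enabled>
               <!-- assumption: scale-down should use the dedicated group (jgroups_2.xml / jgroups_broadcast_channel) -->
               <!-- assumption: this group is referenced only here, not by any cluster-connection -->
               <discovery-group-ref discovery-group-name="jgroups-discovery-group"/>
            </scale-down>
         </live-only>
      </ha-policy>
{code}
The description's workaround adds only a discovery-group and its broadcast-group. The configs below also add a jgroups-cluster cluster-connection that references jgroups-discovery-group, so the broker may open and close that channel as well. I have not verified whether that matters here.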

acceptor/connector (for a separate port)
{code:xml}
         ...         
         <acceptor name="netty-acceptor">tcp://0.0.0.0:61618</acceptor>

         <!-- added -->
         <acceptor name="jgroups-netty-acceptor">tcp://0.0.0.0:61619</acceptor>

      </acceptors>

      <connectors>
         <connector name="netty-connector">tcp://0.0.0.0:61618</connector>
         <!-- added -->
         <connector name="jgroups-netty-connector">tcp://0.0.0.0:61619</connector>
      </connectors>
{code}
broadcast-group
{code:xml}
      <broadcast-groups>
         <broadcast-group name="artemis-broadcast-group">
            <broadcast-period>2000</broadcast-period>
            <jgroups-file>jgroups.xml</jgroups-file>
            <jgroups-channel>artemis_broadcast_channel</jgroups-channel>
            <connector-ref>netty-connector</connector-ref>
         </broadcast-group>
         <!-- added below -->
         <broadcast-group name="jgroups-broadcast-group">
            <broadcast-period>2000</broadcast-period>
            <jgroups-file>jgroups_2.xml</jgroups-file>
            <jgroups-channel>jgroups_broadcast_channel</jgroups-channel>
            <connector-ref>jgroups-netty-connector</connector-ref>
         </broadcast-group>
      </broadcast-groups>
{code}
discovery-group
{code:xml}
      <discovery-groups>
         <discovery-group name="artemis-discovery-group">
            <jgroups-file>jgroups.xml</jgroups-file>
            <jgroups-channel>artemis_broadcast_channel</jgroups-channel>
            <refresh-timeout>10000</refresh-timeout>
         </discovery-group>
         <!-- added below -->
         <discovery-group name="jgroups-discovery-group">
            <jgroups-file>jgroups_2.xml</jgroups-file>
            <jgroups-channel>jgroups_broadcast_channel</jgroups-channel>
            <refresh-timeout>10000</refresh-timeout>
         </discovery-group>
      </discovery-groups> 
{code}
cluster-connection
{code:xml}
      <cluster-connections>
         <cluster-connection name="artemis-cluster">
            <address></address>
            <connector-ref>netty-connector</connector-ref>
            <check-period>1000</check-period>
            <connection-ttl>5000</connection-ttl>
            <min-large-message-size>50000</min-large-message-size>
            <call-timeout>5000</call-timeout>
            <retry-interval>500</retry-interval>
            <retry-interval-multiplier>1.0</retry-interval-multiplier>
            <max-retry-interval>5000</max-retry-interval>
            <initial-connect-attempts>-1</initial-connect-attempts>
            <reconnect-attempts>-1</reconnect-attempts>
            <use-duplicate-detection>true</use-duplicate-detection>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <confirmation-window-size>32000</confirmation-window-size>
            <call-failover-timeout>30000</call-failover-timeout>
            <notification-interval>1000</notification-interval>
            <notification-attempts>2</notification-attempts>
            <discovery-group-ref discovery-group-name="artemis-discovery-group"/>
         </cluster-connection>
         <!-- added below -->
         <cluster-connection name="jgroups-cluster">
            <address></address>
            <connector-ref>jgroups-netty-connector</connector-ref>
            <check-period>1000</check-period>
            <connection-ttl>5000</connection-ttl>
            <min-large-message-size>50000</min-large-message-size>
            <call-timeout>5000</call-timeout>
            <retry-interval>500</retry-interval>
            <retry-interval-multiplier>1.0</retry-interval-multiplier>
            <max-retry-interval>5000</max-retry-interval>
            <initial-connect-attempts>-1</initial-connect-attempts>
            <reconnect-attempts>-1</reconnect-attempts>
            <use-duplicate-detection>true</use-duplicate-detection>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <confirmation-window-size>32000</confirmation-window-size>
            <call-failover-timeout>30000</call-failover-timeout>
            <notification-interval>1000</notification-interval>
            <notification-attempts>2</notification-attempts>
            <discovery-group-ref discovery-group-name="jgroups-discovery-group"/>
         </cluster-connection>
      </cluster-connections>
{code}
New <jgroups-file> (jgroups_2.xml). The only change is a separate bind_port. jgroups-kubernetes (KUBE_PING) is used for server discovery.
{code:xml}
<config xmlns="urn:org:jgroups"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <TCP external_addr="match-interface:eth0" bind_addr="site_local,match-interface:eth0" bind_port="7801" recv_buf_size="5M" send_buf_size="1M" thread_naming_pattern="cl" thread_pool.min_threads="0" thread_pool.max_threads="500" thread_pool.keep_alive_time="30000"/>

    <org.jgroups.protocols.kubernetes.KUBE_PING namespace="..." labels="app.kubernetes.io/instance=..." useNotReadyAddresses="false"/>

    <PING return_entire_cache="true"/>

    <MERGE3 max_interval="30000" min_interval="10000"/>
    <FD_SOCK2/>
    <FD_ALL timeout="10000" interval="3000"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK2 xmit_interval="500" xmit_table_num_rows="100" xmit_table_msgs_per_row="2000" xmit_table_max_compaction_time="30000" use_mcast_xmit="false" discard_delivered_msgs="true"/>
    <UNICAST3 xmit_table_num_rows="100" xmit_table_msgs_per_row="1000" xmit_table_max_compaction_time="30000"/>
    <pbcast.STABLE desired_avg_gossip="50000" max_bytes="8m"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"/>
    <MFC max_credits="2M" min_threshold="0.4"/>
    <FRAG2 frag_size="60K"/>
</config>
{code}
 

> Scale-down fails when using same discovery-group used by Broker cluster connection
> ----------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-3831
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3831
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.19.1
>            Reporter: Apache Dev
>            Priority: Major
>
> Using 2 live brokers in a cluster.
> Both have the following HA policy:
> {code}
>         <ha-policy>
>             <live-only>
>                 <scale-down>
>                     <enabled>true</enabled>
>                     <discovery-group-ref discovery-group-name="activemq-discovery-group"/>
>                 </scale-down>
>             </live-only>
>         </ha-policy>
> {code}
> where "activemq-discovery-group" uses JGroups TCPPING:
> {code}
>         <discovery-groups>
>             <discovery-group name="activemq-discovery-group">
>                 <jgroups-file>...</jgroups-file>
>                 <jgroups-channel>...</jgroups-channel>
>                 <refresh-timeout>10000</refresh-timeout>
>             </discovery-group>
>         </discovery-groups>
> {code}
> and it is also used by the cluster connection of the 2 brokers:
> {code}
>         <cluster-connections>
>             <cluster-connection name="activemq-cluster">
>                 <connector-ref>netty-connector</connector-ref>
>                 <retry-interval>5000</retry-interval>
>                 <use-duplicate-detection>true</use-duplicate-detection>
>                 <message-load-balancing>OFF</message-load-balancing>
>                 <max-hops>1</max-hops>
>                 <discovery-group-ref discovery-group-name="activemq-discovery-group"/>
>             </cluster-connection>
>         </cluster-connections>
> {code}
> The issue is that scale-down fails when the broker shuts down:
> {code}
> org.apache.activemq.artemis.core.server                      W AMQ222181: Unable to scaleDown messages
>         ActiveMQInternalErrorException[errorType=INTERNAL_ERROR message=AMQ219004: Failed to initialise session factory]
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.initialize(ServerLocatorImpl.java:272)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:655)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:554)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:533)
>         at org.apache.activemq.artemis.core.server.LiveNodeLocator.connectToCluster(LiveNodeLocator.java:85)
>         at org.apache.activemq.artemis.core.server.impl.LiveOnlyActivation.connectToScaleDownTarget(LiveOnlyActivation.java:146)
>         at org.apache.activemq.artemis.core.server.impl.LiveOnlyActivation.freezeConnections(LiveOnlyActivation.java:114)
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.freezeConnections(ActiveMQServerImpl.java:1468)
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1250)
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1166)
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1150)
>         ...
>         Caused by: ActiveMQInternalErrorException[errorType=INTERNAL_ERROR message=channel is closed]
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.startDiscovery(ServerLocatorImpl.java:286)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.initialize(ServerLocatorImpl.java:268)
>         ... 44 more
>         Caused by: java.lang.IllegalStateException: channel is closed
>         at org.jgroups.JChannel.checkClosed(JChannel.java:957)
>         at org.jgroups.JChannel._preConnect(JChannel.java:548)
>         at org.jgroups.JChannel.connect(JChannel.java:288)
>         at org.jgroups.JChannel.connect(JChannel.java:279)
>         at org.apache.activemq.artemis.api.core.jgroups.JChannelWrapper.connect(JChannelWrapper.java:126)
>         at org.apache.activemq.artemis.api.core.JGroupsBroadcastEndpoint.internalOpen(JGroupsBroadcastEndpoint.java:113)
>         at org.apache.activemq.artemis.api.core.JGroupsBroadcastEndpoint.openClient(JGroupsBroadcastEndpoint.java:91)
>         at org.apache.activemq.artemis.core.cluster.DiscoveryGroup.start(DiscoveryGroup.java:111)
>         at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.startDiscovery(ServerLocatorImpl.java:284)
>         ... 45 more
> {code}
> The JGroups channel used by scale-down is probably the same one used by the broker, and the broker has already closed it during its own shutdown.
> As a workaround, you can create a separate discovery-group (with its own broadcast-group) so that scale-down uses a new JGroups channel that the broker does not close.
> However, this duplicates configuration, and a new JGroups port must be opened for scale-down discovery.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)