Posted to users@activemq.apache.org by Ilkka Virolainen <Il...@bitwise.fi> on 2018/03/09 12:21:11 UTC

Artemis 2.5.0 - Problems with colocated scaledown

Hello,

I'm having some issues with scaledown of colocated servers. I have a symmetric, statically defined cluster of two colocated nodes configured with scale-down. The problem occurs as follows:

1. Start both brokers. They form a connection and replicate.

2. Close server1
-> The server shuts down; server0 detects the shutdown and scales down from the replicated backup.

3. Start server1
-->
Server0 logs:
2018-03-09 10:57:57,434 WARN  [org.apache.activemq.artemis.core.server] AMQ222138: Local Member is not set at on ClusterConnection ClusterConnectionImpl@914942811[nodeUUID=1ed6bd4b-2377-11e8-a9e2-0a0027000011, connector=TransportConfiguration(name=netty-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=localhost&activemq-passwordcodec=****, address=, server=ActiveMQServerImpl::serverUUID=1ed6bd4b-2377-11e8-a9e2-0a0027000011]

Server1 logs in an infinite loop:

2018-03-09 11:00:57,162 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:02,156 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:07,154 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:12,153 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:17,152 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:22,153 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:27,152 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-09 11:01:32,149 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
...

The situation only normalizes when server1 is shut down and restarted.

Broker configurations for reproducing the issue: https://github.com/ilkkavi/activemq-artemis/tree/scaledown-issue/issues/IssueExample/src/main/resources/activemq
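To give an idea of the shape of the setup without opening the repository, the relevant parts of broker.xml look roughly like the sketch below. This is simplified: netty-connector and port 61616 come from the logs above, but the second connector name, its port and the retry values are just placeholders; the exact files are in the link above.

<connectors>
   <connector name="netty-connector">tcp://localhost:61616</connector>
   <!-- placeholder for the connector pointing at the other node -->
   <connector name="other-node-connector">tcp://localhost:61617</connector>
</connectors>

<!-- statically defined, symmetric cluster of the two nodes -->
<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
         <connector-ref>other-node-connector</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>

<!-- colocated replication with scale-down on the backup -->
<ha-policy>
   <replication>
      <colocated>
         <request-backup>true</request-backup>
         <max-backups>1</max-backups>
         <backup-request-retries>-1</backup-request-retries>
         <backup-request-retry-interval>5000</backup-request-retry-interval>
         <master/>
         <slave>
            <scale-down>
               <connectors>
                  <connector-ref>netty-connector</connector-ref>
               </connectors>
            </scale-down>
         </slave>
      </colocated>
   </replication>
</ha-policy>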

I also have a separate issue that I've so far been unable to replicate locally. When the brokers are deployed on two different physical servers and one node shuts down, the other stops accepting connections. Clients attempting to connect log: org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ119013: Timed out waiting to receive cluster topology. Group:null

I don't really understand why this is happening or why it doesn't happen locally; the cluster topology should already be known to everyone involved. I realize it's difficult to comment on this without a way to reproduce it, but maybe it's a situation someone has come across before?

Best regards,
- Ilkka


RE: Artemis 2.5.0 - Problems with colocated scaledown

Posted by Ilkka Virolainen <Il...@bitwise.fi>.
It looks like the issues were related to Artemis somehow not always having a complete cluster topology after a sequence of shutdown, scaledown and failback. I changed the cluster connections to use UDP discovery/broadcast groups instead of static TCP connectors, which seems to work around the underlying issue.
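In practice the change was from static-connectors to broadcast/discovery groups along these lines (a rough sketch; the group address, port and names are just the stock example values, not necessarily what I ended up using):

<broadcast-groups>
   <broadcast-group name="bg-group1">
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
      <broadcast-period>2000</broadcast-period>
      <connector-ref>netty-connector</connector-ref>
   </broadcast-group>
</broadcast-groups>

<discovery-groups>
   <discovery-group name="dg-group1">
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
      <refresh-timeout>10000</refresh-timeout>
   </discovery-group>
</discovery-groups>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <!-- discovery instead of static-connectors -->
      <discovery-group-ref discovery-group-name="dg-group1"/>
   </cluster-connection>
</cluster-connections>

With discovery the brokers seem to end up with a complete topology after failback, which they did not reliably do with the static connector list.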

- Ilkka


RE: Artemis 2.5.0 - Problems with colocated scaledown

Posted by Ilkka Virolainen <Il...@bitwise.fi>.
With the TCP connectors excluded from the ha-policy and only the in-VM connectors left in it, I'm seeing the following behavior after server0 has been shut down and restarted:

Server0 logs in an infinite loop:

...
2018-03-14 11:04:56,976 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: RequestBackupQuorumVote
2018-03-14 11:04:56,981 INFO  [org.apache.activemq.artemis.core.server] AMQ221060: Sending quorum vote request to localhost/127.0.0.1:61618: RequestBackupVote [backupsSize=-1, nodeID=null, backupAvailable=false]
2018-03-14 11:04:56,983 INFO  [org.apache.activemq.artemis.core.server] AMQ221061: Received quorum vote response from localhost/127.0.0.1:61618: RequestBackupVote [backupsSize=1, nodeID=82925fbd-275e-11e8-bff4-0a0027000011, backupAvailable=false]
...

Server1 logs in an infinite loop:

...
2018-03-14 11:04:51,982 INFO  [org.apache.activemq.artemis.core.server] AMQ221062: Received quorum vote request: RequestBackupVote [backupsSize=-1, nodeID=null, backupAvailable=false]
2018-03-14 11:04:51,983 INFO  [org.apache.activemq.artemis.core.server] AMQ221063: Sending quorum vote response: RequestBackupVote [backupsSize=1, nodeID=82925fbd-275e-11e8-bff4-0a0027000011, backupAvailable=false]
...
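For clarity, the ha-policy change I'm describing is roughly the following (a simplified sketch: netty-connector is the name from my config, invm-connector is illustrative, and other settings are omitted):

<ha-policy>
   <replication>
      <colocated>
         <!-- keep the TCP connector out of the colocated backup -->
         <excludes>
            <connector-ref>netty-connector</connector-ref>
         </excludes>
         <request-backup>true</request-backup>
         <max-backups>1</max-backups>
         <master/>
         <slave>
            <scale-down>
               <connectors>
                  <!-- scale down over the in-VM connector only -->
                  <connector-ref>invm-connector</connector-ref>
               </connectors>
            </scale-down>
         </slave>
      </colocated>
   </replication>
</ha-policy>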

Why is endless, unsuccessful backup voting taking place with backupsSize=-1 and a null nodeID?

Best regards,
- Ilkka


RE: Artemis 2.5.0 - Problems with colocated scaledown

Posted by Ilkka Virolainen <Il...@bitwise.fi>.
Part of my problem was on the client side, but the scaledown issue is still unresolved. The client connectivity issues seem to be related to the scaledown issues. To reproduce the client connectivity problem: start both brokers, connect with a 1.5.4 client using tcp://localhost:61616 and send a message to a topic. Now shut down server0; it scales down to server1. Sending a message from the client now fails even though a failover should have occurred. Restarting server0 results in the infinite backup quorum vote.

Could I get clarification on whether the fault is in the broker configurations (ref. [1]) or whether this is an issue with Artemis? I'm aiming for a symmetric, statically defined cluster of two nodes, each storing a backup of the other's data. When one node is shut down, its data should be made available to the remaining live broker and clients should fail over to it. When the other broker is brought back online, replication should continue normally.

The documentation and examples give the impression that in-VM connectors/acceptors are needed for the scaledown and synchronization between a slave storing the backup and the colocated live master that the backup is scaled down to; a sketch of what I mean is below. In any case, so far I've been unable to resolve these issues by trying out different HA options.
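My reading of the colocated scale-down example is roughly the following shape, with scale-down pointed at an in-VM connector (a sketch only; the names invm-connector/invm-acceptor are illustrative and most other settings are omitted):

<connectors>
   <connector name="invm-connector">vm://0</connector>
</connectors>

<acceptors>
   <acceptor name="invm-acceptor">vm://0</acceptor>
</acceptors>

<!-- inside the colocated ha-policy -->
<slave>
   <scale-down>
      <connectors>
         <connector-ref>invm-connector</connector-ref>
      </connectors>
   </scale-down>
</slave>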

Best regards,
- Ilkka

[1] Reference broker configuration https://github.com/ilkkavi/activemq-artemis/tree/scaledown-issue/issues/IssueExample/src/main/resources/activemq
