Posted to users@activemq.apache.org by "Steigerwald, Aaron" <as...@brandesassociates.com.INVALID> on 2022/05/26 11:07:50 UTC

RE: [EXTERNAL]:Re: Cross data center HA cluster

Hello Iliya,

Thank you very much for your response; it's very helpful.

Regarding "message loss on send on network failure between data centers": the example architecture I described does not split any master/slave HA pair across data centers; each pair sits within a single data center. Do you think the message loss you described has anything to do with the master/slave pairs being clustered across data centers? I ask because the HA replication takes place within each master/slave pair on a LAN.

Thanks again,

Aaron Steigerwald

-----Original Message-----
From: Iliya Grushevskiy <il...@gmail.com> 
Sent: Thursday, May 26, 2022 4:10 AM
To: users@activemq.apache.org
Subject: [EXTERNAL]:Re: Cross data center HA cluster


Hi, Aaron

We are currently testing a similar deployment and have encountered several issues:

- message loss on send on network failure between data centers
  I think this is because HA replication is asynchronous and the replica server may not have caught up with the primary.

- message loss or duplication (depending on the error-handling strategy) on the consumer on network failure between data centers
  I think this was caused by two factors: the duplicate ID cache is consistent only within an HA pair, and message redistribution was on.
  Switching off redistribution (or, as an option, increasing its delay) should fix this issue.

- message duplication on the mirrored server
  This is addressed in pull request: https://github.com/apache/activemq-artemis/pull/4066

Regards
Iliya Grushevskiy


> On May 26, 2022, at 07:46, Justin Bertram <jb...@apache.org> wrote:
>
> I'm not aware of such a production deployment and I would be surprised 
> if there was one given that clustering was designed for local area 
> networks with low latency which typically isn't what is found between data centers.
>
> I recommend you pursue your mirroring approach as that is what 
> mirroring was designed for (i.e. cross data-center disaster-recovery use-cases).
>
>
> Justin
>
> On Wed, May 25, 2022 at 10:36 PM Steigerwald, Aaron 
> <as...@brandesassociates.com.invalid> wrote:
>
>> Hello,
>>
>> Is anyone aware of a production deployment of an Artemis "cross data 
>> center" HA cluster? For example, a cluster spread across 3 data centers.
>> Each data center contains a master/slave pair.
>>
>> I would like to know what kind of issues anyone has overcome with 
>> such a configuration. I understand there are many configuration and 
>> operational variables. Any info would be helpful.
>>
>> Note that we are considering asynchronously mirroring each 
>> master/slave pair's queues to a dedicated asynchronous target node. 
>> The asynchronous target node would exist in a different data center 
>> and would not service any other connections. A custom plugin would 
>> automatically scale down the messages into a live cluster node if the 
>> connections to the master/slave mirror sources were disconnected for a period of time.
>>
>> Thank you,
>> Aaron Steigerwald
>>

Re: [EXTERNAL]:Re: Cross data center HA cluster

Posted by Iliya Grushevskiy <il...@gmail.com>.

Hi Aaron.

Sorry, there was some misleading information in my previous message.
I have reviewed my test and it does contain a network failure.
If I turn off the network failure, everything works as expected.

So, as I understand it, if you consider your LAN a reliable network, there should not be any message loss on send.

Regards
Iliya Grushevskiy





Re: [EXTERNAL]:Re: Cross data center HA cluster

Posted by Илья Грушевский <il...@gmail.com>.

I have a simple test with a single HA pair in which I just kill the master while sending messages from a different thread. (It is almost identical to the replicated-transaction-failover example, except for the separate sending thread.)
I encounter the same message loss as in the network failure scenario.
I suspect there could be some misconfiguration on the client side in my test.

Example of the message flow in my test:

- send A1
- commit
- send A2 (will not be replicated and will be lost; the replica can't keep up with the master)
- commit
- send A3
- commit (fails; failover to the replica)
- resend A3
- commit (handle duplicate ID)

I would expect synchronous replication within an HA pair, but again, I'm not sure that the client configuration is correct and that my test is relevant.
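
The flow above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not Artemis code: the `Broker` class, the `run` method and the `replicatedBeforeCrash` parameter are hypothetical names made up for illustration, and the model simply assumes that the replica receives an in-order prefix of the master's commits before the master is killed. It shows why a client that resends only the message whose commit failed cannot recover a message that was acknowledged earlier but never reached the replica.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: Broker and run() are hypothetical names, not Artemis APIs.
final class FailoverLossSketch {

    // A broker reduced to a message store plus a duplicate-ID cache.
    static final class Broker {
        final List<String> store = new ArrayList<>();
        final Set<String> duplicateIds = new HashSet<>();

        // Stores the message unless its duplicate ID was already seen.
        boolean commit(String id) {
            if (!duplicateIds.add(id)) {
                return false; // duplicate ID: the resend is ignored
            }
            store.add(id);
            return true;
        }
    }

    // Returns what the replica holds after failover, given how many of the
    // master's commits had been replicated when the master was killed.
    static List<String> run(int replicatedBeforeCrash) {
        Broker master = new Broker();
        Broker replica = new Broker();

        master.commit("A1"); // acknowledged to the client
        master.commit("A2"); // acknowledged to the client
        master.commit("A3"); // master dies before the client sees the result

        // Assumption of this model: replication lags, so only a prefix arrives.
        int n = Math.min(Math.max(replicatedBeforeCrash, 0), master.store.size());
        for (String id : master.store.subList(0, n)) {
            replica.commit(id);
        }

        // After failover the client resends only the message whose commit failed.
        replica.commit("A3");
        return replica.store;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 3; n++) {
            System.out.println("replicated=" + n + " -> replica holds " + run(n));
        }
    }
}
```

In this model, A2 survives only if it reached the replica before the crash, while the resend of A3 is either stored or dropped as a duplicate, so A3 never appears twice. Whether a real HA pair can actually lag in this way is exactly the open question in the test above.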

Regards
Iliya Grushevskiy 
