Posted to users@activemq.apache.org by "Steigerwald, Aaron" <as...@brandesassociates.com.INVALID> on 2022/05/26 11:07:50 UTC
RE: [EXTERNAL]:Re: Cross data center HA cluster
Hello Iliya,
Thank you very much for your response; it's very helpful.
Regarding "message loss on send on network failure between data centers": the example architecture I described does not have master/slave HA pairs in separate data centers. Do you think the message loss you described has anything to do with the master/slave pairs being clustered across data centers? I ask because the HA replication takes place between the master/slave pairs on a LAN.
Thanks again,
Aaron Steigerwald
-----Original Message-----
From: Iliya Grushevskiy <il...@gmail.com>
Sent: Thursday, May 26, 2022 4:10 AM
To: users@activemq.apache.org
Subject: [EXTERNAL]:Re: Cross data center HA cluster
Hi, Aaron
We are currently testing a similar deployment and have encountered several issues:
- message loss on send on network failure between data centers
I think this is because HA replication is asynchronous and the replica server may not catch up with the primary.
- message loss or duplication (depending on the error handling strategy) on the consumer side on network failure between data centers
I think this was caused by two factors: the duplicate ID cache is consistent only within an HA pair, and message redistribution was on.
Switching off redistribution (or, alternatively, increasing the redistribution delay) should fix this issue.
- duplicate messages on the mirrored server
This is addressed in pull request: https://github.com/apache/activemq-artemis/pull/4066
Regards
Iliya Grushevskiy
> On May 26, 2022, at 07:46, Justin Bertram <jb...@apache.org> wrote:
>
> I'm not aware of such a production deployment and I would be surprised
> if there was one given that clustering was designed for local area
> networks with low latency which typically isn't what is found between data centers.
>
> I recommend you pursue your mirroring approach as that is what
> mirroring was designed for (i.e. cross data-center disaster-recovery use-cases).
>
>
> Justin
>
> On Wed, May 25, 2022 at 10:36 PM Steigerwald, Aaron
> <as...@brandesassociates.com.invalid> wrote:
>
>> Hello,
>>
>> Is anyone aware of a production deployment of an Artemis "cross data
>> center" HA cluster? For example, a cluster spread across 3 data centers.
>> Each data center contains a master/slave pair.
>>
>> I would like to know what kind of issues anyone has overcome with
>> such a configuration. I understand there are many configuration and
>> operational variables. Any info would be helpful.
>>
>> Note that we are considering asynchronously mirroring each
>> master/slave pair's queues to a dedicated asynchronous target node.
>> The asynchronous target node would exist in a different data center
>> and would not service any other connections. A custom plugin would
>> automatically scale down the messages into a live cluster node if the
>> connections to the master/slave mirror sources were disconnected for a period of time.
>>
>> Thank you,
>> Aaron Steigerwald
>>
Re: [EXTERNAL]:Re: Cross data center HA cluster
Posted by Iliya Grushevskiy <il...@gmail.com>.
Hi Aaron.
Sorry, there was some misleading information in my previous message.
I have reviewed my test and it does contain a network failure.
If I turn off the network failure, everything works as expected.
So, as I understand it, if you consider your LAN a reliable network, there should not be any message loss on send.
Regards
Iliya Grushevskiy
Re: [EXTERNAL]:Re: Cross data center HA cluster
Posted by Илья Грушевский <il...@gmail.com>.
I have a simple test with a single HA pair in which I just kill the master while sending messages in a different thread. (It is almost identical to the replicated-transaction-failover example, except for the separate thread.)
And I encounter the same message loss as in the network failure scenario.
I suspect there could be some misconfiguration on the client side in my test.
Example of the message flow in my test:
- send A1
- commit
- send A2 (will not be replicated and will be lost; the replica can’t keep up with the master)
- commit
- send A3
- commit (failed, failover to replica)
- resend A3
- commit (handle duplicate id)
I would expect synchronous replication in an HA pair, but again, I’m not sure that the client configuration is correct and that my test is relevant.
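For readers following the flow above, here is a toy model of it in plain Java. It is only a sketch of the suspected behavior, not Artemis code: the class and method names are hypothetical, and the lag between primary and replica is an assumption taken from the hypothesis in this message, not a confirmed property of the broker. The flag covers both possibilities for A3 (already replicated or not when the master is killed), since the flow does not say which one applies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the message flow above. NOT Artemis code; all names are hypothetical. */
final class FailoverFlowSketch {

    /** A broker that stores committed messages and remembers their duplicate IDs. */
    static final class Broker {
        final List<String> journal = new ArrayList<>();
        final Set<String> duplicateIds = new HashSet<>();

        /** Commits one message; returns false if its ID was already seen. */
        boolean commit(String id) {
            if (!duplicateIds.add(id)) {
                return false; // duplicate detected, message is ignored
            }
            journal.add(id);
            return true;
        }
    }

    /**
     * Replays the flow and returns what the replica holds after failover.
     * ASSUMPTION: A2 is committed on the primary but never reaches the replica.
     */
    static List<String> replicaJournalAfterFailover(boolean a3ReplicatedBeforeCrash) {
        Broker primary = new Broker();
        Broker replica = new Broker();

        primary.commit("A1");
        replica.commit("A1"); // replicated in time

        primary.commit("A2"); // commit succeeds on the primary only;
                              // replication lags and A2 is never copied

        // The primary is killed while the commit of A3 is in flight.
        if (a3ReplicatedBeforeCrash) {
            replica.commit("A3");
        }

        // The client sees a failed commit, fails over and resends A3.
        // Duplicate detection makes the resend safe in both cases.
        replica.commit("A3");

        return replica.journal;
    }

    public static void main(String[] args) {
        System.out.println(replicaJournalAfterFailover(false)); // prints [A1, A3]
        System.out.println(replicaJournalAfterFailover(true));  // prints [A1, A3]
    }
}
```

In both runs the replica ends up with A1 and A3 but not A2, which is the loss described above. If replication were synchronous with respect to the commit, the commit of A2 could not succeed without A2 being on the replica, so a real test showing this outcome would point to either asynchronous behavior or a client-side issue.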
Regards
Iliya Grushevskiy