Posted to users@activemq.apache.org by "Steigerwald, Aaron" <as...@brandesassociates.com.INVALID> on 2022/05/26 03:08:01 UTC

RE: [EXTERNAL]:Re: Journal corruption caused by split brain?

Hello again Justin,

Are there any plans to drop the traditional Artemis quorum voting mechanism for the pluggable quorum provider configuration? Any idea when it will become officially stable?

Thank you,
Aaron Steigerwald

-----Original Message-----
From: Steigerwald, Aaron 
Sent: Friday, April 8, 2022 11:46 AM
To: users@activemq.apache.org
Subject: RE: [EXTERNAL]:Re: Journal corruption caused by split brain?

Hello Justin,

Thank you for the thorough explanation.

How much better is ZooKeeper at mitigating split brain than having a 3+ node cluster? Why would ZooKeeper be better than quorum voting at dealing with network issues that would cause split brain?

Thanks again,
Aaron
 
-----Original Message-----
From: Justin Bertram <jb...@apache.org>
Sent: Thursday, April 7, 2022 11:18 PM
To: users@activemq.apache.org
Subject: [EXTERNAL]:Re: Journal corruption caused by split brain?

It's not clear exactly what is meant here by "corruption," but I think it's probably the wrong word to describe the issues caused by split-brain. The brokers should be able to read all the data in the journals no problem. The actual problems are more related to potential duplicate consumption or missed messages.

Regarding duplicate consumption, consider 2 JMS consumers listening on a queue on the master broker. Those consumers would be competing for the same messages such that each message would only be consumed once (i.e. by either consumer). However, once split brain occurs you could potentially have a consumer on the same queue on *each* broker. In that case, they could each receive the same message since they were no longer competing with each other.
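
The duplicate-consumption scenario above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the queue is a plain in-memory deque, delivery is round-robin, and the names `drain` and `delivery_counts` are hypothetical helpers, not anything from Artemis or JMS.

```python
from collections import Counter, deque

def drain(queue, consumers):
    """Hand each message on one broker's queue to exactly one of its consumers."""
    deliveries = {name: [] for name in consumers}
    i = 0
    while queue:
        msg = queue.popleft()
        deliveries[consumers[i % len(consumers)]].append(msg)
        i += 1
    return deliveries

def delivery_counts(*delivery_maps):
    """Count how many times each message was delivered across all brokers."""
    counts = Counter()
    for deliveries in delivery_maps:
        for msgs in deliveries.values():
            counts.update(msgs)
    return counts

messages = ["m1", "m2", "m3", "m4"]

# Normal operation: both consumers compete on the single active broker.
healthy = delivery_counts(drain(deque(messages), ["consumer_a", "consumer_b"]))

# Split brain (assumed): both brokers are active with their own copy of the
# queue, and each consumer is attached to a different broker.
split = delivery_counts(
    drain(deque(messages), ["consumer_a"]),
    drain(deque(messages), ["consumer_b"]),
)

print(dict(healthy))  # every message delivered once
print(dict(split))    # every message delivered twice
```

Neither broker's copy of the data is damaged in this sketch. The duplicates come purely from the consumers no longer competing on the same queue instance.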

Regarding missed messages, consider a non-durable JMS topic subscriber. While it's connected to the master it receives every message sent to the topic. However, once split brain occurs the producer might send messages to the broker where the subscriber isn't connected which means it wouldn't get those messages.
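
A similar toy sketch for the missed-message case, again under stated assumptions: the `Broker` class below is a hypothetical in-memory stand-in, and a non-durable subscription is modelled as a list that only receives messages published on the broker it is attached to.

```python
class Broker:
    """Hypothetical in-memory broker with non-durable topic subscribers."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []  # inboxes of currently connected subscribers

    def publish(self, message):
        # Non-durable: only subscribers connected to *this* broker get it.
        for inbox in self.subscribers:
            inbox.append(message)

primary = Broker("primary")
backup = Broker("backup")  # assumed to become active during split brain

inbox = []
primary.subscribers.append(inbox)  # subscriber stays on the primary

sent = []
for broker, message in [(primary, "m1"), (backup, "m2"), (primary, "m3")]:
    # After the split, the producer may land on either active broker.
    broker.publish(message)
    sent.append(message)

missed = [m for m in sent if m not in inbox]
print(inbox)   # ['m1', 'm3']
print(missed)  # ['m2']
```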

In short, the data on each broker should stay 100% intact from a technical standpoint. The problem is more in the realm of "irreconcilable differences" for the applications connected to the brokers.

The simplest way to mitigate split-brain is to use a shared store. However, if a shared store is not viable for your use case, the next best solution is to integrate with ZooKeeper via the pluggable quorum vote replication configuration [1].


Justin

[1]
https://activemq.apache.org/components/artemis/documentation/latest/ha.html#pluggable-quorum-vote-replication-configurations


On Thu, Apr 7, 2022 at 9:53 PM Steigerwald, Aaron <as...@brandesassociates.com.invalid> wrote:

> Hello,
>
> My colleague has read that split brain can cause journal corruption in 
> a master/slave network replication scenario. Is anyone aware if this 
> is possible with current versions of Artemis?
>
> Thank you,
> Aaron Steigerwald
>

RE: [EXTERNAL]:Re: Journal corruption caused by split brain?

Posted by "Steigerwald, Aaron" <as...@brandesassociates.com.INVALID>.
Thank you Justin. This was very helpful. No further questions.

Aaron Steigerwald

-----Original Message-----
From: Justin Bertram <jb...@apache.org> 
Sent: Thursday, May 26, 2022 12:41 AM
To: users@activemq.apache.org
Subject: Re: [EXTERNAL]:Re: Journal corruption caused by split brain?

> How much better is ZooKeeper at mitigating split brain than having a 3+ node cluster?

I'm not sure I could quantify the improvement. The goal with the change was to offload the leader election responsibilities from the brokers themselves to something that was better suited for those responsibilities (e.g. ZooKeeper). Leader election is not a simple problem to solve, and it's not really a core competency of a message broker so integrating with external providers makes sense. Furthermore, tying leader election to the broker meant you had to have a quorum of live brokers in a cluster (i.e. 3 or more) even if you just wanted a single, replicated HA pair. With the new design that's no longer the case.
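
The "3 or more" requirement follows from ordinary majority arithmetic, which can be sketched as below. This is generic majority voting, not the exact vote algorithm Artemis implements, and the function names are made up for illustration.

```python
def majority(voters):
    """Smallest number of votes that is more than half of the voters."""
    return voters // 2 + 1

def side_has_quorum(voters, reachable):
    """`reachable` counts the voters one side of a partition can see, itself included."""
    return reachable >= majority(voters)

# Two brokers split 1/1: neither side can reach a majority of 2.
pair = [side_has_quorum(2, 1), side_has_quorum(2, 1)]

# Three brokers split 2/1: only the side with two voters may proceed.
trio = [side_has_quorum(3, 2), side_has_quorum(3, 1)]

print(pair)  # [False, False]
print(trio)  # [True, False]
```

With only two brokers voting among themselves, a partition leaves no side able to decide safely, which is why broker-based voting needs at least three live brokers.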

> Why would ZooKeeper be better than quorum voting at dealing with network issues that would cause split brain?

Quorum voting isn't exactly the issue, per se. The real issue is leader election. ZooKeeper is purpose-built for this kind of work whereas the broker is not.
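
As a rough illustration of delegating leader election to an external service, here is a toy sketch. `ToyCoordinator` is entirely hypothetical: it is not ZooKeeper's API or protocol, and it leaves out the sessions, leases, and replication a real coordination service provides. It only shows the idea that a broker activates when it holds a single externally managed lock.

```python
class ToyCoordinator:
    """Hypothetical external service that grants one exclusive lock."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, broker, reachable=True):
        # A broker that cannot reach the coordinator can never become active.
        if not reachable:
            return False
        if self.holder is None or self.holder == broker:
            self.holder = broker
            return True
        return False

    def release(self, broker):
        if self.holder == broker:
            self.holder = None

coordinator = ToyCoordinator()

primary_active = coordinator.try_acquire("primary")

# The replication link between the brokers drops, but the primary still
# holds the lock, so the backup is refused and stays passive.
backup_active = coordinator.try_acquire("backup")

# The primary actually goes away and its lock is released (assumed here to
# be an explicit release; a real service would expire a session or lease).
coordinator.release("primary")
backup_after_failover = coordinator.try_acquire("backup")

print(primary_active, backup_active, backup_after_failover)  # True False True
```

Because the decision lives outside the brokers, a single replicated pair can use it without needing a third broker to break ties.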

> Are there any plans to drop the traditional Artemis quorum voting mechanism for the pluggable quorum provider configuration?

I imagine it will be deprecated and phased out eventually, but there are no plans to do so at this time.

> Any idea when it will become officially stable?

As far as I understand it is ready to use now. I have no reason to think otherwise.

Hope that helps.


Justin


RE: [EXTERNAL]:Re: Journal corruption caused by split brain?

Posted by "Steigerwald, Aaron" <as...@brandesassociates.com.INVALID>.
Hello again Justin,

You previously answered my question:

>> Any idea when it will become officially stable?

> As far as I understand it is ready to use now. I have no reason to think otherwise.

Since then I've had good luck using the "Pluggable Quorum Vote Feature" with ZooKeeper. I would like to use it in a production environment, but some people are reluctant because the documentation still refers to it as "experimental" and implies that it is not "officially stable".

What is the process for getting it considered "officially stable"? Does the community vote on topics like that? I'm asking for planning purposes. My hope is it's no longer experimental once my team and I are ready to put it into production.

Thank you,
Aaron Steigerwald


Re: [EXTERNAL]:Re: Journal corruption caused by split brain?

Posted by Justin Bertram <jb...@apache.org>.
> How much better is ZooKeeper at mitigating split brain than having a 3+
> node cluster?

I'm not sure I could quantify the improvement. The goal with the change was
to offload the leader election responsibilities from the brokers themselves
to something that was better suited for those responsibilities (e.g.
ZooKeeper). Leader election is not a simple problem to solve, and it's not
really a core competency of a message broker so integrating with external
providers makes sense. Furthermore, tying leader election to the broker
meant you had to have a quorum of live brokers in a cluster (i.e. 3 or
more) even if you just wanted a single, replicated HA pair. With the new
design that's no longer the case.

> Why would ZooKeeper be better than quorum voting at dealing with network
> issues that would cause split brain?

Quorum voting isn't exactly the issue, per se. The real issue is leader
election. ZooKeeper is purpose-built for this kind of work whereas the
broker is not.

> Are there any plans to drop the traditional Artemis quorum voting
> mechanism for the pluggable quorum provider configuration?

I imagine it will be deprecated and phased out eventually, but there are no
plans to do so at this time.

> Any idea when it will become officially stable?

As far as I understand it is ready to use now. I have no reason to think
otherwise.

Hope that helps.


Justin