Posted to users@activemq.apache.org by ahuhatwork <ah...@protonmail.com> on 2019/06/05 21:34:34 UTC

Artemis HA with multiple standby slaves behaviour

I just want to confirm that this is the expected behaviour. I have 1 master
with 3 slaves (the brokers are hosted on VMs that tend to randomly die). I'm
currently testing this on the latest source code from GitHub.

Here's the scenario:
1) Start master
2) Start slave1
3) Start slave2
4) Kill master, slave1 takes over as the live server
5) Bring back master

Configuration snippet for master:


Configuration snippet for slave1 and slave2:


At this point, which server is the live server? I would think that due to
failback being configured, the master would resume being the live server. It
seems that slave1 stays on as the live server. Is this the expected
behaviour?
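
For reference, a failback-enabled master/backup pair using replication is
typically configured roughly like the sketch below. This is an illustrative
example only, not the actual snippets referenced above, and the group-name
value is a placeholder.

    <!-- master broker.xml: replication master; on restart it asks the
         current live server to fail back to it -->
    <ha-policy>
      <replication>
        <master>
          <group-name>example-group</group-name>
          <check-for-live-server>true</check-for-live-server>
        </master>
      </replication>
    </ha-policy>

    <!-- slave broker.xml (slave1 and slave2): replication slaves that give
         the live role back when the original master returns -->
    <ha-policy>
      <replication>
        <slave>
          <group-name>example-group</group-name>
          <allow-failback>true</allow-failback>
        </slave>
      </replication>
    </ha-policy>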





Re: Artemis HA with multiple standby slaves behaviour

Posted by Justin Bertram <jb...@apache.org>.
This is a valid setup even though failback won't work as expected. There
should be no more risk of data loss in this setup than there is in any other.


Justin

On Thu, Jun 6, 2019 at 2:49 AM Bummer <je...@centrum.cz> wrote:

> This isn't a valid setup. Only one slave per master can work as expected.
> You're about to lose data if you continue this way. I was there recently.
> Look this topic up on the forums to get more information about the reasons.
> This setup is surprisingly common.
>
>
>
>

Re: Artemis HA with multiple standby slaves behaviour

Posted by Bummer <je...@centrum.cz>.
This isn't a valid setup. Only one slave per master can work as expected.
You're about to lose data if you continue this way. I was there recently.
Look this topic up on the forums to get more information about the reasons.
This setup is surprisingly common.




Re: Artemis HA with multiple standby slaves behaviour

Posted by ahuhatwork <ah...@protonmail.com>.
Thanks for the insight, Justin.

Albert




Re: Artemis HA with multiple standby slaves behaviour

Posted by Justin Bertram <jb...@apache.org>.
> Speaking of split brains, I haven't really been able to discern how to
> recover from a split brain. What are the general techniques for recovering
> from a split brain?

Recovering from a split-brain may be impossible depending on what happens
during the split and what kind of logging you have in place, or it may be as
simple as restarting the backup. Consider that at the moment the brokers
split they hold the same data; as different clients interact with one broker
or the other, each broker's copy of the data begins to diverge. One broker
may receive messages that the other doesn't, and clients may consume
messages from one that aren't consumed from the other.

If *no* client ever connects to the split slave then you won't have any of
these problems. The solution in this case is simply to restart the slave.
Obviously, the faster you can detect the split the better, as that reduces
the chances that any client will connect to the split slave.

However, if one or more clients do connect to the split slave then you'll
need to identify what those clients did. If all a client does is produce
messages then you should be able to see that simply by comparing the
journal data from each broker [1]; likewise if the client simply consumes
messages. If both consumption and production occur then it gets a bit
trickier. If they happen on different queues (e.g. produce on queue "foo",
consume from queue "bar") then a data comparison should again work. However,
if production and consumption happen on the same queue then you could run
into a situation where a message is both produced and consumed on one
broker, and that wouldn't show up in a data comparison since the message
would have been consumed and would no longer be in the journal. The
LoggingActiveMQServerPlugin [2] should help with that, but of course it
will need to be active and configured *before* the split occurs.

In any event, this is a tedious, manual, and error-prone process. You'll
want to avoid it if at all possible.


Justin

[1] For example, by using the journal XML export tool (via the CLI command:
./artemis data exp ...)
[2]
http://activemq.apache.org/components/artemis/documentation/latest/broker-plugins.html#using-the-loggingactivemqserverplugin
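
For reference, enabling that plugin in broker.xml looks roughly like the
sketch below (a minimal sketch based on the documentation linked in [2];
LOG_ALL_EVENTS is the coarse switch, and finer-grained keys such as
LOG_SENDING_EVENTS and LOG_DELIVERING_EVENTS are also available):

    <broker-plugins>
      <broker-plugin class-name="org.apache.activemq.artemis.core.server.plugin.impl.LoggingActiveMQServerPlugin">
        <!-- log every event category the plugin supports -->
        <property key="LOG_ALL_EVENTS" value="true"/>
      </broker-plugin>
    </broker-plugins>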





On Thu, Jun 6, 2019 at 2:48 PM ahuhatwork <ah...@protonmail.com> wrote:

> Thanks for the quick response, Justin.
>
> I've configured Artemis to use replication, as the infrastructure for
> shared-storage isn't... great.
>
> So for my situation at work, the hypervisors tend to randomly die on us
> (taking the VMs with them). We have 3 zones/hypervisors.
>
> I wanted a single master because we do not think the workload is high
> enough
> to require more than that. Due to the random hypervisor deaths, I wanted a
> slave running in each zone.
>
> So:
> zone 1: master
> zone 2: slave
> zone 3: slave
>
> The zones are connected by a nice giant heavy-duty router which handles all
> traffic, so I find it difficult to imagine that a split brain can occur for
> that particular reason (though perhaps for other reasons of which I am unaware).
>
> Speaking of split brains, I haven't really been able to discern how to
> recover from a split brain. What are the general techniques for recovering
> from a split brain?
>
> Albert
>
>
>
>
>
>

Re: Artemis HA with multiple standby slaves behaviour

Posted by ahuhatwork <ah...@protonmail.com>.
Thanks for the quick response, Justin.

I've configured Artemis to use replication, as the infrastructure for
shared-storage isn't... great.

So for my situation at work, the hypervisors tend to randomly die on us
(taking the VMs with them). We have 3 zones/hypervisors.

I wanted a single master because we do not think the workload is high enough
to require more than that. Due to the random hypervisor deaths, I wanted a
slave running in each zone.

So:
zone 1: master
zone 2: slave
zone 3: slave

The zones are connected by a nice giant heavy-duty router which handles all
traffic, so I find it difficult to imagine that a split brain can occur for
that particular reason (though perhaps for other reasons of which I am unaware).

Speaking of split brains, I haven't really been able to discern how to
recover from a split brain. What are the general techniques for recovering
from a split brain?

Albert






Re: Artemis HA with multiple standby slaves behaviour

Posted by Justin Bertram <jb...@apache.org>.
At this point, using multiple backups will preclude fail-back from working
as generally expected, so the behavior you're seeing is expected.

Out of curiosity, are you using shared-storage or replication? If you're
using replication, keep in mind that you'll want at least 3 master/slave
pairs to achieve a valid quorum and mitigate the risk of split-brain.


Justin

On Wed, Jun 5, 2019 at 4:34 PM ahuhatwork <ah...@protonmail.com> wrote:

> I just want to confirm that this is the expected behaviour. I have 1 master
> with 3 slaves (the brokers are hosted on VMs that tend to randomly die).
> I'm
> currently testing this on the latest source code from GitHub.
>
> Here's the scenario:
> 1) Start master
> 2) Start slave1
> 3) Start slave2
> 4) Kill master, slave1 takes over as the live server
> 5) Bring back master
>
> Configuration snippet for master:
>
>
> Configuration snippet for slave1 and slave2:
>
>
> At this point, which server is the live server? I would think that due to
> failback being configured, the master would resume being the live server.
> It
> seems that slave1 stays on as the live server. Is this the expected
> behaviour?
>
>
>
>
>