Posted to issues@activemq.apache.org by "Miroslav Novak (JIRA)" <ji...@apache.org> on 2016/08/11 06:40:20 UTC

[jira] [Comment Edited] (ARTEMIS-473) Resolve split brain data after split brains scenarios.

    [ https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416643#comment-15416643 ] 

Miroslav Novak edited comment on ARTEMIS-473 at 8/11/16 6:39 AM:
-----------------------------------------------------------------

I've created a new JIRA for the description: ARTEMIS-679 - Activate most up to date server from master-slave(live-backup) pair. 

If split brain happens, there is not much Artemis can do about it. Still, it can recover from some quite common cases. Basically, three situations can arise during a split brain (= master and slave are active at the same time):

a) Clients do not lose their connection to master and stay connected to master.
b) Clients lose their connection to master and fail over to backup. 
c) Clients lose their connection to master and slave at the same time. They keep trying to reconnect to the master-slave pair. 

I believe that for situations a) and b) Artemis can recover when the network is reconnected. At the moment master and slave notice that they're both active at the same time, they check which of them has external (non in-vm) connections. The server without external client connections restarts. Only the server with the clients has the up-to-date journal. 

Option c) is problematic, as clients can connect to either master or slave, so in this case there is nothing Artemis can do. wdyt?
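
To make the proposal above concrete, here is a rough sketch of the decision each server could take once it notices that its peer is active too. Everything in it is hypothetical: `SplitBrainResolver`, `SplitBrainAction` and the connection counts passed in are illustrative names and inputs, not existing Artemis classes or APIs. How a server would learn the peer's connection count is left open, and treating "neither server has external clients" as undecided is my own assumption.

```java
// Hypothetical sketch only: these types are NOT part of the Artemis code base.
enum SplitBrainAction { STAY_ACTIVE, RESTART, UNDECIDED }

final class SplitBrainResolver {
    private SplitBrainResolver() {}

    /**
     * Decides what the local server should do after detecting that both
     * servers of the pair are active.
     *
     * @param localExternal number of external (non in-vm) client connections on this server
     * @param peerExternal  number of external (non in-vm) client connections on the peer
     */
    static SplitBrainAction decide(int localExternal, int peerExternal) {
        if (localExternal < 0 || peerExternal < 0) {
            throw new IllegalArgumentException("connection counts must not be negative");
        }
        boolean localHasClients = localExternal > 0;
        boolean peerHasClients = peerExternal > 0;

        // Cases a) and b): exactly one server kept (or received) the clients,
        // so only that server has the up-to-date journal.
        if (localHasClients && !peerHasClients) {
            return SplitBrainAction.STAY_ACTIVE;
        }
        if (!localHasClients && peerHasClients) {
            return SplitBrainAction.RESTART;
        }
        // Case c): clients may have reconnected to both servers, so neither
        // journal can be picked automatically. The "no clients on either side"
        // situation is also left undecided here (an assumption of this sketch).
        return SplitBrainAction.UNDECIDED;
    }
}
```

With this, `decide(3, 0)` on the master in case a) yields `STAY_ACTIVE`, while the slave evaluates `decide(0, 3)` and gets `RESTART`. In case c) both servers see something like `decide(2, 1)` and end up with `UNDECIDED`.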


was (Author: mnovak):
I've created new jira for the description - ARTEMIS-679 - Activate most up to date server from master-slave(live-backup) pair. 

If split brain happens then there is not much Artemis can do about it. Still it can recover from quite common cases. Basically 3 situation can happen when split brain happens (=master and slave are active at the same time):

a) Clients do not lose connection to master and stay connected to master.
b) Clients lose connection to master and failover backup. 
c) Clients lose connection to master and slave at same time. They will try to reconnect to master or slave pair. 

I believe that for situations a) and b) Artemis can recover when network is reconnected. In the moment when master and slave notice that they're active at the same time, they will check who has external (no in-vm) connections. Server without external client connections will restart. Only server with the clients has the up-to-date journal. 

Option c) is problematic as clients can connect to master or slave so in this case there is nothing Artemis can do. wdyt?

> Resolve split brain data after split brains scenarios.
> ------------------------------------------------------
>
>                 Key: ARTEMIS-473
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-473
>             Project: ActiveMQ Artemis
>          Issue Type: New Feature
>          Components: Broker
>    Affects Versions: 1.2.0
>            Reporter: Miroslav Novak
>            Priority: Critical
>
> If a master-slave pair is configured using a replicated journal and there are no other servers in the cluster, then the slave will activate when the network between master and slave is broken. Depending on whether clients were disconnected from master or not, there may or may not be a failover to slave. The problem happens at the moment the network between master and slave is restored. Master and slave are active at the same time, which is the split brain syndrome. Currently there is no recovery mechanism to solve this situation.
> Suggested improvement: If clients failed over to slave, then master will restart itself so that failback occurs (if configured). If clients did not fail over and stayed connected to master, then backup will restart itself.


