Posted to issues@activemq.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/08/25 11:27:00 UTC

[jira] [Work logged] (ARTEMIS-3429) Backup forget coordination-id after quorum loss

     [ https://issues.apache.org/jira/browse/ARTEMIS-3429?focusedWorklogId=641619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641619 ]

ASF GitHub Bot logged work on ARTEMIS-3429:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Aug/21 11:26
            Start Date: 25/Aug/21 11:26
    Worklog Time Spent: 10m 
      Work Description: franz1981 commented on pull request #3694:
URL: https://github.com/apache/activemq-artemis/pull/3694#issuecomment-905414433


   I'm still waiting for @gtully to come back so we can quickly review this before merging. CI is already good.
   The only point left to decide is when to persist the local replica's NodeID and activation sequence:
   
   - I've decided to persist them right after the initial sync happens, because the replication process should already take care (on the replicated server) of awaiting the replica's response before answering the client, i.e. the data delta since the initial sync shouldn't diverge between live and backup while the coordinated activation sequence stays the same (I'll need @clebertsuconic's opinion here too)
   - `classic` replication, instead, persists them only if the backup is stopped or if the backup successfully fails over
   
   The implication of these strategies is that, after a simultaneous crash of both brokers and a restart of just the backup:
   - the former allows it to start as live because its data is in sync (that's correct)
   - the latter prevents it from starting as live because it didn't store the NodeID and local activation sequence, hence it still appears as an empty backup
   
   To me, using the former strategy increases HA in case of crashes, assuming that the replication process is syncing data correctly (i.e. correctly handling delta/in-flight changes after the initial sync).
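
   The difference between the two strategies can be sketched in a few lines of Java. This is only an illustration of the reasoning above, not the code in the pull request: `PersistPolicy`, `PersistedState`, `stateAfterCrash` and `canStartAsLive` are hypothetical names, and the rule "the backup may start as live only if its persisted activation sequence matches the coordinated one" is a simplifying assumption.

```java
import java.util.Optional;

public class BackupRestartSketch {

    /** Hypothetical names for the two persistence strategies described above. */
    enum PersistPolicy { AFTER_INITIAL_SYNC, ON_STOP_OR_FAILOVER }

    /** What a backup would find on disk when it restarts (illustrative only). */
    static final class PersistedState {
        final String nodeId;
        final long activationSequence;

        PersistedState(String nodeId, long activationSequence) {
            this.nodeId = nodeId;
            this.activationSequence = activationSequence;
        }
    }

    /**
     * State left on disk if the backup crashes after the initial sync,
     * without a clean stop and without having failed over.
     */
    static Optional<PersistedState> stateAfterCrash(PersistPolicy policy, String nodeId, long sequence) {
        if (policy == PersistPolicy.AFTER_INITIAL_SYNC) {
            return Optional.of(new PersistedState(nodeId, sequence));
        }
        // ON_STOP_OR_FAILOVER: neither trigger fired before the crash, so nothing was written.
        return Optional.empty();
    }

    /**
     * Assumption: a restarted backup may start as live only if its persisted
     * activation sequence matches the coordinated one.
     */
    static boolean canStartAsLive(Optional<PersistedState> local, long coordinatedSequence) {
        return local.map(s -> s.activationSequence == coordinatedSequence).orElse(false);
    }

    public static void main(String[] args) {
        long coordinatedSequence = 1L;
        for (PersistPolicy policy : PersistPolicy.values()) {
            Optional<PersistedState> onDisk = stateAfterCrash(policy, "node-a", coordinatedSequence);
            System.out.println(policy + " -> can start as live: " + canStartAsLive(onDisk, coordinatedSequence));
        }
    }
}
```

   Under these assumptions, only the backup that persisted its state right after the initial sync has anything to compare against the coordinated sequence on restart; the other one looks like an empty backup and stays passive.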




Issue Time Tracking
-------------------

            Worklog Id:     (was: 641619)
    Remaining Estimate: 0h
            Time Spent: 10m

> Backup forget coordination-id after quorum loss
> -----------------------------------------------
>
>                 Key: ARTEMIS-3429
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3429
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.18.0
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Assuming a multi-primary set-up, if the broker acting as backup loses quorum and is restarted without applying the coordination-id patching on NodeManager, then if its local activation sequence is > 0 (because of a past sync with the other live) the backup succeeds in activating, causing a split-brain (although its NodeID is a random one rather than the original coordination-id).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)