Posted to issues@activemq.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/02/03 15:41:00 UTC

[jira] [Commented] (ARTEMIS-4143) Improve mitigation against split-brain with shared-storage

    [ https://issues.apache.org/jira/browse/ARTEMIS-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683924#comment-17683924 ] 

ASF subversion and git services commented on ARTEMIS-4143:
----------------------------------------------------------

Commit 8f30347b18145509bde27194d83468fc1a5915d6 in activemq-artemis's branch refs/heads/main from Justin Bertram
[ https://gitbox.apache.org/repos/asf?p=activemq-artemis.git;h=8f30347b18 ]

ARTEMIS-4143 improve mitigation against split-brain with shared-storage

Configurations employing shared-storage with NFS are susceptible to
split-brain in certain scenarios. For example:

  1) Primary loses network connection to NFS.
  2) Backup activates.
  3) Primary reconnects to NFS.
  4) Split-brain.

In practice this situation is unlikely because of the timing involved,
but it is still possible. Currently, the file lock held by the primary
broker on the NFS share offers essentially no protection in this
situation. This commit adds logic that updates the timestamp of the lock
file during activation and then routinely checks it at runtime to ensure
consistency. This effectively mitigates split-brain in this situation
(and likely in others). Here's how it works now:

  1) Primary loses network connection to NFS.
  2) Backup activates.
  3) Primary reconnects to NFS.
  4) Primary detects that the lock file's timestamp has been updated and
     shuts itself down.
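The timestamp check described above can be sketched roughly as follows.
This is a hypothetical, minimal Java sketch, not the code from this
commit. The class name LockFileMonitor, its methods, and the choice to
skip a check (rather than shut down) when the lock file cannot be read
are all assumptions. On activation, the sketch stamps the lock file with
a time strictly later than its current modification time and remembers
what it wrote. The periodic check then fires a shutdown callback once the
file no longer carries that stamp.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of lock-file timestamp monitoring; not the Artemis implementation.
public class LockFileMonitor {
    private final Path lockFile;
    private final Runnable onLockLost;          // e.g. shut the broker down
    private final AtomicBoolean lost = new AtomicBoolean(false);
    private volatile FileTime expected;         // the stamp this node wrote at activation

    public LockFileMonitor(Path lockFile, Runnable onLockLost) {
        this.lockFile = lockFile;
        this.onLockLost = onLockLost;
    }

    /** On activation: write a timestamp strictly later than the current one and remember it. */
    public void activate() throws IOException {
        Instant current = Files.getLastModifiedTime(lockFile).toInstant();
        // Whole seconds, so filesystems with coarse timestamp precision still see a change.
        Instant stamp = Instant.now().truncatedTo(ChronoUnit.SECONDS);
        if (!stamp.isAfter(current)) {
            stamp = current.truncatedTo(ChronoUnit.SECONDS).plusSeconds(1);
        }
        Files.setLastModifiedTime(lockFile, FileTime.from(stamp));
        // Read it back so later comparisons use exactly what the filesystem stored.
        expected = Files.getLastModifiedTime(lockFile);
    }

    /** True while the lock file still carries the timestamp this node wrote. */
    public boolean check() throws IOException {
        return Files.getLastModifiedTime(lockFile).equals(expected);
    }

    /** Periodically verify the lock file; run onLockLost once if another node re-stamped it. */
    public void start(ScheduledExecutorService scheduler, long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                if (!check() && lost.compareAndSet(false, true)) {
                    onLockLost.run();
                }
            } catch (IOException e) {
                // Storage unreachable (e.g. NFS outage): try again on the next run.
            }
        }, period, period, unit);
    }

    public static void main(String[] args) throws Exception {
        Path lock = Files.createTempFile("server", ".lock");
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        LockFileMonitor primary = new LockFileMonitor(lock,
                () -> System.out.println("primary: lock file re-stamped, shutting down"));
        primary.activate();
        primary.start(scheduler, 100, TimeUnit.MILLISECONDS);

        // Step 2: the backup activates against the same lock file and re-stamps it.
        LockFileMonitor backup = new LockFileMonitor(lock, () -> { });
        backup.activate();

        Thread.sleep(500); // give the primary's periodic check time to run (step 4)
        scheduler.shutdownNow();
        Files.deleteIfExists(lock);
    }
}
```

Re-reading the stored timestamp after writing it avoids false alarms on
filesystems that round modification times.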

When the primary shuts down in step #4, the Topology on the backup can
be damaged. Protections against this were added via ARTEMIS-2868, but
only for the replicated use-case. This commit applies the protection to
removeMember() so that the Topology remains intact.
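One way such a guard could work is sketched below in Java. It is
hypothetical and much simpler than the real Topology class. The
GuardedTopology name, the use of a plain address string to identify the
announcing server, and the rule itself (a removal is honored only if it
comes from the server currently registered for that node ID) are
assumptions, not necessarily what ARTEMIS-2868 or this commit implement.
With shared storage the backup activates under the same node ID as the
primary, so a late removal from the old primary would otherwise delete
the backup's own entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: NOT the actual Artemis Topology class.
public class GuardedTopology {
    // nodeId -> address of the server currently announcing that node ID
    private final Map<String, String> members = new ConcurrentHashMap<>();

    public void updateMember(String nodeId, String address) {
        members.put(nodeId, address);
    }

    /** Removes nodeId only if the requester is the server currently registered for it. */
    public boolean removeMember(String nodeId, String requesterAddress) {
        // Map.remove(key, value) succeeds only while the mapping is still nodeId -> requesterAddress.
        return members.remove(nodeId, requesterAddress);
    }

    public String addressOf(String nodeId) {
        return members.get(nodeId);
    }

    public static void main(String[] args) {
        GuardedTopology topology = new GuardedTopology();
        topology.updateMember("node-1", "primary:61616");
        // The backup activates with the same node ID (shared storage) and takes over the entry.
        topology.updateMember("node-1", "backup:61617");
        // The old primary shuts down and asks for node-1 to be removed; the request is ignored.
        boolean removed = topology.removeMember("node-1", "primary:61616");
        System.out.println("removed=" + removed + ", node-1 -> " + topology.addressOf("node-1"));
        // prints "removed=false, node-1 -> backup:61617"
    }
}
```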

There are no tests for these changes because I cannot determine how to
properly simulate this use-case. However, there have never been robust,
automated tests for these kinds of NFS use-cases, so this is not a
departure from the norm.


> Improve mitigation against split-brain with shared-storage
> ----------------------------------------------------------
>
>                 Key: ARTEMIS-4143
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4143
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>            Reporter: Justin Bertram
>            Assignee: Justin Bertram
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)