Posted to commits@cassandra.apache.org by "Stefan Miklosovic (Jira)" <ji...@apache.org> on 2020/07/28 11:47:00 UTC
[jira] [Comment Edited] (CASSANDRA-14559) Check for endpoint collision with hibernating nodes

    [ https://issues.apache.org/jira/browse/CASSANDRA-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166360#comment-17166360 ] 

Stefan Miklosovic edited comment on CASSANDRA-14559 at 7/28/20, 11:46 AM:
--------------------------------------------------------------------------

PRs for trunk and 3.11

3.11: [https://github.com/apache/cassandra/pull/700/commits]

trunk: [https://github.com/apache/cassandra/pull/699]


dtest PR: 

[https://github.com/apache/cassandra-dtest/pull/87]


[~dcapwell] [~Bereng] would either of you mind going over this? Thanks!


> Check for endpoint collision with hibernating nodes 
> ----------------------------------------------------
>
>                 Key: CASSANDRA-14559
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14559
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Distributed Metadata
>            Reporter: Vincent White
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>
> I ran across an edge case when replacing a node with the same address. This issue results in the node (and its tokens) being unsafely removed from gossip.
> Steps to reproduce:
> 1. Create a 3-node cluster.
> 2. Stop a node.
> 3. Replace the stopped node with a new node that uses the same address, using the replace_address flag.
> 4. Stop the node before it finishes bootstrapping.
> 5. Remove the replace_address flag and restart the node to resume bootstrapping (if the data dir is also cleared at this point, the node will also generate new tokens when it starts).
> 6. Stop the node before it finishes bootstrapping again.
> 7. 30 seconds later, the node will be removed from gossip because it now matches the check for a FatClient.
> I think this is only an issue when replacing a node with the same address, because other replacements now use STATUS_BOOTSTRAPPING_REPLACE and leave the dead node unchanged.
> I believe the simplest fix is to add a check that prevents a non-bootstrapped node (started without the replace_address flag) from starting if there is a gossip entry for the same address in the hibernate state.
> [3.11 PoC |https://github.com/apache/cassandra/compare/trunk...vincewhite:check_for_hibernate_on_start]
>  
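The startup guard proposed in the description can be sketched as a small Java model. This is a hypothetical illustration only: the class HibernateCollisionCheck, the GossipStatus enum, the checkStartup method and the example addresses are invented for this sketch. They are not Cassandra's API and do not reproduce the code in the linked PRs. The sketch assumes the node knows whether it has already bootstrapped, whether replace_address was supplied, and the gossip status recorded for its own address.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of the proposed startup check; not Cassandra's API.
public final class HibernateCollisionCheck {

    /** Gossip status values relevant to this sketch (hypothetical enum). */
    enum GossipStatus { NORMAL, BOOTSTRAPPING, HIBERNATE, SHUTDOWN }

    /**
     * Returns an error message if startup should be refused, or empty if it is allowed.
     *
     * @param gossipStatusByAddress status seen in gossip for each peer address
     * @param localAddress          address this node is starting on
     * @param alreadyBootstrapped   whether the local node has completed bootstrap
     * @param replaceAddressSet     whether replace_address was supplied at startup
     */
    static Optional<String> checkStartup(Map<String, GossipStatus> gossipStatusByAddress,
                                         String localAddress,
                                         boolean alreadyBootstrapped,
                                         boolean replaceAddressSet) {
        // Bootstrapped nodes and explicit replacements are outside the scope of this check.
        if (alreadyBootstrapped || replaceAddressSet) {
            return Optional.empty();
        }
        // A gossip entry for our own address in the hibernate state means an unfinished
        // same-address replacement; starting as a plain new node would leave it exposed
        // to the FatClient removal described in step 7.
        GossipStatus existing = gossipStatusByAddress.get(localAddress);
        if (existing == GossipStatus.HIBERNATE) {
            return Optional.of("Refusing to start non-bootstrapped node " + localAddress
                    + " without replace_address: gossip shows this address in the hibernate state");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, GossipStatus> gossip = Map.of(
                "10.0.0.1", GossipStatus.NORMAL,
                "10.0.0.3", GossipStatus.HIBERNATE);

        // Step 5 of the reproduction: restart without replace_address, not yet bootstrapped.
        System.out.println(checkStartup(gossip, "10.0.0.3", false, false).isPresent()); // true
        // Same node started with replace_address (step 3): allowed.
        System.out.println(checkStartup(gossip, "10.0.0.3", false, true).isPresent());  // false
    }
}
```

Exempting already-bootstrapped nodes and restarts that pass replace_address leaves ordinary restarts and the replacement path from step 3 unaffected; in this model only the restart in step 5 is refused.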



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
