Posted to commits@cassandra.apache.org by "Brandon Williams (JIRA)" <ji...@apache.org> on 2015/02/02 23:29:36 UTC

[jira] [Comment Edited] (CASSANDRA-8336) Quarantine nodes after receiving the gossip shutdown message

    [ https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288251#comment-14288251 ] 

Brandon Williams edited comment on CASSANDRA-8336 at 2/2/15 10:28 PM:
----------------------------------------------------------------------

This patch helps, but the problem with this approach is that the node can still flap, given a cluster that is disjoint enough gossip state-wise.  There are a few ways we can solve this:

* Quarantine after shutdown.  This has the consequence of not being able to restart a node until the quarantine expires.

* Sleep for ring_delay or some interval after setting the shutdown state before sending the rpc shutdown.  I'm not 100% sure this would prevent the flapping, and sleeping that long on shutdown sucks just as much as not being able to reboot until the quarantine expires.

* I suggest a third way, which I'll discuss below.

With this method, when node X receives a shutdown event from Y, it updates its local state for Y to version Integer.MAX_VALUE, so no further updates for the same generation will be accepted, since they will always have a lower version.  When Y restarts it will have a new generation and everything will work normally.
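To make the version-pinning idea concrete, here is a minimal sketch in Java.  It is not the code from the attached patch: GossipSketch, State, apply and onShutdown are hypothetical names invented for illustration, and the comparison rule (higher generation wins, then higher version) is a simplification of how gossip state is reconciled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-endpoint gossip state; NOT Cassandra's actual classes.
final class GossipSketch {
    static final class State {
        final int generation;   // changes whenever the node (re)starts gossip
        final int version;      // increases within a single generation
        final boolean alive;

        State(int generation, int version, boolean alive) {
            this.generation = generation;
            this.version = version;
            this.alive = alive;
        }
    }

    private final Map<String, State> states = new HashMap<>();

    // Accept a remote state only if it is newer than the local one.
    // Returns true when the update was applied.
    boolean apply(String endpoint, State remote) {
        State local = states.get(endpoint);
        boolean newer = local == null
                || remote.generation > local.generation
                || (remote.generation == local.generation
                    && remote.version > local.version);
        if (newer) {
            states.put(endpoint, remote);
        }
        return newer;
    }

    // On a shutdown announcement, mark the endpoint dead and pin its version
    // to Integer.MAX_VALUE so every same-generation update compares lower.
    void onShutdown(String endpoint) {
        State local = states.get(endpoint);
        if (local != null) {
            states.put(endpoint, new State(local.generation, Integer.MAX_VALUE, false));
        }
    }

    boolean isAlive(String endpoint) {
        State s = states.get(endpoint);
        return s != null && s.alive;
    }
}
```

In this sketch, a stale "alive" state for Y relayed by a third node in the same generation is rejected after onShutdown("Y"), while a state carrying a higher generation (Y restarting) is accepted as usual.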

There is one consequence to this method, and that is that gossipdisable/enable has to now generate a new generation, which triggers the "has restarted, now UP" message on other nodes, but this seems like a fairly minor thing.

On the surface, it may seem easier to have Y just send a version of MAX_VALUE, but that would only apply to nodes that receive it via gossip, not the ones that receive it via rpc, which is likely the bulk of them.  It wouldn't be an optimization anyway, since we only sleep for one gossip round, and the node(s) we gossip to will set the version anyway before propagating it to the rest of the cluster.

v2 does this.


was (Author: brandon.williams):
This patch helps, but the problem with this approach is that the node can still flap, given a cluster that is disjoint enough gossip state-wise.  There are a few ways we can solve this:

* Quarantine after shutdown.  This has the consequence of not being able to restart a node until the quarantine expires.

* Sleep for ring_delay or some interval after setting the shutdown state before sending the rpc shutdown.  I'm not 100% sure this would prevent the flapping, and sleeping that long on shutdown sucks just as much as not being able to reboot until the quarantine expires.

* Offline, Richard suggested a third way to me, which I'll discuss below.

With this method, when node X receives a shutdown event from Y, it updates its local state for Y to version Integer.MAX_VALUE, so no further updates for the same generation will be accepted, since they will always have a lower version.  When Y restarts it will have a new generation and everything will work normally.

There is one consequence to this method, and that is that gossipdisable/enable has to now generate a new generation, which triggers the "has restarted, now UP" message on other nodes, but this seems like a fairly minor thing.

On the surface, it may seem easier to have Y just send a version of MAX_VALUE, but that would only apply to nodes that receive it via gossip, not the ones that receive it via rpc, which is likely the bulk of them.  It wouldn't be an optimization anyway, since we only sleep for one gossip round, and the node(s) we gossip to will set the version anyway before propagating it to the rest of the cluster.

v2 does this.

> Quarantine nodes after receiving the gossip shutdown message
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8336
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8336
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.13
>
>         Attachments: 8336-v2.txt, 8336.txt
>
>
> In CASSANDRA-3936 we added a gossip shutdown announcement.  The problem here is that this isn't sufficient; you can still get TOEs and have to wait on the FD to figure things out.  This happens due to gossip propagation time and variance: if node X shuts down and sends the message to Y, but Z has a greater gossip version than Y for X and has not yet received the message, it can initiate gossip with Y and thus mark X alive again.  I propose quarantining to solve this; however, I feel it should be a -D parameter you have to specify, so as not to destroy current dev and test practices, since this will mean a node that shuts down will not be able to restart until the quarantine expires.


