You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jaakko Laine (JIRA)" <ji...@apache.org> on 2009/12/03 09:42:20 UTC
[jira] Updated: (CASSANDRA-572) handle old gossip properly

     [ https://issues.apache.org/jira/browse/CASSANDRA-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jaakko Laine updated CASSANDRA-572:
-----------------------------------

    Attachment: use-same-APstate-for-all-node-state-gossip.patch

OK, here's a patch that uses same state name (NODE_STATE) to gossip all movement information. Format is (BOOTSTRAPPING|NORMAL|LEAVING|LEFT)|token.

The main things caused by this modification to the state machine were:
(1) When a node is bootstrapping, we should clear pending ranges for this endpoint, as well as remove it from token metadata. These checks are not strictly necessary (I think), but are there to help transition from LEAVING -> BOOTSTRAPPING in case we missed LEFT due to network partition.
(2) For handleStateLeaving and handleStateLeft remove pending ranges for this endpoint before doing anything else. If we missed NORMAL, there might be obsolete pending ranges from BOOTSTRAP. Distant possibility, but possibility nonetheless.

Following additional check is not directly related to gossip format change and could happen even using the current model. This is a very unlikely event, but in a large (say, 200+ nodes) multi-DC cluster with lots of node movement, this could very well happen even with relatively short DC-to-DC network outage:
(1) Added a check to handleStateLeaving and handleStateLeft for the case that a node has made NORMAL -> LEAVING -> LEFT -> BOOTSTRAP -> NORMAL -> LEAVING [->LEFT] movement cycle without us seeing the intermediate stages. In this case we have information for the old token and now the node is leaving _new_ token. We cannot simply assert this, as it is possible this happens.

Now of course this already touches the subject what conditions we must take care of and what should be left to operators to handle. Some of them (like removing all references to the endpoint before continuing to handle bootstrapping) are questionable and might relax safety precautions, but if we do not do that, a modest 30s network outage might cause us not to see STATE_LEFT and we'd end up having strange pending ranges.

I don't expect this patch to be included as it is, but let's see what people think of this gossip change and then discuss what checks should be made :)


> handle old gossip properly
> --------------------------
>
>                 Key: CASSANDRA-572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-572
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>            Reporter: Jaakko Laine
>             Fix For: 0.5
>
>         Attachments: 572-handle-old-gossip.patch, use-same-APstate-for-all-node-state-gossip.patch
>
>
> (1) If a node has been moving in the ring, further bootstraps by other nodes will cause errors as they are handling STATE_LEAVING gossip without having such member in token metadata.
> (2) When a node bootstraps, it handles all ep states in the order they happen to arrive. If the first one to arrive has moved in the past (that is, it has STATE_LEAVING in its ep state), getNaturalEndpoint will throw ArrayIndexOutOfBounds exception as sortedTokens.size() == 0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.