Posted to commits@cassandra.apache.org by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org> on 2012/02/13 02:01:06 UTC

[jira] [Commented] (CASSANDRA-3833) support arbitrary topology transitions

    [ https://issues.apache.org/jira/browse/CASSANDRA-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206599#comment-13206599 ] 

Peter Schuller commented on CASSANDRA-3833:
-------------------------------------------

CASSANDRA-3901 explains the basics of the first of the two points listed above, and why I believe the current version already handles it incorrectly.

It gets more complicated for arbitrary transitions for the following reason:

The easiest way to implement arbitrary transitions would be to just require that a transition completes fully, or not at all. This avoids complexity in responsibility calculations, with each node being responsible (in the write set) for the parts of the ring that it will eventually be responsible for once the whole transition completes.
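
A minimal sketch of the all-or-nothing model, assuming a simplified ring where a key is replicated to the next rf distinct nodes clockwise from its token. The names replicas, read_set and write_set, and the dict-based ring representation, are hypothetical and are not Cassandra APIs. Reads use the current ring state only; writes go to every node that is a replica in either the current or the target state.

```python
from bisect import bisect_left


def replicas(ring, key_token, rf):
    """Return up to rf distinct nodes clockwise from key_token.

    ring maps token -> node name (hypothetical representation).
    """
    tokens = sorted(ring)
    if not tokens:
        return []
    # First token >= key_token, wrapping around to the start of the ring.
    start = bisect_left(tokens, key_token) % len(tokens)
    result = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in result:
            result.append(node)
        if len(result) == rf:
            break
    return result


def read_set(current_ring, key_token, rf):
    """Reads only consult nodes that own the data in the current state."""
    return replicas(current_ring, key_token, rf)


def write_set(current_ring, target_ring, key_token, rf):
    """Writes go to the replicas of both states, current replicas first."""
    current = replicas(current_ring, key_token, rf)
    target = replicas(target_ring, key_token, rf)
    return current + [node for node in target if node not in current]


if __name__ == "__main__":
    current = {0: "A", 100: "B", 200: "C"}
    target = {0: "A", 50: "D", 100: "B", 200: "C"}  # D is bootstrapping
    print(read_set(current, 30, 2))
    print(write_set(current, target, 30, 2))
```

Because the target state is treated as a single unit, the write set is only meaningful if the transition ends in exactly that target state, which is the restriction described above.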

But clearly, from an operator's standpoint, it would be good to allow a transition to be adjusted ad hoc while it is in progress. Suppose you're bootstrapping 100 nodes into a cluster and 2 of them turn out to be broken; you'd like to just be able to say "oh well, never mind those 2 for now, I'll come back to them later" (when the h/w is fixed, for example). The problem is that if there is overlap between hosts being inserted into the cluster (I'm using the word "inserted" and assuming node bootstrap for simplicity; the equivalent holds true for any change), other nodes will not have been part of the write set, so you cannot just forget about the ones that aren't up yet.

One way to address this is to not consider overlapping nodes when calculating the write set, preferring to write "too much" data (as is the case today). Another way is to do the full calculations and have additional streaming happen when a topology change is adjusted - but that seems excessively complex.
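
To illustrate the overlap problem and the first option, here is a rough sketch under one possible reading of "not consider overlapping nodes": each pending node's write responsibility is computed as if it were the only node joining. The ring model (next rf distinct nodes clockwise) and the names replicas, full_write_set and conservative_write_set are assumptions made for illustration, not Cassandra code.

```python
from bisect import bisect_left


def replicas(ring, key_token, rf):
    """Return up to rf distinct nodes clockwise from key_token (toy model)."""
    tokens = sorted(ring)
    if not tokens:
        return []
    start = bisect_left(tokens, key_token) % len(tokens)
    result = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in result:
            result.append(node)
        if len(result) == rf:
            break
    return result


def full_write_set(current, pending, key_token, rf):
    """Full calculation: assumes every pending node eventually joins."""
    result = replicas(current, key_token, rf)
    target = {**current, **pending}
    for node in replicas(target, key_token, rf):
        if node not in result:
            result.append(node)
    return result


def conservative_write_set(current, pending, key_token, rf):
    """Treat each pending node as if it were joining alone (writes more)."""
    result = replicas(current, key_token, rf)
    for token, name in sorted(pending.items()):
        alone = {**current, token: name}
        for node in replicas(alone, key_token, rf):
            if node not in result:
                result.append(node)
    return result


if __name__ == "__main__":
    current = {0: "A", 100: "B"}
    pending = {50: "D", 60: "E"}  # overlapping bootstraps
    print(full_write_set(current, pending, 30, 1))
    print(conservative_write_set(current, pending, 30, 1))
    # If D is abandoned, E becomes the owner of token 30:
    print(replicas({**current, 60: "E"}, 30, 1))
```

In this toy model, E never receives writes for token 30 under the full calculation because D shadows it, so D cannot simply be dropped. The conservative variant also writes to E, at the cost of extra data, so any subset of the pending nodes can complete.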

Yet a third way is to specifically support the concept of a node which "was supposed to be at token X and this other node Y was bootstrapped with that in mind". This is similar to what was discussed in CASSANDRA-3483 and can get complex.

                
> support arbitrary topology transitions 
> ---------------------------------------
>
>                 Key: CASSANDRA-3833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3833
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> Once we have the locator abstracted (with the gossiper being a
> particular concrete implementation), we want to change the locator
> abstraction to not express changes in ring topology on a per-node
> basis; rather we want to use an abstraction which communicates two
> arbitrary ring states; one state for the read set, and one for the
> write set.
> Once this abstraction is in place, the (pluggable) locator will be
> able to make bulk changes to a ring at once. Main points:
> * Must be careful in handling consistency level during ring
>   transitions, such that a given node in the read set corresponds to a
>   specific node in the write set. This will impose some restrictions
>   on completion of transitions, to avoid code complexity, so it is an
>   important point.
> * All code outside of gossip (and any other locator that works
>   similarly) will be agnostic about individual changes to nodes, and
>   will instead only be notified when new ring states are available (in
>   aggregate). This makes the change non-trivial because all code that
>   currently is oriented around individual node changes always
>   producing a valid ring, will have to be changed.
