You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Peter Schuller (Created) (JIRA)" <ji...@apache.org> on 2012/02/02 00:45:53 UTC

[jira] [Created] (CASSANDRA-3833) support locator support arbitrary topology transitions

support locator support arbitrary topology transitions 
-------------------------------------------------------

                 Key: CASSANDRA-3833
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3833
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Peter Schuller
            Assignee: Peter Schuller


Once we have the locator abstracted (with the gossiper being a
particular concrete implementation), we want to change the locator
abstraction to not express changes in ring topology on a per-node
basis; rather we want to use an abstraction which communicates two
arbitrary ring states; one state for the read set, and one for the
write set.

Once this abstraction is in place, the (pluggable) locator will be
able to make bulk changes to a ring at once. Main points:

* Must be careful in handling consistency level during ring
  transitions, such that a given node in the read set corresponds to a
  specific node in the write set. This will impose some restrictions
  on completion of transitions, to avoid code complexity, so it is an
  important point.

* All code outside of gossip (and any other locator that works
  similarly) will be agnostic about individual changes to nodes, and
  will instead only be notified when new ring states are available (in
  aggregate). This makes the change non-trivial because all code that
  currently is oriented around individual node changes always
  producing a valid ring, will have to be changed.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3833) support arbitrary topology transitions

Posted by "Peter Schuller (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller updated CASSANDRA-3833:
--------------------------------------

    Summary: support arbitrary topology transitions   (was: support locator support arbitrary topology transitions )
    
> support arbitrary topology transitions 
> ---------------------------------------
>
>                 Key: CASSANDRA-3833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3833
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> Once we have the locator abstracted (with the gossiper being a
> particular concrete implementation), we want to change the locator
> abstraction to not express changes in ring topology on a per-node
> basis; rather we want to use an abstraction which communicates two
> arbitrary ring states; one state for the read set, and one for the
> write set.
> Once this abstraction is in place, the (pluggable) locator will be
> able to make bulk changes to a ring at once. Main points:
> * Must be careful in handling consistency level during ring
>   transitions, such that a given node in the read set corresponds to a
>   specific node in the write set. This will impose some restrictions
>   on completion of transitions, to avoid code complexity, so it is an
>   important point.
> * All code outside of gossip (and any other locator that works
>   similarly) will be agnostic about individual changes to nodes, and
>   will instead only be notified when new ring states are available (in
>   aggregate). This makes the change non-trivial because all code that
>   currently is oriented around individual node changes always
>   producing a valid ring, will have to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3833) support arbitrary topology transitions

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206599#comment-13206599 ] 

Peter Schuller commented on CASSANDRA-3833:
-------------------------------------------

CASSANDRA-3901 explains the basics of the first of the two points listed above, and how I believe this is already not correct in the current version.

It gets more complicated for arbitrary transitions for the following reason:

The easiest way to implement arbitrary transitions would be to just require that a transition completes fully, or not at all. This avoids complexity in responsibility calculations, with each node being responsible (in the write set) for the parts of the ring that it will eventually be responsible for when all completes.

But clearly, from an operator's standpoint, it would be good to allow ad-hoc changing of a transition change. Supposing you're bootstrapping 100 nodes into a cluster and 2 of them turn out to be broken, you'd like to just be able to say 'oh well, nevermind those 2 for now I'll come back to them later [when h/w ix fixed for example]". The problem is that if there is overlap between hosts being inserted into the cluster (I'm using the word "inserted" and assuming node bootstrap for simplicity; the equivalent holds true for any change) other nodes will not have been part of the write set so you cannot just forget about the ones that aren't up yet.

On way to address this is to not consider overlapping nodes when calculating the write set, preferring to write "too much" data (as the case is today). Another way is to do the full calculations and have additional streaming happen when a topology change is adjusted - but that seems excessively complex.

Yet a third way is to specifically support the concept of a node which "was supposed to be at token X and this other node Y was bootstrapped with that in mind". This is similar to what was discussed in CASSANDRA-3483 and can get complex.

                
> support arbitrary topology transitions 
> ---------------------------------------
>
>                 Key: CASSANDRA-3833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3833
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> Once we have the locator abstracted (with the gossiper being a
> particular concrete implementation), we want to change the locator
> abstraction to not express changes in ring topology on a per-node
> basis; rather we want to use an abstraction which communicates two
> arbitrary ring states; one state for the read set, and one for the
> write set.
> Once this abstraction is in place, the (pluggable) locator will be
> able to make bulk changes to a ring at once. Main points:
> * Must be careful in handling consistency level during ring
>   transitions, such that a given node in the read set corresponds to a
>   specific node in the write set. This will impose some restrictions
>   on completion of transitions, to avoid code complexity, so it is an
>   important point.
> * All code outside of gossip (and any other locator that works
>   similarly) will be agnostic about individual changes to nodes, and
>   will instead only be notified when new ring states are available (in
>   aggregate). This makes the change non-trivial because all code that
>   currently is oriented around individual node changes always
>   producing a valid ring, will have to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3833) support arbitrary topology transitions

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206668#comment-13206668 ] 

Peter Schuller commented on CASSANDRA-3833:
-------------------------------------------

I forgot another thing: If one goes for the option of writing to all possible endpoints that might be up, it also has the effect of significantly increasing the probability of Unavailable exceptions. Consider that you're streaming data in a replica set to 3 different nodes that have overlapping future ownership (pending ranges). If you need to allow each node going up individually, not only do you need to write more as mentioned above, every single request that touches data that is shared between one existing host in the replica set and multiple potential futures, would have to be required for consistency purposes.

In other words, whenever you're performing a write, you essentially have to keep track of the current read set, the write set, *and* the possible "future equivalence" between the two. If data is moving from A to B, with A being in the read set and B being in the write set (read set = A, write set = AB (using RF=1 for simplicity)), any write satisfied with the help of an ACK from A must also be satisfied by B. Otherwise the data is effectively lost once finishes the join.

For arbitrary ring changes the number of overlaps could potentially be higher than 1. For example if you have RF=3 and triple the size of a cluster, you're now inserting two nodes in between each pre-existing node. So for any replica set for a row key with read set ABC, the future read set will be AXY (XY being bootstrapped nodes), and while they're joining, the write set will be AXYBC. But any write satisfying CL via A, must also be ack:ed by X and Y because we don't know which one of them will finish bootstrapping first.


                
> support arbitrary topology transitions 
> ---------------------------------------
>
>                 Key: CASSANDRA-3833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3833
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> Once we have the locator abstracted (with the gossiper being a
> particular concrete implementation), we want to change the locator
> abstraction to not express changes in ring topology on a per-node
> basis; rather we want to use an abstraction which communicates two
> arbitrary ring states; one state for the read set, and one for the
> write set.
> Once this abstraction is in place, the (pluggable) locator will be
> able to make bulk changes to a ring at once. Main points:
> * Must be careful in handling consistency level during ring
>   transitions, such that a given node in the read set corresponds to a
>   specific node in the write set. This will impose some restrictions
>   on completion of transitions, to avoid code complexity, so it is an
>   important point.
> * All code outside of gossip (and any other locator that works
>   similarly) will be agnostic about individual changes to nodes, and
>   will instead only be notified when new ring states are available (in
>   aggregate). This makes the change non-trivial because all code that
>   currently is oriented around individual node changes always
>   producing a valid ring, will have to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3833) support arbitrary topology transitions

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206669#comment-13206669 ] 

Peter Schuller commented on CASSANDRA-3833:
-------------------------------------------

It should be noted though that the "we don't know who goes up first" is only a result of using a token ring as the underlying abstraction. One either has to add the logic to work around this when considering the token ring, or alternatively one can move away from the token ring as the underlying abstraction (which can in turn be considered mapping ring segments, but the token ring becomes more of an implementation detail than fundamental model then).

More on this in the future when this ticket is closer to reality.

                
> support arbitrary topology transitions 
> ---------------------------------------
>
>                 Key: CASSANDRA-3833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3833
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> Once we have the locator abstracted (with the gossiper being a
> particular concrete implementation), we want to change the locator
> abstraction to not express changes in ring topology on a per-node
> basis; rather we want to use an abstraction which communicates two
> arbitrary ring states; one state for the read set, and one for the
> write set.
> Once this abstraction is in place, the (pluggable) locator will be
> able to make bulk changes to a ring at once. Main points:
> * Must be careful in handling consistency level during ring
>   transitions, such that a given node in the read set corresponds to a
>   specific node in the write set. This will impose some restrictions
>   on completion of transitions, to avoid code complexity, so it is an
>   important point.
> * All code outside of gossip (and any other locator that works
>   similarly) will be agnostic about individual changes to nodes, and
>   will instead only be notified when new ring states are available (in
>   aggregate). This makes the change non-trivial because all code that
>   currently is oriented around individual node changes always
>   producing a valid ring, will have to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira