You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@cassandra.apache.org by "Peter Schuller (JIRA)" <ji...@apache.org> on 2011/04/07 19:24:05 UTC

[jira] [Created] (CASSANDRA-2434) node bootstrapping can violate consistency

node bootstrapping can violate consistency
------------------------------------------

                 Key: CASSANDRA-2434
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.7.4
            Reporter: Peter Schuller


My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.

There are at least two issues I have with this behavior:

* If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.

* Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.

A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.

A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.

(A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)

Thoughts?



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094848#comment-13094848 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

bq. what about the case where the node leaving the replica set is dead

Good point.  We do need something to make that possible.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Zhu Han (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097703#comment-13097703 ] 

Zhu Han commented on CASSANDRA-2434:
------------------------------------

As peter suggested before, another approach to fix the consistency problem is streaming sstables from all alive peers if the "correct" node is down. And then leave them to normal compaction.  

This could be much lightweight than anti-entrophy repair, except the network IO pressure on the bootstrapping node.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108800#comment-13108800 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

Can you give an example for illustration?

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104718#comment-13104718 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

I think we're talking about different things. Requiring the user to have the right nodes available for operation X is not the same as "cassandra can 'lose' writes when it happens to stream from the wrong node, even if the user did everything right".

This ticket is about the latter, I think.

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104701#comment-13104701 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

My comment wasn't very clear. Both decom and move currently, attempt to do the right thing. When a node is leaving, there should be one new replica for all the ranges it is responsible for. If it can't stream data to that replica there is a consistency problem.

Both operations currently try to do stream to that replica, but we should use the 'strict' logic in those cases as well and fail if we can't guarantee consistency and the user hasn't disabled strict.

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

paul cannon updated CASSANDRA-2434:
-----------------------------------

    Attachment:     (was: 2434-2.patch.txt)

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112905#comment-13112905 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------


Ok, prospective approach to totally safe range movements:

Operational rules:
* Cassandra will not allow two range motion operations (move, bootstrap, decom) at the same time on the same node.
* When a range motion operation is already pending, User should refrain from starting another range motion operation (if either motion operation overlaps the arc-of-effect of the other) until the gossip info about the first change has propagated to all affected nodes. (This is more simply approximated by the "two minute rule".)
* Every point in the tokenspace has the same number of natural endpoints, and they're ordered the same from the perspective of all nodes (is this an ok assumption?).
* It is User's responsibility to make sure that the right streaming source nodes are available. If they're not, the range motion operation may fail.

Procedure:
* For any motion involving range _R_, there will be a stream from endpoint _EP_source_ to endpoint _EP_dest_. Given the same information about what range motion operations are pending (_TokenMetadata_) and the range _R_, there is a bijection from _EP_source_ to _EP_dest_, shared by all nodes in the ring.
* Procedure to determine _EP_source_ from _EP_dest_:
** Let _REP_current_ be the existing (ordered) list of natural endpoints for _R_.
** Let _TM_future_ be a clone of the current _TokenMetadata_, but with all ongoing bootstraps, moves, and decoms resolved and completed.
** Let _REP_future_ be the list of (ordered) natural endpoints for _R_ according to _TM_future_.
** Let _EPL_entering_ be the list of endpoints in _REP_future_ which are not in _REP_current_ (preserving their order in _REP_future_).
** Let _EPL_leaving_ be the list of endpoints in _REP_current_ which are not in _REP_future_ (preserving their order in _REP_current_).
** _EPL_entering_ and _EPL_leaving_ are of the same length.
** Let _Pos_ be the position/index of _EP_dest_ in _EPL_entering_.
** Let _EP_source_ be the endpoint at position _Pos_ in _EPL_leaving_.
* Intuitively, this is the same as the rule expressed earlier in this ticket (stream from the node you'll replace), but also handles other ongoing range movements in the same token arc.
* These rules can be pretty trivially inverted to determine _EP_dest_ from _EP_source_.
* When any node gets gossip about a range motion occurring with its token arc-of-effect, it calculates (or recalculates) the streams in which it should be involved. Any ongoing streams which are no longer necessary are canceled, and any newly necessary streams are instigated.

I tried to construct a ruleset without that last rearrange-ongoing-streams rule, but it ended up with a pretty complicated set of extra restrictions, and a more complicated set of procedures than this.

This set of rules might look complicated, but I think it should be fairly straightforward to implement, and may even end up simpler overall than our current code.

Note that this procedure even maintains the consistency guarantee in cases like:

* In an RF=3 cluster with nodes A, E, and F, bootstrap B, C, and D in quick succession (E streams to B, F streams to C, A streams to D)
* In an RF=3 cluster with nodes A, C, and E, bootstrap B, D, and F, and decommission A, C, and E, all in quick succession (A streams to B, C streams to D, E streams to F)
* In an RF=3 cluster with nodes A, B, C, D, and E, decommission B and C in quick succession (B streams to D, C streams to E)

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-2434:
-----------------------------------------

    Assignee: paul cannon

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096352#comment-13096352 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

Ok, so if we always prefer to bootstrap from the correct token, then I still think we should combine getRangesWithStrictSource and getRangesWithSources. Basically the logic should be, find the 'best' node to stream from. If the user requested it, also find a list of other candidates and order them by proximity. Right?

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109749#comment-13109749 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

So the fact that the 'correct' bootstrap source switches mid stream isn't really the problem I don't think. We set up pending ranges so that when a node gets ready to bootstrap, writes start getting duplicated to both the currently correct replica, and the bootstrapping node. Since all new writes are duplicated we can stream from that node and as long as we get the entire dataset, consistency should be fine. The problem is there is nothing in place preventing someone from running cleanup or killing a decommed node or something.

I'm doubtful that the complexity of a correct set of rules for allowing overlapping ring changes is really worth the time/effort/fragility. It doesn't seem like that much of a loss to me to disallow them. Perhaps your set of rules will be super simple though :).

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094853#comment-13094853 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

As long as we need to handle the dead case I don't see any harm in having a slightly more generally-useful "use closest" option instead of "force to pick random live replica" option.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097695#comment-13097695 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

bq. The suggestion was that if the 'correct' node is down, you can force the bootstrap to complete anyway (probably from the closest node, but that is transparent to the user), but only if the 'correct' node is down.

Oh, ok. I misunderstood. This seems reasonable. I'd lean for the more general solution, yeah, but I don't feel very strongly about it.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152136#comment-13152136 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

How much work would it be to add "just one more bandaid" for the stream source thing, in comparison?
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.4
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094845#comment-13094845 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

Judging by the irc channel and user list, assuming people will follow best practices seems a bit of a dead end. Plus, what about the case where the node leaving the replica set is dead? You still want the option to allow choosing another to stream from. And we probably shouldn't default to choosing another without explicit permission, because of the consistency violation stuff.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Zhu Han (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097625#comment-13097625 ] 

Zhu Han commented on CASSANDRA-2434:
------------------------------------

bq. Also if only one node is down you should still be able to read/write at quorum and achieve consistency

I suppose quorum read plus quorum write should provide monotonic read consistency. [1] Supposing  quorum write on key1 hits node A and node B, not on node C due to temporal network partition. After that node B is replaced by node D since it is down, and node D streams data from node C. If the following quorum read on key1 hits only node C and node D, the monotonic consistency is violated. This is rare but not unrealistic, especially when hint handoff is disabled. 

Maybe it is more resonable to give the admin an option, to specify that the bootstrapped node should not accept any read request until the admin turn it on manually. So the admin can start a manual repair if he wants to assure everything goes fine.

[1]http://www.allthingsdistributed.com/2007/12/eventually_consistent.html

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

paul cannon updated CASSANDRA-2434:
-----------------------------------

    Attachment:     (was: 2434.patch.txt)

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434-2.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

paul cannon updated CASSANDRA-2434:
-----------------------------------

    Summary: range movements can violate consistency  (was: node bootstrapping can violate consistency)

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094850#comment-13094850 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

Yeah I was assuming ONTS is basically deprecated at this point. Didn't think about the dead case though. I suppose just a 'force' type of option and a warning indicating the possible consistency issues works.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203260#comment-13203260 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

So, I believe that the rules outlined above can still work without the "wait X seconds between bootstrap operations" prereq, if a pretty simple extra step is added:

If any node learns about conflicting move operations, then some rules are applied to choose which will be honored and which will return an error to its caller (if still possible).

Those rules are:
* A decom for node X beats a move or bootstrap for node X
* Two decoms for node X from coordinator nodes Y and Z: the coordinator with the higher token wins
* Any other conflicts between move/bootstrap operations for the same node (which can arise in certain partition situations) are easily resolved by latest VersionedValue.

This should guarantee convergence of TokenMetadata across any affected parts of a cluster.
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096990#comment-13096990 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

Repair is a much, much more heavyweight solution to the problem than just "stream from the node that is 'displaced.'"

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097275#comment-13097275 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

bq. I'm saying we can't just automatically choose another token if the user didn't specifically say it's ok.

Oh, ok.  Right.  (I thought we were just bikeshedding over whether to call the "manual override" option "use closest" or "force bootstrap.")

bq. Repair doesn't help in the case when you lost data due to a node going down

Additionally, I don't like the idea of automatically doing expensive things like repair; it feels cleaner to not do it automatically, and allow using the existing tool to perform one if desired, than to do it by default and have to add an option to skip it for when that's not desirable.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2434:
--------------------------------------

    Assignee:     (was: David Alves)
    
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>             Fix For: 1.3
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028225#comment-13028225 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

Related: CASSANDRA-833

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>             Fix For: 1.0
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

paul cannon updated CASSANDRA-2434:
-----------------------------------

    Attachment: 2434-3.patch.txt

2434-3.patch.txt removes the bits that add the "-n" option to nodetool join. Apparently no "nodetool join" should ever result in a bootstrap, so it doesn't matter whether the caller wants "strict" or not.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094864#comment-13094864 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

Well, I imagine the 'force' option would pick the nearest live node. By 'force' I mean the option should be posed to the user as "We can't guarantee consistency in your cluster after this bootstrap since a node is down, if you would like to do this anyway, specify option X". Just saying you can either bootstrap or bootstrap from the closest node doesn't convey the implications as well I don't think.

Maybe we are on the same page and arguing over wording though.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109763#comment-13109763 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

Right, I know how the pending-ranges writes work. But there are still possible openings for consistency violations for previously-written data, similar to the one outlined in the original ticket description. If the bootstrap stream source has an outdated value and it gets duplicated to a bootstrapping node, but then the bootstrap stream source doesn't leave the replication set (because something else changed in the interim), the outdated value is now on more nodes than it used to be- possibly now QUORUM, when previously it was safely below.

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108617#comment-13108617 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

bq. It's always been unsupported to bootstrap a second node into the same "token arc" while a previous one is ongoing.

I'm pretty sure now that this is incorrect; we fixed it back in CASSANDRA-603.  I'm updating the comments in TokenMetadata as follows:

{noformat}
    // Prior to CASSANDRA-603, we just had <tt>Map<Range, InetAddress> pendingRanges<tt>,
    // which was added to when a node began bootstrap and removed from when it finished.
    //
    // This is inadequate when multiple changes are allowed simultaneously.  For example,
    // suppose that there is a ring of nodes A, C and E, with replication factor 3.
    // Node D bootstraps between C and E, so its pending ranges will be E-A, A-C and C-D.
    // Now suppose node B bootstraps between A and C at the same time. Its pending ranges
    // would be C-E, E-A and A-B. Now both nodes need to be assigned pending range E-A,
    // which we would be unable to represent with the old Map.  The same thing happens
    // even more obviously for any nodes that boot simultaneously between same two nodes.
    //
    // So, we made two changes:
    //
    // First, we changed pendingRanges to a <tt>Multimap<Range, InetAddress></tt> (now
    // <tt>Map<String, Multimap<Range, InetAddress>></tt>, because replication strategy
    // and options are per-KeySpace).
    //
    // Second, we added the bootstrapTokens and leavingEndpoints collections, so we can
    // rebuild pendingRanges from the complete information of what is going on, when
    // additional changes are made mid-operation.
    //
    // Finally, note that recording the tokens of joining nodes in bootstrapTokens also
    // means we can detect and reject the addition of multiple nodes at the same token
    // before one becomes part of the ring.
    private BiMap<Token, InetAddress> bootstrapTokens = HashBiMap.create();
    // (don't need to record Token here since it's still part of tokenToEndpointMap until it's done leaving)
    private Set<InetAddress> leavingEndpoints = new HashSet<InetAddress>();
    // this is a cache of the calculation from {tokenToEndpointMap, bootstrapTokens, leavingEndpoints}
    private ConcurrentMap<String, Multimap<Range, InetAddress>> pendingRanges = new ConcurrentHashMap<String, Multimap<Range, InetAddress>>();
{noformat}


> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104709#comment-13104709 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

If decom can't stream data to the appropriate replica, then it should just fail, right? Do we support decom in cases where a consistency violation would result? Seems like it has to be the user's responsibility to bring up or decom the other node first.

move could introduce a violation when it gains a new range, though, in the same cases as the bootstrap issue explained above.

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096257#comment-13096257 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

I did that (passing strict to getWorkMap) at first, but it wasn't too clean since it required adding a 'table' argument as well as 'strict', and it ended up replacing too much of the getRangesWithSources functionality. So then I added 'strict' as a parameter to getRangesWithSources (actually, it looks like I neglected to update its javadoc comment), but the differences between getRangesWithSources and getRangesWithStrictSource are such that a combined method feels a lot more special-casey and clunky. I like this way best in the end.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151640#comment-13151640 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

No longer very optimistic on the "may even end up simpler overall than our current code" front?

TBH this area of the code is fragile and hairy and maybe starting from a clean slate with a real plan instead of trying to patch things in haphazardly would be a good thing.

But, I'd be okay with re-imposing the "no overlapping moves" rule and fixing the stream source problem if that's going to be substantially simpler.
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.4
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096138#comment-13096138 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

Just as initial feedback, I'm not sure we need a new getRangesWithSources method, especially with so much duplication between them. Seems like strict could be passed to the current method. Also, what about leaving getRangesWithSources how it is and passing strict to getWorkMap? That method can do the endpoint set math if it needs to and throw a more informative exception in the case that strict is set and the endpoint we want to fetch from is dead.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097189#comment-13097189 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

bq. Ok, so if we always prefer to bootstrap from the correct token, then I still think we should combine getRangesWithStrictSource and getRangesWithSources. Basically the logic should be, find the 'best' node to stream from. If the user requested it, also find a list of other candidates and order them by proximity. Right?

I don't think so. I would still want to leave the option to stream from the closest even if the strict best node is available.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266743#comment-13266743 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

As a half measure, we can stream from the "right" node very easily if we continue to make the simplifying assumption that no other node movement happens in overlapping ranges during the operation.
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.2
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097204#comment-13097204 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

Paul,

The suggestion was that if the 'correct' node is down, you can force the bootstrap to complete anyway (probably from the closest node, but that is transparent to the user), but only if the 'correct' node is down. It sounds like you agree with Jonathan on the more general approach though.

Zhu,

Repair doesn't help in the case when you lost data due to a node going down. Also if only one node is down you should still be able to read/write at quorum and achieve consistency (assuming your replication factor is greater than 2).

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104573#comment-13104573 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

Do we need to do anything special for move/decommission as well?

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152140#comment-13152140 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

Do you mean to implement the new "safe range movements" procedure outlined above, without rewriting the rest of the range movement code?

If so, I submit a SWAG of "1/3-1/2 the cost of the rewrite option". The bandaid would still touch a lot of different moving parts.
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.4
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109742#comment-13109742 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

I think we can still allow overlapping concurrent ring changes, with the right set of invariants and/or operational rules. I've been trying to work on defining those. It's pretty tricky but I think I have the right way of approaching the model now. Will update later today with more.

Nick is right though, c* probably shouldn't support overlapping changes as is. Think bootstrapping >N nodes between the same two old nodes, decom'ing one node and bootstrapping another in such a way that the bootstrap source stream node becomes the wrong source, etc.

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204919#comment-13204919 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

Sounds reasonable.
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.2
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109230#comment-13109230 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

Ok, so I think there are really two consistency issues here. Firstly, picking the 'right' node to stream data to/from when making changes to the ring. Secondly, disallowing concurrent changes that have overlapping ranges.

Currently we only disallow nodes from moving/decommissioning when they may potentially have data being streamed to them. There are a few examples of things we currently allow which I think are generally a bad idea.

1) Say you have nodes A and D, if you bootstrap nodes B and C at the same time in between A and D, it may turn out that the correct node to stream from for both nodes is D. Now say node C finishes bootstrapping before node B. At that point, the correct node for B to bootstrap from is technically C, although D still has the data. However, since D is no longer technically responsible for the data, the user could run cleanup on D and delete the data that B is attempting to stream.

2) The above case is also a problem when you bootstrap a node and the node it decides it needs to stream from is moving. Once that node finishes moving you could run cleanup on that node and delete data that the bootstrapping node needs. In this case, all documentation indicates you should do a cleanup after a move in order to remove old data, so it seems possibly more likely.

3) A variation of the above case is when you bootstrap a node and the node it streams from is leaving. In that case the decom may finish and the user could terminate the cassandra process and/or node breaking any streams. Not to mention the idea of a node in a decommissioned state continuing to stream seems like a bad idea. I believe it would work currently, but I'm not sure and it seems likely to break.

I can't really think of any other examples but I think thats enough to illustrate that overlapping concurrent ring changes are a bad idea and we should just attempt to prevent them in all cases. An argument could be made that this would prevent you from doubling your cluster (the best way to grow) all at once, but I don't think that's really a huge deal. At most you would need RF steps to double your cluster.

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

paul cannon updated CASSANDRA-2434:
-----------------------------------

    Attachment: 2434.patch.txt

This still needs some testing, but I'm putting it up now in case anyone has some time to take a look and make sure my approach is sane.

Adds an optional "-n" argument to "nodetool join" to allow bootstrapping from closest live node (n == non-strict). Also recognizes an optional property "cassandra.join_strict" which can be set to false when a bootstrap is triggered by cassandra.join_ring.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094818#comment-13094818 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

Seems to me like the option to stream from the closest replica might just add more confusion without really gaining anything. The node that is leaving the replica set will never be in another datacenter. It could be on a different rack, but if you are following best practices and alternating racks then it is likely either on the same rack or there is only one copy on that rack and the best case possible is streaming from another rack anyway.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104618#comment-13104618 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

I had a note to remember to create a ticket for that, but if we want to do it here that works as well.

In any case, yes the same concerns exist when giving away ranges as when gaining ranges.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091881#comment-13091881 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

bq. It's always been unsupported to bootstrap a second node into the same "token arc" while a previous one is ongoing. Does that cover what you're thinking of or are we still on the hook?

Is it also unsupported to decom within X's token arc, or move into/out of that arc, while X is bootstrapping? I think we're safe if so.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096274#comment-13096274 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

Well I guess it kind of depends on which approach we take as well. Is the option A) bootstrap from the right token or bootstrap from the closest token, or B) bootstrap from the right token, but if that one isn't up, bootstrap from any other token preferring the closer ones.

Like I said, I'd say B, but if you and Jonathan both disagree.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2434:
--------------------------------------

    Affects Version/s:     (was: 0.7.4)
        Fix Version/s: 1.0

ISTM the easiest fix is to always stream from the node that will be removed from the replicas for each range, unless given permission from the operator to choose a replica that is closer / less dead.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>             Fix For: 1.0
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091884#comment-13091884 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

Yes.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Zhu Han (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097106#comment-13097106 ] 

Zhu Han commented on CASSANDRA-2434:
------------------------------------

Sometimes, the node is replaced because the hardware is crashed. If so, "streaming from the node being replaced" is not available.

How about force the repair happens if the user specifies he needs the consistency of quorum while the original node has gone.


> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104715#comment-13104715 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

If we support it for bootstrapping I don't see why we shouldn't support it for decom. Right, move has the problem in both cases (giving away ranges, gaining ranges).

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104690#comment-13104690 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

bq. In any case, yes the same concerns exist when giving away ranges as when gaining ranges.

Oh? I must be missing something. What would a consistency violation failure scenario look like for giving away ranges?

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158701#comment-13158701 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

So either way, a substantial amount of work, but the bandaid still leaves us with rules that the operator must enforce or hit subtle problems.

My hang-up with the bandaid is the part where C* can't effectively enforce the guidelines to be safe.  Even "wait X seconds between bootstrap operations" is not a prereq I am comfortable with.

Unless the above is incorrect, I think we should bite the bullet and fix it "right."
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.5
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Zhu Han (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096981#comment-13096981 ] 

Zhu Han commented on CASSANDRA-2434:
------------------------------------

Is it possible to make the node does not reply to any request before bootstrap and anti-entrophy repair is finished?

This could fix the consistency problem brought by bootstrap.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091418#comment-13091418 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

So, it looks like it will be possible for the node-that-will-be-removed to change between starting a bootstrap and finishing it (other nodes being bootstrapped/moved/decom'd during that time period); in some cases, that could still lead to a consistency violation.  Is that unlikely enough that we don't care, here?  At least the situation would be better with the proposed fix than it is now.

Second question: what might the "permission from the operator to choose a replica that is closer/less dead" look like?  Maybe just a boolean flag saying "it's ok to stream from any node for any range you need to stream"?  Or would we want to allow specifying precise source nodes for any/all affected address ranges?

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112960#comment-13112960 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

Clarification: the cleanup operation would need to consider pending ranges in addition to the current natural range-endpoint mapping.

It would be possible with this proposal for a node _M_ to be streaming range _S_ to a bootstrapping node _Y_, and midway, for _M_ to stop being part of the replication set for _S_ (perhaps some other bootstraps nearby completed first). This should be ok for consistency, but a cleanup operation on _M_ while the stream is ongoing could potentially remove all the data in _S_ unless this change is made.

Further clarification: instead of the W+1 special case we now have for writing to a range _T_ with a pending motion, we would need to write to W+C replicas instead, where C is the number of pending motions within the replication set of _T_.

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104694#comment-13104694 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

bq. Expand the scope of this ticket accordingly?

Yes, let's solve them both here.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Bailey updated CASSANDRA-2434:
-----------------------------------

    Reviewer: nickmbailey

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152123#comment-13152123 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

No, I do think that if we tore out the existing code and replaced it, it would be simpler overall, but (a) that would probably also be true if we rewrote the existing code without implementing this; (b) it will be rather a lot of work; and (c) it may engender a whole new generation of subtle corner-case bugs (or maybe it will eliminate a lot of such bugs that already exist).
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.4
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

paul cannon updated CASSANDRA-2434:
-----------------------------------

    Attachment: 2434-2.patch.txt

updated patch fixes the docstring for getRangesWithStrictSource().

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434-2.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096807#comment-13096807 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

I'm okay with either A or B.

bq. I would say we simply can't do anything that might violate the consistency guarantee without explicit permission from the user

I'm not sure I understand, are you saying that B would violate this, or just that the status quo does?

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096349#comment-13096349 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

Yeah. B is probably easier on everyone, but I would say we simply can't do anything that might violate the consistency guarantee without explicit permission from the user.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151620#comment-13151620 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

bq. It sounds like you have a reasonable solution here, is there any reason not to implement it for 1.1?

Just that it's quite a bit more complex than simply disallowing overlapping ring movements, and the extra problems that come with higher complexity. I think this feature is worth it, on its own, but when i think of how much pain Brandon seems to be going through dealing with streaming code, maybe it's not.
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.4
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108793#comment-13108793 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

bq. I'm pretty sure now that this is incorrect;

Well, doh. That puts us back at my first question:

bq. So, it looks like it will be possible for the node-that-will-be-removed to change between starting a bootstrap and finishing it (other nodes being bootstrapped/moved/decom'd during that time period); in some cases, that could still lead to a consistency violation. Is that unlikely enough that we don't care, here? At least the situation would be better with the proposed fix than it is now.

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151570#comment-13151570 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

Coming back to this after the 1.0 scramble.

It sounds like you have a reasonable solution here, is there any reason not to implement it for 1.1?
                
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.4
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091851#comment-13091851 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

bq. it looks like it will be possible for the node-that-will-be-removed to change between starting a bootstrap and finishing it

It's always been unsupported to bootstrap a second node into the same "token arc" while a previous one is ongoing.  Does that cover what you're thinking of or are we still on the hook?

bq. what might the "permission from the operator to choose a replica that is closer/less dead" look like?

It seems to me that the two valid choices are

- Stream from "correct" replica
- Stream from closest replica

I can't think of a reason to stream from an arbitrary replica other than those options.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2434:
--------------------------------------

    Fix Version/s:     (was: 1.0.7)
                   1.1
    
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094846#comment-13094846 ] 

Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

bq. The node that is leaving the replica set will never be in another datacenter

I think that's only strictly true for NTS, but I'm fine leaving it out.  Not worth adding complexity for ONTS at this point.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097190#comment-13097190 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

bq. I'm not sure I understand, are you saying that B would violate this, or just that the status quo does?

I'm saying B would violate this, yes. B was "bootstrap from the right token, but if that one isn't up, bootstrap from any other token preferring the closer ones", right? I'm saying we can't just automatically choose another token if the user didn't specifically say it's ok.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104685#comment-13104685 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

bq. Do we need to do anything special for move/decommission as well?

Yes, it looks like we do need to add similar logic for move. Expand the scope of this ticket accordingly?

I don't see any way decommission could be affected by this sort of problem.

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (CASSANDRA-2434) range movements can violate consistency

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-2434:
-----------------------------------------

    Assignee: David Alves  (was: paul cannon)
    
> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104756#comment-13104756 ] 

paul cannon commented on CASSANDRA-2434:
----------------------------------------

Conversation on #cassandra-dev resulted in the conclusion that we'll fix this bug for range acquisition (bootstrap and move) now, and plan to allow the same looseness (non-strict mode, or whatever) for range egress (move and decom) in the future.

I think.

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2434) node bootstrapping can violate consistency

Posted by "paul cannon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

paul cannon updated CASSANDRA-2434:
-----------------------------------

    Attachment: 2434-testery.patch.txt

Patch 2434-testery.patch.txt adds a bit to unit tests to exercise o.a.c.dht.BootStrapper.getRangesWithStrictSource().

> node bootstrapping can violate consistency
> ------------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.1
>
>         Attachments: 2434-2.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping complete based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and things about it carefully before bootstrapping, there's no guarantee.
> A complication is that not only does it depend on use-case where this is an issue (if all you ever do you do at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DC:s while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira