Posted to commits@cassandra.apache.org by "Blake Eggleston (JIRA)" <ji...@apache.org> on 2014/09/29 01:15:34 UTC

[jira] [Comment Edited] (CASSANDRA-6246) EPaxos

    [ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151256#comment-14151256 ] 

Blake Eggleston edited comment on CASSANDRA-6246 at 9/28/14 11:15 PM:
----------------------------------------------------------------------

bq. In the current implementation, we only keep the last commit per CQL partition. We can do the same for this as well.

Yeah I've been thinking about that some more. Just because we could keep a bunch of historical data doesn't mean we should. There may be situations where we need to keep more than one instance around though, specifically when the instance is part of a strongly connected component. Keeping some historical data would be useful for helping nodes recover from short failures where they miss several instances, but after a point, transmitting all the activity for the last hour or two would just be nuts. The other issue with relying on historical data for failure recovery is that you can't keep all of it, so you'd have dangling pointers on the older instances. 
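
To make that retention rule concrete, here's a minimal sketch (plain Java, with made-up names like Instance and PruneDecider, not code from the patch) of the idea that an executed instance can only be dropped once it's no longer the most recent instance for its key and nothing unexecuted still depends on it. An instance sitting in a strongly connected component that hasn't fully executed yet would fail the second check, which is exactly the case where we'd have to keep more than one around.

{code:java}
// Hypothetical sketch only: names and structure are illustrative, not from the patch.
import java.util.*;

class Instance
{
    final UUID id;
    final String key;               // CQL partition this instance touches
    final Set<UUID> dependencies;   // instances this one depends on
    boolean executed;

    Instance(UUID id, String key, Set<UUID> dependencies)
    {
        this.id = id;
        this.key = key;
        this.dependencies = dependencies;
    }
}

class PruneDecider
{
    private final Map<UUID, Instance> instances = new HashMap<>();
    private final Map<String, UUID> lastExecutedPerKey = new HashMap<>();

    void recordExecution(Instance instance)
    {
        instance.executed = true;
        instances.put(instance.id, instance);
        lastExecutedPerKey.put(instance.key, instance.id);
    }

    // An instance can only be pruned once it has executed, it isn't the most recent
    // execution for its key (new instances still need something to depend on), and
    // no unexecuted instance still points at it as a dependency.
    boolean canPrune(Instance candidate)
    {
        if (!candidate.executed)
            return false;
        if (candidate.id.equals(lastExecutedPerKey.get(candidate.key)))
            return false;
        for (Instance other : instances.values())
            if (!other.executed && other.dependencies.contains(candidate.id))
                return false;
        return true;
    }
}
{code}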

For longer network partitions, and for nodes joining the ring, transmitting our current dependency bookkeeping for the token ranges they're replicating, the corresponding instances, and the current values for those instances should be enough to get them going.
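
As a rough illustration, a hedged sketch of what such a transfer might carry, with entirely made-up names (RangeRecoveryState, latestInstancePerKey) rather than anything from the actual implementation. The point is that the receiver only needs the current bookkeeping and values for the range, not the history behind them.

{code:java}
// Hypothetical sketch only: field and class names are illustrative.
import java.util.*;

class RangeRecoveryState
{
    final long leftToken;   // (leftToken, rightToken] being transferred
    final long rightToken;

    final Map<String, UUID> latestInstancePerKey = new HashMap<>(); // dependency bookkeeping
    final Map<UUID, byte[]> instanceData = new HashMap<>();         // serialized instances
    final Map<String, byte[]> currentValues = new HashMap<>();      // current value per key

    RangeRecoveryState(long leftToken, long rightToken)
    {
        this.leftToken = leftToken;
        this.rightToken = rightToken;
    }

    // The receiver adopts the bookkeeping as-is: any new instance for a key in this
    // range will depend on latestInstancePerKey.get(key), so execution order stays
    // consistent without replaying the history the node missed.
    void applyTo(Map<String, UUID> localBookkeeping, Map<String, byte[]> localValues)
    {
        localBookkeeping.putAll(latestInstancePerKey);
        localValues.putAll(currentValues);
    }
}
{code}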

bq. I have also been reading about EPaxos recently and want to know: when do you do the condition check in your implementation?

It would have to be when the instance is executed.
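
A minimal sketch of what that could look like, assuming a simple key/value store and made-up CasInstance/Executor classes (not the real storage engine): the CAS condition is evaluated against whatever the value is once the instance's place in the execution order is settled, not against what the value was when the instance was proposed.

{code:java}
// Hypothetical sketch only: checking the condition at execution time.
import java.util.*;

class CasInstance
{
    final String key;
    final byte[] expected;   // condition: current value must equal this
    final byte[] newValue;
    boolean applied;

    CasInstance(String key, byte[] expected, byte[] newValue)
    {
        this.key = key;
        this.expected = expected;
        this.newValue = newValue;
    }
}

class Executor
{
    private final Map<String, byte[]> storage = new HashMap<>();

    // Called once the instance's dependencies have executed (and any strongly
    // connected component it belongs to has been ordered). Only at this point do we
    // know the value the condition will actually be evaluated against.
    void execute(CasInstance instance)
    {
        byte[] current = storage.get(instance.key);
        instance.applied = Arrays.equals(current, instance.expected);
        if (instance.applied)
            storage.put(instance.key, instance.newValue);
    }
}
{code}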


> EPaxos
> ------
>
>                 Key: CASSANDRA-6246
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Blake Eggleston
>            Priority: Minor
>
> One reason we haven't optimized our Paxos implementation with Multi-Paxos is that Multi-Paxos requires leader election and hence a period of unavailability when the leader dies.
> EPaxos is a Paxos variant that (1) requires fewer messages than Multi-Paxos, (2) is particularly useful across multiple datacenters, and (3) allows any node to act as coordinator: http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)