You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2013/09/03 11:41:51 UTC

[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation

    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756479#comment-13756479 ] 

Sylvain Lebresne commented on CASSANDRA-3200:
---------------------------------------------

Now that I think about it, "last time I checked seriously" I intended to do this at the hash level, which would be ideal. I.e., to compare all leafs together so you can say things like: A and B agree on this sub-range but C doesn't, while on that other sub-range it's C and B that agree but not A.

A simpler solution could be to do it at the tree level. That is, just do a first path identifying which nodes fully agree on their tree, and when that's the case we could cut on the number of streaming job to do. Namely, if A and B are fully in sync, then there is no point in doing both A->C and B->C. This might save a bunch of transport already for a relatively reasonable effort, though I suspect that on heavily updated CF, it will be rare for two trees to fully agree due to timing issues.
                
> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair
>
> Currently, repair compare merkle trees by pair, in isolation of any other tree. What that means concretely is that if I have three node A, B and C (RF=3) with A and B in sync, but C having some range r inconsitent with both A and B (since those are consistent), we will do the following transfer of r: A -> C, C -> A, B -> C, C -> B.
> The fact that we do both A -> C and C -> A is fine, because we cannot know which one is more to date from A or C. However, the transfer B -> C is useless provided we do A -> C if A and B are in sync. Not doing that transfer will be a 25% improvement in that case. With RF=5 and only one node inconsistent with all the others, that almost a 40% improvement, etc...
> Given that this situation of one node not in sync while the others are is probably fairly common (one node died so it is behind), this could be a fair improvement over what is transferred. In the case where we use repair to rebuild completely a node, this will be a dramatic improvement, because it will avoid the rebuilded node to get RF times the data it should get.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira