Posted to commits@cassandra.apache.org by "Stefan Podkowinski (JIRA)" <ji...@apache.org> on 2016/12/21 12:34:58 UTC

[jira] [Comment Edited] (CASSANDRA-13052) Repair process is violating the start/end token limits for small ranges

    [ https://issues.apache.org/jira/browse/CASSANDRA-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766949#comment-15766949 ] 

Stefan Podkowinski edited comment on CASSANDRA-13052 at 12/21/16 12:34 PM:
---------------------------------------------------------------------------

My [WIP branch|https://github.com/apache/cassandra/compare/cassandra-2.1...spodkowinski:WIP-13052] has now been updated with an exit condition that should prevent the midpoint method from being called with an invalid range.

The current MT code recursively looks for the node closest to the leaves by repeatedly dividing the range at its midpoint, returning once any child is no longer contained in the range we're looking for. Unfortunately, this condition will not apply when a leaf in the MT spans exactly the single-token range we're searching for. See [MerkleTree.findHelper|https://github.com/apache/cassandra/blob/4a2464192e9e69457f5a5ecf26c094f9298bf069/src/java/org/apache/cassandra/utils/MerkleTree.java#L420].
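
As a rough illustration of the kind of exit condition described above, the following is a minimal sketch and not the code from the WIP branch. It assumes non-wrapping (left, right] ranges over plain long tokens, and the names SplitGuardSketch, canSplit and collectLeaves are hypothetical. The recursion stops bisecting as soon as a range is too small to yield a midpoint strictly between its bounds.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitGuardSketch {
    // Hypothetical guard: (left, right] can be bisected only if it holds
    // at least two tokens, so that a midpoint strictly inside it exists.
    static boolean canSplit(long left, long right) {
        return left < right && left + 1 < right;
    }

    // Bisect (left, right] down to maxDepth, but stop early at ranges that
    // are too small to split instead of asking for a midpoint.
    static void collectLeaves(long left, long right, int depth, int maxDepth,
                              List<long[]> out) {
        if (depth == maxDepth || !canSplit(left, right)) {
            out.add(new long[] {left, right});
            return;
        }
        // Overflow-safe floor midpoint for left < right.
        long mid = left + ((right - left) >>> 1);
        collectLeaves(left, mid, depth + 1, maxDepth, out);
        collectLeaves(mid, right, depth + 1, maxDepth, out);
    }

    public static void main(String[] args) {
        List<long[]> leaves = new ArrayList<>();
        // A single-token range: the guard returns it unsplit.
        collectLeaves(123456788L, 123456789L, 0, 10, leaves);
        for (long[] leaf : leaves) {
            System.out.println("(" + leaf[0] + ", " + leaf[1] + "]");
        }
    }
}
```

In this sketch every sub-range produced stays inside the requested bounds, because a midpoint is only computed when it is guaranteed to fall strictly between left and right.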

I still want to investigate how likely this is to happen for larger ranges, e.g. complete vnodes.


was (Author: spodxx@gmail.com):
My [WIP branch|https://github.com/apache/cassandra/compare/cassandra-2.1...spodkowinski:WIP-13052] has now been updated with an exit condition that should prevent the midpoint method from being called with an invalid range.

The current MT code recursively looks for the node closest to the leaves by repeatedly dividing the range at its midpoint, returning once any child is no longer contained in the range we're looking for. Unfortunately, this condition will not apply when a child in the MT spans exactly the single-token range we're searching for. See [MerkleTree.findHelper|https://github.com/apache/cassandra/blob/4a2464192e9e69457f5a5ecf26c094f9298bf069/src/java/org/apache/cassandra/utils/MerkleTree.java#L420].

I still want to investigate how likely this is to happen for larger ranges, e.g. complete vnodes.

> Repair process is violating the start/end token limits for small ranges
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-13052
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13052
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: We tried this in 2.0.14 and 3.9, same bug.
>            Reporter: Cristian P
>            Assignee: Stefan Podkowinski
>         Attachments: ccm_reproduce-13052.txt, system-dev-debug-13052.log
>
>
> We tried to do a single token repair by providing 2 consecutive token values for a large column family. We soon noticed heavy streaming, and according to the logs the number of ranges streamed was in the thousands.
> After investigation we found a bug in the two partitioner classes we use (RandomPartitioner and Murmur3Partitioner).
> The midpoint method used by the MerkleTree.differenceHelper method to find ranges with differences for streaming returns abnormal values (far outside the initial range requested for repair) if the requested repair range is small (I expect smaller than 2^15).
> Here is the simple code to reproduce the bug for Murmur3Partitioner:
> Token left = new Murmur3Partitioner.LongToken(123456789L);
> Token right = new Murmur3Partitioner.LongToken(123456789L);
> IPartitioner partitioner = new Murmur3Partitioner();
> Token midpoint = partitioner.midpoint(left, right);
> System.out.println("Murmur3: [ " + left.getToken() + " : " + midpoint.getToken() + " : " + right.getToken() + " ]");
> The output is:
> Murmur3: [ 123456789 : -9223372036731319019 : 123456789 ]
> Note that the midpoint token is nowhere near the suggested repair range. This will happen if, during the parsing of the tree (in MerkleTree.differenceHelper) in search of differences, there aren't enough tokens for the split and the subrange becomes 0 (left.token=right.token) as in the above test.
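
The behaviour reported above can be reproduced without a Cassandra checkout. The following is a minimal sketch under stated assumptions: it is a simplified stand-in for a ring midpoint over signed 64-bit tokens, not the Murmur3Partitioner source, and the class and method names are hypothetical. The key assumption is that a range whose left bound is not smaller than its right bound is read as wrapping around the ring, so left == right is treated as the whole ring rather than as an empty range.

```java
import java.math.BigInteger;

public class MidpointSketch {
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Hypothetical stand-in for a ring midpoint over signed 64-bit tokens.
    static long midpoint(long left, long right) {
        BigInteger l = BigInteger.valueOf(left);
        BigInteger r = BigInteger.valueOf(right);
        if (l.compareTo(r) < 0) {
            // Ordinary case: floor of the average of the two bounds.
            return l.add(r).shiftRight(1).longValue();
        }
        // Assumption: left >= right means the range wraps around the ring,
        // so the midpoint lies half a ring away from the bounds.
        BigInteger mid = MAX.subtract(MIN).add(l).add(r).shiftRight(1);
        if (mid.compareTo(MAX) > 0) {
            mid = MIN.add(mid.subtract(MAX));
        }
        return mid.longValue();
    }

    public static void main(String[] args) {
        long token = 123456789L;
        // Degenerate range (left == right): lands on the far side of the ring.
        System.out.println(midpoint(token, token));
        // Ordinary range: the midpoint stays between the bounds.
        System.out.println(midpoint(token, token + 2));
    }
}
```

Under these assumptions the degenerate call yields -9223372036731319019, the same value as in the report, while the ordinary call yields 123456790.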



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)