Posted to commits@cassandra.apache.org by "Geoffrey Yu (JIRA)" <ji...@apache.org> on 2016/08/10 01:51:20 UTC
[jira] [Comment Edited] (CASSANDRA-9876) One way targeted repair

    [ https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414586#comment-15414586 ] 

Geoffrey Yu edited comment on CASSANDRA-9876 at 8/10/16 1:51 AM:
-----------------------------------------------------------------

Thanks for the quick review! I’ve attached a new patch that addresses your comments, with the exception of one of them for which I wanted to get some more feedback first.

I also attached a patch that adds one dtest to test the pull repair. It works nearly identically to the token range repair with the exception that it asserts that one of the nodes only sends data and the other only receives.

{quote}
I don't think it's necessary to make specifying --start-token and --end-token mandatory, since if that is not specified it will just pull repair all common ranges between specified hosts.
{quote}

I added the check for a token range because the repair code, as it is now, doesn’t restrict itself to the ranges common to the specified hosts. I wasn’t sure if this was the intended behavior or a bug.

To reproduce the issue, create a 3-node cluster, add a keyspace with replication factor 2, and run a regular repair through nodetool on that keyspace with exactly two nodes specified.

The reason it happens is that if no ranges are specified, the repair will [add all ranges on the local node|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3137]. Then when we hit {{RepairRunnable}}, we try to [find a list of neighbors for each range|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L160-L162].

The problem here is that not every range the local node owns is necessarily also owned by the remote node we specified through the nodetool command. Because of this, the [check here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L246-L251] may throw an exception, which aborts the repair.
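
To make the mismatch concrete, here is a toy model of the scenario. It is only a sketch and not Cassandra's code: it assumes one token per node and simple clockwise replica placement, and the names {{build_replicas}}, {{local_ranges}}, {{neighbors}} and {{common_ranges}} are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# (start_token, end_token): a toy stand-in for a token range on the ring.
Range = Tuple[int, int]


def build_replicas(tokens: Dict[str, int], rf: int) -> Dict[Range, List[str]]:
    """Map each ring range to its replicas: the owner plus the next rf-1 nodes clockwise.

    Assumption: one token per node and simple clockwise placement.
    """
    ring = sorted(tokens, key=tokens.get)
    replicas: Dict[Range, List[str]] = {}
    for i, node in enumerate(ring):
        start = tokens[ring[i - 1]]  # previous node's token; index -1 wraps around
        replicas[(start, tokens[node])] = [ring[(i + k) % len(ring)] for k in range(rf)]
    return replicas


def local_ranges(replicas: Dict[Range, List[str]], node: str) -> List[Range]:
    """Every range the given node replicates (what a repair with no ranges starts from)."""
    return [rng for rng, owners in replicas.items() if node in owners]


def neighbors(replicas: Dict[Range, List[str]], rng: Range, local: str, hosts: Set[str]) -> Set[str]:
    """Replicas of rng other than the local node, restricted to the hosts the operator named."""
    found = {n for n in replicas[rng] if n != local and n in hosts}
    if not found:
        # Hypothetical stand-in for the exception that aborts the repair.
        raise ValueError("no specified neighbor replicates range %s" % (rng,))
    return found


def common_ranges(replicas: Dict[Range, List[str]], hosts: Set[str]) -> List[Range]:
    """Ranges replicated by all of the named hosts."""
    return [rng for rng, owners in replicas.items() if hosts <= set(owners)]


if __name__ == "__main__":
    replicas = build_replicas({"A": 0, "B": 100, "C": 200}, rf=2)
    hosts = {"A", "B"}
    for rng in local_ranges(replicas, "A"):
        try:
            print(rng, "->", sorted(neighbors(replicas, rng, "A", hosts)))
        except ValueError as err:
            print(rng, "-> repair would abort:", err)
    print("common ranges:", common_ranges(replicas, hosts))
```

In this model node A replicates two ranges, but only one of them is shared with B, so iterating over all of A's local ranges hits the failing case even though A and B do have a range in common.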

If this is intended behavior, then forcing the user to specify a token range that is common between the nodes prevents that exception from being thrown. Otherwise, the error message “Repair requires at least two endpoints that are neighbours before it can continue” can be confusing to the operator, since the two specified nodes may actually share a common range. What do you think?


was (Author: geoffxy):
Thanks for the quick review! I’ve attached a new patch that addresses your comments, with the exception of one of them for which I wanted to get some more feedback first.

I also attached a patch that adds one dtest to test the pull repair. It works nearly identically to the token range repair with the exception that it asserts that one of the nodes only sends data and the other only receives.

{quote}
I don't think it's necessary to make specifying --start-token and --end-token mandatory, since if that is not specified it will just pull repair all common ranges between specified hosts.
{quote}

The reason why I added in the check for a token range was that the repair code as it is now doesn’t actually add only the common ranges between the specified hosts. I wasn’t sure if this was the intended behavior or a bug.

To replicate the issue, just create a 3 node cluster, add a keyspace with replication factor 2, and run a regular repair through nodetool on that keyspace with exactly two nodes specified.

The reason it happens is that if no ranges are specified, the repair will [add all ranges on the local node|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3137]. Then when we hit {{RepairRunnable}}, we try to [find a list of neighbors for each range|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L160-L162].

The problem here is that it isn’t always true that every range the local node owns is also owned by the remote node we specified through the nodetool command. In the example above, only one range will be common between any two nodes. Because of this the [check here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L246-L251] may result in an exception being thrown, which aborts the repair.

If this is intended behavior, then forcing the user to specify a token range that is common between the nodes prevents that exception from being thrown. Otherwise the error message, “Repair requires at least two endpoints that are neighbours before it can continue” can be confusing to the operator since the two specified nodes may actually share a common range. What do you think?

> One way targeted repair
> -----------------------
>
>                 Key: CASSANDRA-9876
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9876
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Geoffrey Yu
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt
>
>
> Many applications use C* by writing to one local DC. The other DC is used when the local DC is unavailable. When the local DC becomes available again, we want to run a targeted repair between one endpoint from each DC to minimize the data transfer over WAN. In this case, it will be helpful to do a one-way repair in which data is only streamed from the other DC to the local DC instead of being streamed both ways. This will further minimize the traffic over WAN. This feature should only be supported if a targeted repair is run involving 2 hosts.


