Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2016/06/14 18:34:27 UTC

[jira] [Comment Edited] (CASSANDRA-11933) Cache local ranges when calculating repair neighbors

    [ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330110#comment-15330110 ] 

Paulo Motta edited comment on CASSANDRA-11933 at 6/14/16 6:33 PM:
------------------------------------------------------------------

Tests look good (quite a few flakey tests, but they look unrelated). Marking as ready to commit. Thanks!

Commit info: minor conflicts until 3.0, which merges cleanly to trunk.

Edit: I tried this with ccm on a 5-node cluster with 7,500 tokens, and repair time was reduced from 1,40min to 50s.


was (Author: pauloricardomg):
Tests look good (quite a few flakey tests, but they look unrelated). Marking as ready to commit. Thanks!

Commit info: minor conflicts until 3.0, which merges cleanly to trunk.

> Cache local ranges when calculating repair neighbors
> ----------------------------------------------------
>
>                 Key: CASSANDRA-11933
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11933
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Cyril Scetbon
>            Assignee: Mahdi Mohammadi
>
> During a full repair on a ~60-node cluster, I've seen that this stage can be significant (up to 60 percent of the whole time):
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997
> It's mainly caused by the fact that https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189 calls {code}ss.getLocalRanges(keyspaceName){code} every time, and that call accounts for more than 99% of the time. This call takes 600ms when there is no load on the cluster and more if there is. So for 10k ranges, you can imagine that it takes at least 1.5 hours just to compute ranges.
> Underneath it calls [ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170] which can get pretty inefficient ([~jbellis]'s [words|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L165])
> *ss.getLocalRanges(keyspaceName)* should be cached to avoid having to spend hours on it.
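
To illustrate the idea in the description above, here is a minimal sketch of hoisting the expensive lookup out of the per-range loop. It is not the committed patch and does not use Cassandra's real classes: LocalRangeCacheSketch, its getLocalRanges stand-in, the long[] range representation, and the lookups counter are all hypothetical, and token wraparound is ignored. At 600ms per call, 10k ranges cost roughly 100 minutes without caching, versus a single call with it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Cassandra's actual API.
public class LocalRangeCacheSketch {
    // Counts how many times the (simulated) expensive lookup runs.
    public static int lookups = 0;

    // Stand-in for ss.getLocalRanges(keyspaceName); assumed to be expensive.
    // Each range is a {start, end} pair; wraparound is ignored here.
    public static List<long[]> getLocalRanges(String keyspace) {
        lookups++;
        List<long[]> ranges = new ArrayList<>();
        ranges.add(new long[] {0L, 100L});
        ranges.add(new long[] {100L, 200L});
        return ranges;
    }

    // True if some local range fully contains the range to repair.
    public static boolean isCovered(List<long[]> local, long[] toRepair) {
        for (long[] l : local) {
            if (l[0] <= toRepair[0] && toRepair[1] <= l[1]) {
                return true;
            }
        }
        return false;
    }

    // Uncached: one expensive lookup per range to repair.
    public static int countCoveredUncached(String keyspace, List<long[]> toRepair) {
        int covered = 0;
        for (long[] r : toRepair) {
            if (isCovered(getLocalRanges(keyspace), r)) {
                covered++;
            }
        }
        return covered;
    }

    // Cached: look up the local ranges once and reuse them for every range.
    public static int countCoveredCached(String keyspace, List<long[]> toRepair) {
        List<long[]> local = getLocalRanges(keyspace);
        int covered = 0;
        for (long[] r : toRepair) {
            if (isCovered(local, r)) {
                covered++;
            }
        }
        return covered;
    }
}
```

Both methods return the same count. Only the number of lookups differs, which is why caching is safe as long as the local ranges do not change during the computation.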



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)