Posted to commits@cassandra.apache.org by "DOAN DuyHai (JIRA)" <ji...@apache.org> on 2016/06/16 12:41:05 UTC

[jira] [Commented] (CASSANDRA-12015) Rebuilding from another DC should use different sources

    [ https://issues.apache.org/jira/browse/CASSANDRA-12015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333698#comment-15333698 ] 

DOAN DuyHai commented on CASSANDRA-12015:
-----------------------------------------

Here is the important code path:

1) org.apache.cassandra.tools.nodetool.Rebuild::execute()
2) StorageService::rebuild(String sourceDc, String keyspace, String tokens)
3) RangeStreamer::getAllRangesWithSourcesFor(String keyspaceName, Collection<Range<Token>> desiredRanges)

Inside the last method, we call the snitch to sort the replicas:

  List<InetAddress> preferred = snitch.getSortedListByProximity(address, rangeAddresses.get(range));

If you're rebuilding nodes in the new DC with the "nodetool rebuild" command in quick succession, it may happen that one replica has better latency than the others, so the DynamicSnitch will rank it first and it will be picked as the streaming source.
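To illustrate the effect, here is a simplified sketch. It is not the actual RangeStreamer code: the class SourceSelectionSketch, its methods, the node names and the latency scores are all made up for the example, and ranges and addresses are plain strings. It only shows that "sort by proximity, take the first" sends every range to the same node when one replica scores slightly better, while a hypothetical round-robin choice would spread the ranges over the replicas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Cassandra.
public class SourceSelectionSketch {

    // Mimics "sort replicas by proximity, then take the first one":
    // each range is assigned the replica with the lowest latency score.
    static Map<String, String> pickClosest(Map<String, List<String>> replicasByRange,
                                           Map<String, Double> latencyScore) {
        Map<String, String> sources = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : replicasByRange.entrySet()) {
            List<String> sorted = new ArrayList<>(e.getValue());
            sorted.sort(Comparator.comparingDouble((String node) -> latencyScore.get(node)));
            sources.put(e.getKey(), sorted.get(0));
        }
        return sources;
    }

    // Assumed alternative (not an existing Cassandra feature):
    // rotate through the replicas so consecutive ranges use different sources.
    static Map<String, String> pickRoundRobin(Map<String, List<String>> replicasByRange) {
        Map<String, String> sources = new LinkedHashMap<>();
        int i = 0;
        for (Map.Entry<String, List<String>> e : replicasByRange.entrySet()) {
            List<String> replicas = e.getValue();
            sources.put(e.getKey(), replicas.get(i % replicas.size()));
            i++;
        }
        return sources;
    }

    public static void main(String[] args) {
        // RF=3 on a 3-node DC1: every range is replicated on all three nodes.
        List<String> dc1 = Arrays.asList("node1", "node2", "node3");
        Map<String, List<String>> replicasByRange = new LinkedHashMap<>();
        replicasByRange.put("r1", dc1);
        replicasByRange.put("r2", dc1);
        replicasByRange.put("r3", dc1);

        // Made-up scores: node1 is only marginally better than the others.
        Map<String, Double> latencyScore = new LinkedHashMap<>();
        latencyScore.put("node1", 1.0);
        latencyScore.put("node2", 1.2);
        latencyScore.put("node3", 1.5);

        System.out.println(pickClosest(replicasByRange, latencyScore)); // {r1=node1, r2=node1, r3=node1}
        System.out.println(pickRoundRobin(replicasByRange));            // {r1=node1, r2=node2, r3=node3}
    }
}
```

With RF=3 and 3 nodes in DC1, the first strategy streams everything from node1, which matches the behaviour described in the ticket.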



> Rebuilding from another DC should use different sources
> -------------------------------------------------------
>
>                 Key: CASSANDRA-12015
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12015
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Fabien Rousseau
>
> Currently, when adding a new DC (ex: DC2) and rebuilding it from an existing DC (ex: DC1), only the closest replica is used as a "source of data".
> It works but is not optimal, because in case of an RF=3 and 3 nodes cluster, only one node in DC1 is streaming the data to DC2. 
> To build the new DC in a reasonable time, it would be better, in that case, to stream from multiple sources, thus distributing more evenly the load.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)