Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/06/29 21:04:28 UTC
[jira] [Created] (CASSANDRA-2841) Always use even distribution for
merkle tree with RandomPartitionner
Always use even distribution for merkle tree with RandomPartitionner
--------------------------------------------------------------------
Key: CASSANDRA-2841
URL: https://issues.apache.org/jira/browse/CASSANDRA-2841
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.7.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
Fix For: 0.7.7, 0.8.2
Attachments: 2841.patch
When creating the initial merkle tree, repair tries to be (too) smart and uses the key samples to "guide" the tree splitting. While this is a good idea for OPP, where there is a good chance the data distribution is uneven, you can't beat an even distribution for the RandomPartitioner. A quick experiment even shows that the method used is significantly less efficient than an even distribution for the ranges of the merkle tree (that is, an even distribution gives a much better distribution of the number of keys per range of the tree).
Thus let's switch to an even distribution for RandomPartitioner. That three-line change alone accounts for a significant improvement in repair's precision.
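To illustrate what "even distribution" means here, the following is a minimal sketch, not the attached patch: it halves a token range at its midpoint until a given depth is reached, so every leaf range has the same width. The class and method names are hypothetical, and the token space [0, 2^127) is an assumption about RandomPartitioner; ring wrap-around is ignored.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Cassandra's source.
class EvenSplitSketch {
    // Assumption: RandomPartitioner tokens are integers in [0, 2^127).
    static final BigInteger MAX = BigInteger.ONE.shiftLeft(127);

    /** Returns the ordered boundaries of up to 2^depth equal-width sub-ranges between left and right. */
    static List<BigInteger> evenBoundaries(BigInteger left, BigInteger right, int depth) {
        List<BigInteger> bounds = new ArrayList<>();
        bounds.add(left);
        split(left, right, depth, bounds);
        return bounds;
    }

    // Recursively halve the range at its midpoint, appending each leaf's right bound in order.
    static void split(BigInteger left, BigInteger right, int depth, List<BigInteger> out) {
        // Stop at the requested depth, or when the range can no longer be halved.
        if (depth == 0 || right.subtract(left).compareTo(BigInteger.ONE) <= 0) {
            out.add(right);
            return;
        }
        BigInteger mid = left.add(right).shiftRight(1);
        split(left, mid, depth - 1, out);
        split(mid, right, depth - 1, out);
    }
}
```

Because a random partitioner spreads keys uniformly over the token space, equal-width leaves should each hold roughly the same number of keys, which is the property the description above relies on. With an order-preserving partitioner that assumption does not hold, which is why sample-guided splitting remains useful there.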
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2841) Always use even distribution
for merkle tree with RandomPartitionner
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057458#comment-13057458 ]
Hudson commented on CASSANDRA-2841:
-----------------------------------
Integrated in Cassandra-0.7 #518 (See [https://builds.apache.org/job/Cassandra-0.7/518/])
Always use even distribution for merkle tree with RandomPartitionner
patch by slebresne; reviewed by jbellis for CASSANDRA-2841
slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1141217
Files :
* /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/AntiEntropyService.java
* /cassandra/branches/cassandra-0.7/CHANGES.txt
> Always use even distribution for merkle tree with RandomPartitionner
> --------------------------------------------------------------------
>
> Key: CASSANDRA-2841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2841
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.7.0
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Trivial
> Labels: repair
> Fix For: 0.7.7, 0.8.2
>
> Attachments: 2841.patch
>
>
> When creating the initial merkle tree, repair tries to be (too) smart and uses the key samples to "guide" the tree splitting. While this is a good idea for OPP, where there is a good chance the data distribution is uneven, you can't beat an even distribution for the RandomPartitioner. A quick experiment even shows that the method used is significantly less efficient than an even distribution for the ranges of the merkle tree (that is, an even distribution gives a much better distribution of the number of keys per range of the tree).
> Thus let's switch to an even distribution for RandomPartitioner. That three-line change alone accounts for a significant improvement in repair's precision.
[jira] [Commented] (CASSANDRA-2841) Always use even distribution
for merkle tree with RandomPartitionner
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057418#comment-13057418 ]
Jonathan Ellis commented on CASSANDRA-2841:
-------------------------------------------
+1
[jira] [Updated] (CASSANDRA-2841) Always use even distribution for
merkle tree with RandomPartitionner
Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-2841:
----------------------------------------
Attachment: 2841.patch
Patch is against 0.7.