Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/04/22 13:38:05 UTC
[jira] [Created] (CASSANDRA-2541) Improve the precision of the repair merkle trees
Improve the precision of the repair merkle trees
------------------------------------------------
Key: CASSANDRA-2541
URL: https://issues.apache.org/jira/browse/CASSANDRA-2541
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.6
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
Fix For: 0.8.1
Repair uses the sampled keys of the sstables to split the merkle tree. This means the 'precision' of the tree will be index_interval rows (so 128 by default). This is probably fine when you have lots of skinny rows, but when you have fewer, fatter rows, it is probably unnecessarily imprecise.
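The arithmetic behind the 'precision' claim can be sketched as follows. This assumes, as the description says, that tree ranges are bounded by the sampled keys, and additionally assumes that exactly every index_interval-th key is sampled; `SampledRanges` and its methods are hypothetical names, not Cassandra code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration: if exactly every indexInterval-th key is sampled
 * and tree ranges are bounded by samples, each range spans indexInterval rows
 * (the last one may be shorter).
 */
public class SampledRanges {
    /** Row positions of the sampled keys: 0, indexInterval, 2 * indexInterval, ... */
    static List<Integer> sampleIndices(int rowCount, int indexInterval) {
        List<Integer> samples = new ArrayList<>();
        for (int i = 0; i < rowCount; i += indexInterval) {
            samples.add(i);
        }
        return samples;
    }

    /** Rows in each range that starts at a sample and ends at the next sample (or after the last row). */
    static List<Integer> rowsPerRange(int rowCount, int indexInterval) {
        List<Integer> samples = sampleIndices(rowCount, indexInterval);
        List<Integer> sizes = new ArrayList<>();
        for (int s = 0; s < samples.size(); s++) {
            int start = samples.get(s);
            int end = (s + 1 < samples.size()) ? samples.get(s + 1) : rowCount;
            sizes.add(end - start);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 1000 rows with the default index_interval of 128
        System.out.println(rowsPerRange(1000, 128)); // [128, 128, 128, 128, 128, 128, 128, 104]
    }
}
```

Following the ticket's reasoning, a difference anywhere in such a range flags the whole range, so with fat rows each flagged range can represent a lot of data.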
Add to that the fact that each node will not have the same set of samples: you may not always end up using the most precise range in the trees when computing differences, which could make the imprecision worse (to be fair, it is quite possible this happens very rarely).
Anyway, this ticket proposes adding a 'split_factor' (which could be fixed, or configurable, either by the user or based on metrics about how fat the rows are) that makes us re-split each range 'split_factor' times after the initial sample-based split of the tree.
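A minimal sketch of the proposed re-split, under one possible reading of 'split_factor': each range produced by the sample-based split is cut into `splitFactor` contiguous sub-ranges of nearly equal width. The `SplitFactor` class, the long-valued tokens and the no-wrap-around assumption are all hypothetical simplifications, not Cassandra's token or range types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of a 'split_factor' re-split. Tokens are plain longs
 * with left < right (no wrap-around) and small enough that width * i cannot overflow.
 */
public class SplitFactor {
    /** Half-open token range (left, right]. */
    static final class Range {
        final long left;
        final long right;

        Range(long left, long right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + left + ", " + right + "]";
        }
    }

    /** Cuts every range into splitFactor contiguous, non-empty sub-ranges. */
    static List<Range> resplit(List<Range> ranges, int splitFactor) {
        List<Range> out = new ArrayList<>();
        for (Range r : ranges) {
            long width = r.right - r.left;
            if (splitFactor < 2 || width < splitFactor) {
                out.add(r); // too narrow to cut into non-empty pieces, keep as-is
                continue;
            }
            long prev = r.left;
            for (int i = 1; i <= splitFactor; i++) {
                // The last piece ends exactly at r.right so integer rounding loses no tokens.
                long next = (i == splitFactor) ? r.right : r.left + width * i / splitFactor;
                out.add(new Range(prev, next));
                prev = next;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two ranges as they might come out of the sample-based split.
        List<Range> sampled = Arrays.asList(new Range(0, 100), new Range(100, 130));
        // Prints [(0, 25], (25, 50], (50, 75], (75, 100], (100, 107], (107, 115], (115, 122], (122, 130]]
        System.out.println(resplit(sampled, 4));
    }
}
```

Splitting by token width rather than by row count keeps the sketch independent of row sizes; the ticket leaves open whether the factor would be fixed or derived from row-size metrics.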
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CASSANDRA-2541) Improve the precision of the repair merkle trees
Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne resolved CASSANDRA-2541.
-----------------------------------------
Resolution: Invalid
Well, I was actually wrong: the splitting reuses the samples for as long as the tree is not complete (complete in the sense that its depth or size has reached the fixed limits). So I think there is no particular problem here.
There is a small bug in the 0.8 code that can make the splitting process exit early, but I'll open another ticket for that.
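The resolution's point, that the same samples keep driving splits until the tree is complete, can be sketched as below. This is a hypothetical model, not Cassandra's `MerkleTree` API: leaves are token ranges halved at their midpoint, "complete" means the leaf count has reached `maxSize` or every leaf holding a sample has reached `maxDepth`, and trying the samples round-robin is an assumption made only to keep the output deterministic. Stopping only after a whole pass makes no progress, rather than at the first failed split, is one way to make sure the loop does not exit early.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: reuse sampled tokens pass after pass until the tree is complete. */
public class SampleSplitting {
    /** A leaf covering the token range (left, right] at a given depth. */
    static final class Leaf {
        final long left;
        final long right;
        final int depth;

        Leaf(long left, long right, int depth) {
            this.left = left;
            this.right = right;
            this.depth = depth;
        }

        @Override
        public String toString() {
            return "(" + left + ", " + right + "]@" + depth;
        }
    }

    final List<Leaf> leaves = new ArrayList<>(); // always kept in token order
    final int maxDepth;
    final int maxSize;

    SampleSplitting(long minToken, long maxToken, int maxDepth, int maxSize) {
        this.maxDepth = maxDepth;
        this.maxSize = maxSize;
        leaves.add(new Leaf(minToken, maxToken, 0));
    }

    /** Returns the leaf whose range contains token, or null if token is outside the tree. */
    Leaf leafFor(long token) {
        for (Leaf l : leaves) {
            if (token > l.left && token <= l.right) return l;
        }
        return null;
    }

    /** Halves the leaf containing token; false if the tree is full or that leaf cannot be split. */
    boolean split(long token) {
        if (leaves.size() >= maxSize) return false;
        Leaf l = leafFor(token);
        if (l == null || l.depth >= maxDepth || l.right - l.left < 2) return false;
        int i = leaves.indexOf(l); // identity match: l was taken from the list
        long mid = l.left + (l.right - l.left) / 2;
        leaves.set(i, new Leaf(l.left, mid, l.depth + 1));
        leaves.add(i + 1, new Leaf(mid, l.right, l.depth + 1));
        return true;
    }

    /** Cycles over the same samples until a full pass makes no progress or the size limit is hit. */
    void buildFromSamples(List<Long> samples) {
        boolean progressed = true;
        while (progressed && leaves.size() < maxSize) {
            progressed = false;
            for (long token : samples) {
                if (split(token)) progressed = true;
            }
        }
    }

    public static void main(String[] args) {
        SampleSplitting tree = new SampleSplitting(0, 1024, 4, 8);
        tree.buildFromSamples(Arrays.asList(100L, 700L));
        System.out.println(tree.leaves);
    }
}
```

With only two samples, the tree still fills its 8 leaves: the leaves containing tokens 100 and 700 end up 64 tokens wide at depth 4, while (768, 1024] stays 256 wide, because precision accumulates around the reused samples.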
> Improve the precision of the repair merkle trees
> ------------------------------------------------
>
> Key: CASSANDRA-2541
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2541
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Labels: repair
> Fix For: 0.8.1
>
> Original Estimate: 8h
> Remaining Estimate: 8h