Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/04/22 13:38:05 UTC

[jira] [Created] (CASSANDRA-2541) Improve the precision of the repair merkle trees

Improve the precision of the repair merkle trees
------------------------------------------------

                 Key: CASSANDRA-2541
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2541
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.6
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
            Priority: Minor
             Fix For: 0.8.1


Repair uses the sstable sampled keys to split the merkle tree. This means the 'precision' of the tree is bounded by index_interval (128 by default). This is probably fine when you have lots of skinny rows, but when you have fewer, fatter rows it is probably unnecessarily imprecise.

Added to that, since each node will not have the same set of samples, you may not always end up using the most precise range in the trees when computing differences, which could make the imprecision worse (to be fair, it is quite possible this happens very rarely).

Anyway, this ticket proposes adding a 'split_factor' (it can be fixed or configurable, either by the user or based on metrics on how fat the rows are) that makes us re-split each range 'split_factor' times after the initial sample-based split of the tree.
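
To make the proposal concrete, here is a minimal sketch of the idea, not Cassandra code. It assumes tokens are plain integers, that ranges are (left, right] pairs, and that "re-splitting" a range means halving it at its midpoint; the function names split_on_samples and resplit are hypothetical.

```python
def split_on_samples(left, right, samples):
    """Split the token range (left, right] at each sampled token inside it."""
    inner = sorted(t for t in set(samples) if left < t < right)
    bounds = [left] + inner + [right]
    return list(zip(bounds, bounds[1:]))


def resplit(ranges, split_factor):
    """Halve every range at its midpoint, split_factor times in a row.

    Assumption: this is one possible reading of 're-split split_factor times'.
    """
    for _ in range(split_factor):
        halved = []
        for lo, hi in ranges:
            mid = (lo + hi) // 2
            if lo < mid < hi:
                halved.extend([(lo, mid), (mid, hi)])
            else:
                halved.append((lo, hi))  # too narrow to split further
        ranges = halved
    return ranges


# Initial sample-based split, then two extra rounds of halving.
coarse = split_on_samples(0, 1024, samples=[256, 512])
fine = resplit(coarse, split_factor=2)
```

Under this reading, each extra round doubles the number of ranges at most, so a split_factor of n refines every sample-based range into up to 2^n subranges.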

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (CASSANDRA-2541) Improve the precision of the repair merkle trees

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-2541.
-----------------------------------------

    Resolution: Invalid

Well, I was actually wrong: the splitting reuses the samples as long as the tree is not complete (complete in the sense that its depth or size has reached the fixed limits). So I think there is no particular problem here.
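
As a rough illustration of that behaviour (a simplified model, not the actual repair code): assume tokens are integers, that each split halves the leaf range containing the chosen sample, and that splitting stops as soon as a size or depth limit refuses a split. The names build_leaves, max_size and max_depth are hypothetical.

```python
import random


def build_leaves(left, right, samples, max_size, max_depth, seed=0):
    """Keep splitting on randomly reused samples until the tree is 'complete'.

    Each leaf is (lo, hi, depth) and covers the token range (lo, hi].
    Samples are assumed to lie inside (left, right].
    """
    rng = random.Random(seed)
    leaves = [(left, right, 0)]
    while len(leaves) < max_size:          # size limit
        token = rng.choice(samples)        # the same sample may be picked again
        i = next(k for k, (lo, hi, _) in enumerate(leaves) if lo < token <= hi)
        lo, hi, depth = leaves[i]
        mid = (lo + hi) // 2
        if depth >= max_depth or not lo < mid < hi:
            break                          # depth limit reached, or range too narrow
        leaves[i:i + 1] = [(lo, mid, depth + 1), (mid, hi, depth + 1)]
    return leaves


leaves = build_leaves(0, 1024, samples=[100, 300, 700], max_size=16, max_depth=10)
```

In this model the number of leaves is not capped by the number of samples: picking a sample again simply splits the (already smaller) range around it once more, until one of the limits is hit.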

There is a small bug in the 0.8 code that can make the splitting process exit early, but I'll open another ticket for that.

> Improve the precision of the repair merkle trees
> ------------------------------------------------
>
>                 Key: CASSANDRA-2541
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2541
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair
>             Fix For: 0.8.1
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Repair uses the sstable sampled keys to split the merkle tree. This means the 'precision' of the tree is bounded by index_interval (128 by default). This is probably fine when you have lots of skinny rows, but when you have fewer, fatter rows it is probably unnecessarily imprecise.
> Added to that, since each node will not have the same set of samples, you may not always end up using the most precise range in the trees when computing differences, which could make the imprecision worse (to be fair, it is quite possible this happens very rarely).
> Anyway, this ticket proposes adding a 'split_factor' (it can be fixed or configurable, either by the user or based on metrics on how fat the rows are) that makes us re-split each range 'split_factor' times after the initial sample-based split of the tree.