Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/09/01 17:26:10 UTC

[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess its efficiency (precision)

    [ https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095348#comment-13095348 ] 

Sylvain Lebresne commented on CASSANDRA-2698:
---------------------------------------------

An EstimatedHistogram would be just fine. That, plus, for each pair of merkle trees, the number of ranges that differ and the corresponding size of the streamed data, would be a very good start imho.
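
To make the per-range statistic concrete, here is a minimal sketch of a histogram of rows per merkle tree range, using power-of-two buckets. It is a simplified stand-in and not Cassandra's EstimatedHistogram class; the name RangeRowHistogram and all of its methods are hypothetical.

```java
// Hypothetical stand-in for an estimated histogram (NOT Cassandra's EstimatedHistogram).
// Bucket 0 counts ranges with at most 1 row; bucket i (i >= 1) counts ranges
// whose row count n satisfies 2^(i-1) < n <= 2^i.
final class RangeRowHistogram {
    private final long[] buckets = new long[64];
    private long ranges; // number of merkle tree ranges recorded
    private long rows;   // total rows across all recorded ranges

    // Record one merkle tree range that accounts for rowsInRange rows.
    void add(long rowsInRange) {
        if (rowsInRange < 0) {
            throw new IllegalArgumentException("negative row count");
        }
        buckets[bucketFor(rowsInRange)]++;
        ranges++;
        rows += rowsInRange;
    }

    // Smallest i such that n <= 2^i (0 for n <= 1).
    static int bucketFor(long n) {
        return n <= 1 ? 0 : 64 - Long.numberOfLeadingZeros(n - 1);
    }

    long count(int bucket) {
        return buckets[bucket];
    }

    long rangeCount() {
        return ranges;
    }

    double meanRowsPerRange() {
        return ranges == 0 ? 0.0 : (double) rows / ranges;
    }
}
```

A heavy tail in the upper buckets would suggest that a few ranges cover many rows, which is the kind of precision loss the ticket wants to detect.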

I think the only thing we need to figure out for this patch is where it makes the most sense to record that data. What I mean here is that the merkle trees are computed on each node participating in a repair (and thus that is where the EstimatedHistogram can be computed), while the differences are only computed on the coordinator. But in an ideal world, it would seem more useful to store that information together (for a given repair) because the two are related.
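
One possible arrangement, sketched below purely as an illustration, is for each participant to report its histogram to the coordinator, which then keeps it next to the per-pair differences under a single repair identifier. Whether the data should actually live on the coordinator is exactly the open question here, so treat that as an assumption; RepairSessionStats, PairDifference and every method name are hypothetical and do not exist in Cassandra.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container grouping everything recorded for one repair.
final class RepairSessionStats {
    // Result of comparing the merkle trees of one pair of nodes.
    static final class PairDifference {
        final String nodeA;
        final String nodeB;
        final int differingRanges; // ranges whose hashes did not match
        final long streamedBytes;  // data streamed to fix those ranges

        PairDifference(String nodeA, String nodeB, int differingRanges, long streamedBytes) {
            this.nodeA = nodeA;
            this.nodeB = nodeB;
            this.differingRanges = differingRanges;
            this.streamedBytes = streamedBytes;
        }
    }

    final String sessionId;
    private final Map<String, long[]> rowHistogramByNode = new HashMap<>();
    private final List<PairDifference> differences = new ArrayList<>();

    RepairSessionStats(String sessionId) {
        this.sessionId = sessionId;
    }

    // Computed on each participating node while it builds its merkle tree.
    void recordTreeHistogram(String node, long[] bucketCounts) {
        rowHistogramByNode.put(node, bucketCounts.clone());
    }

    // Computed on the coordinator when it compares two trees.
    void recordDifference(String nodeA, String nodeB, int differingRanges, long streamedBytes) {
        differences.add(new PairDifference(nodeA, nodeB, differingRanges, streamedBytes));
    }

    int nodesReported() {
        return rowHistogramByNode.size();
    }

    int totalDifferingRanges() {
        int total = 0;
        for (PairDifference d : differences) {
            total += d.differingRanges;
        }
        return total;
    }

    long totalStreamedBytes() {
        long total = 0;
        for (PairDifference d : differences) {
            total += d.streamedBytes;
        }
        return total;
    }
}
```

Keying both kinds of data by the same repair makes it possible to relate coarse ranges on a node to the amount of data streamed for the pairs that node took part in.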

> Instrument repair to be able to assess its efficiency (precision)
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-2698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: lhf
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One hypothesis is that the merkle tree precision may deteriorate too much at some data size. To check this hypothesis, it would be reasonable to gather statistics, during the merkle tree building, on how many rows each merkle tree range accounts for (and the size that this represents). It is probably an interesting statistic to have anyway.
