Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2013/06/16 00:57:20 UTC

[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

    [ https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684430#comment-13684430 ] 

Benedict commented on CASSANDRA-2698:
-------------------------------------

Hi Yuki,

Sorry for the glacial response.

{quote}
If you do that, you should create your own class with labels and an array, since you're not using the default offsets or the other histogram-related methods. At first it confused me why you were calling addToIndex on EstimatedHistogram.
{quote}

Agreed. I was a little reluctant to introduce a separate histogram class, but the current approach is a little ugly as it stands.

{quote}
But looking at this from the beginning again, what we want to see is whether we have a Merkle tree with keys (or rows) evenly distributed across each hash. You can use EstimatedHistogram or your own class to show that. For now, just using the logger to log that distribution at the end of Merkle tree creation, with the corresponding repair session ID, is fine, instead of sending stats back to the coordinator.
{quote}

It sounds like all you want logged is the number of rows per Merkle tree leaf, to see whether the tree is roughly balanced? If so, I misinterpreted the goal entirely, though it makes sense now that I understand better how repair works. Is it worth keeping the streaming of the leaf sizes along with the Merkle tree, so that the deltas can be logged in future should that be desired?

I'll strip out the logging of the size of the ranges being streamed for now as well.
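
To make sure we mean the same thing by "rows per leaf" logging, here is a minimal sketch of the kind of summary I have in mind. Everything in it is an assumption for illustration: LeafRowStats, addLeaf and summary are hypothetical names, not existing Cassandra classes or methods, and the power-of-two bucketing is just one possible choice. The caller would pass the returned string to whatever logger the validation code already uses.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper for illustration only; not an existing Cassandra class.
final class LeafRowStats {
    // buckets[0] counts empty leaves; buckets[i] (i >= 1) counts leaves
    // holding between 2^(i-1) and 2^i - 1 rows.
    private final long[] buckets = new long[64];
    private long leaves;
    private long totalRows;
    private long maxRows;

    // Record the number of rows that hashed into one Merkle tree leaf.
    void addLeaf(long rowCount) {
        if (rowCount < 0)
            throw new IllegalArgumentException("negative row count: " + rowCount);
        buckets[bucketIndex(rowCount)]++;
        leaves++;
        totalRows += rowCount;
        maxRows = Math.max(maxRows, rowCount);
    }

    // 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, and so on.
    static int bucketIndex(long rowCount) {
        return 64 - Long.numberOfLeadingZeros(rowCount);
    }

    long bucketCount(int index) { return buckets[index]; }

    long leafCount() { return leaves; }

    double meanRowsPerLeaf() {
        return leaves == 0 ? 0.0 : (double) totalRows / leaves;
    }

    // One line per repair session, listing only the non-empty buckets.
    String summary(UUID sessionId) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT,
                "[repair #%s] leaves=%d rows=%d max=%d mean=%.2f;",
                sessionId, leaves, totalRows, maxRows, meanRowsPerLeaf()));
        for (int i = 0; i < buckets.length; i++) {
            if (buckets[i] == 0)
                continue;
            long lo = i == 0 ? 0 : 1L << (i - 1);
            long hi = i == 0 ? 0 : (1L << i) - 1;
            sb.append(' ').append(lo);
            if (hi != lo)
                sb.append('-').append(hi);
            sb.append(':').append(buckets[i]);
        }
        return sb.toString();
    }
}
```

A roughly balanced tree would show most leaves clustered in one or two adjacent buckets, while a large gap between the mean and the max would point to a few oversized leaves.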


                
> Instrument repair to be able to assess it's efficiency (precision)
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-2698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Benedict
>            Priority: Minor
>              Labels: lhf
>         Attachments: nodetool_repair_and_cfhistogram.tar.gz, patch_2698_v1.txt, patch.diff, patch-rebased.diff
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One hypothesis is that the merkle tree precision may deteriorate too much at some data size. To check this hypothesis, it would be reasonable to gather statistics during the merkle tree building on how many rows each merkle tree range accounts for (and the size that this represents). It is probably an interesting statistic to have anyway.
