Posted to commits@cassandra.apache.org by "Per Otterström (JIRA)" <ji...@apache.org> on 2016/06/02 15:23:59 UTC
[jira] [Commented] (CASSANDRA-11752) histograms/metrics in 2.2 do not appear recency biased

    [ https://issues.apache.org/jira/browse/CASSANDRA-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312467#comment-15312467 ] 

Per Otterström commented on CASSANDRA-11752:
--------------------------------------------

I also miss the simplicity of just graphing the percentiles directly. Still, I like the accuracy you get from getValues(); I wasn't aware of it until now.

I'd be willing to create a patch for this. I'm thinking of something like this:
- Keep the existing EH implementation as it is. I think we want to keep the getValues() implementation for external tools to use. Also, the EH class seems to have some complexity because it is used for SSTable metadata, which doesn't fit well with decay functionality.
- Instead, create a new DecayingEstimatedHistogram implementation which keeps 5 arrays of buckets and switches out the oldest one every minute. Also, every minute, perform backward decay on the values in the old arrays with a factor of two, so that the most recent minute is more significant in the percentiles. It should be sufficient to use int rather than long in the buckets in order to save some memory.
- Every time a new metric value is registered, it is added to both an EH and a DecayingEH.
- When reading the bean, a call to getValues() is directed to the EH, while calls to min/max/mean and percentiles are directed to the DecayingEH.
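
To make the idea more concrete, here is a rough sketch of the rotating-array part only. It is not a patch and not the existing EH API: the class name, the method names (add, rotate, percentile) and the bucket offsets are all made up for illustration, and the caller is assumed to invoke rotate() once per minute.

```java
import java.util.Arrays;

/** Hypothetical sketch of the proposal above; not Cassandra's EstimatedHistogram. */
final class DecayingHistogramSketch {
    private static final int WINDOWS = 5;   // five one-minute arrays, as proposed

    private final long[] bucketOffsets;     // ascending upper bound of each bucket
    private final int[][] windows;          // windows[0] holds the current minute

    DecayingHistogramSketch(long[] bucketOffsets) {
        this.bucketOffsets = bucketOffsets.clone();
        // one extra bucket catches values above the last offset
        this.windows = new int[WINDOWS][bucketOffsets.length + 1];
    }

    /** Record a value in the current-minute array. */
    synchronized void add(long value) {
        int idx = Arrays.binarySearch(bucketOffsets, value);
        if (idx < 0) idx = -idx - 1;        // insertion point: first offset >= value
        windows[0][idx]++;
    }

    /** Assumed to be called once per minute: drop the oldest array, halve the rest. */
    synchronized void rotate() {
        int[] recycled = windows[WINDOWS - 1];
        for (int w = WINDOWS - 1; w > 0; w--) {
            windows[w] = windows[w - 1];
            for (int b = 0; b < windows[w].length; b++)
                windows[w][b] /= 2;         // integer halving, so a count of 1 decays to 0
        }
        Arrays.fill(recycled, 0);
        windows[0] = recycled;
    }

    /** Upper bound of the bucket containing the p-th percentile, or 0 if empty. */
    synchronized long percentile(double p) {
        if (p < 0.0 || p > 1.0) throw new IllegalArgumentException("p must be in [0, 1]");
        long[] merged = new long[bucketOffsets.length + 1];
        long total = 0;
        for (int[] w : windows)
            for (int b = 0; b < w.length; b++) { merged[b] += w[b]; total += w[b]; }
        if (total == 0) return 0;
        long threshold = Math.max(1, (long) Math.ceil(total * p));
        long seen = 0;
        for (int b = 0; b < merged.length; b++) {
            seen += merged[b];
            if (seen >= threshold)
                return b < bucketOffsets.length ? bucketOffsets[b] : Long.MAX_VALUE;
        }
        return Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        DecayingHistogramSketch h = new DecayingHistogramSketch(new long[] {10, 100, 1000});
        for (int i = 0; i < 100; i++) h.add(1000);  // a slow minute
        h.rotate();
        for (int i = 0; i < 100; i++) h.add(10);    // a fast minute
        System.out.println(h.percentile(0.6));      // the halved slow minute no longer dominates
    }
}
```

With equal traffic in both minutes, the halved counts from the older minute weigh less, so the 60th percentile follows the recent fast values while the 99th still shows the earlier slow ones. Min/max/mean and the parallel update of the plain EH are left out of the sketch.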

Some code would be duplicated in the EH and DecayingEH classes, but I think I would prefer this approach over adding decay complexity to the existing EH. One option would be to do the decay operation on read. Another option would be to use forward decay.
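
For comparison, the forward decay option could look roughly like the sketch below. Again, this is only an illustration under assumptions: the class and method names are invented, time is passed in explicitly as seconds, and the 60 second half-life in the example is an arbitrary choice. Each update is weighted by exp(alpha * (t - landmark)) when it is written, and a read divides by the same function evaluated at the read time, so no periodic pass over the buckets is needed.

```java
/** Hypothetical forward-decay sketch; names and parameters are illustrative only. */
final class ForwardDecaySketch {
    private final double alpha;           // decay rate per second
    private final long landmarkSeconds;   // fixed reference point in time
    private final double[] weights;       // one weighted count per bucket

    ForwardDecaySketch(int buckets, double halfLifeSeconds, long landmarkSeconds) {
        this.alpha = Math.log(2) / halfLifeSeconds;
        this.landmarkSeconds = landmarkSeconds;
        this.weights = new double[buckets];
    }

    /** Weight the update by how far it lies past the landmark. */
    synchronized void add(int bucket, long nowSeconds) {
        weights[bucket] += Math.exp(alpha * (nowSeconds - landmarkSeconds));
    }

    /** Normalise by the weight a brand-new update would get at read time. */
    synchronized double decayedCount(int bucket, long nowSeconds) {
        return weights[bucket] / Math.exp(alpha * (nowSeconds - landmarkSeconds));
    }

    public static void main(String[] args) {
        ForwardDecaySketch s = new ForwardDecaySketch(2, 60.0, 0L);
        s.add(0, 0L);     // old update
        s.add(1, 60L);    // update one half-life later
        System.out.println(s.decayedCount(0, 60L) + " " + s.decayedCount(1, 60L));
    }
}
```

One caveat with this approach is that the stored weights grow exponentially with the distance from the landmark, so a real implementation would have to move the landmark and rescale the weights from time to time.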

WDYT?

> histograms/metrics in 2.2 do not appear recency biased
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11752
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11752
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Chris Burroughs
>              Labels: metrics
>         Attachments: boost-metrics.png, c-jconsole-comparison.png, c-metrics.png, default-histogram.png
>
>
> In addition to upgrading to metrics3, CASSANDRA-5657 switched to using a custom histogram implementation.  After upgrading to Cassandra 2.2 histograms/timer metrics are now suspiciously flat.  To be useful for graphing and alerting metrics need to be biased towards recent events.
> I have attached images that I think illustrate this.
>  * The first two are a comparison between latency observed by a C* 2.2 (us) cluster showing very flat lines and a client (using metrics 2.2.0, ms) showing server performance problems.  We can't rule out with total certainty that something else isn't the cause (that's why we measure from both the client & server) but they very rarely disagree.
>  * The 3rd image compares jconsole viewing of metrics on a 2.2 and 2.1 cluster over several minutes.  Not a single digit changed on the 2.2 cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)