Posted to issues@hbase.apache.org by "Sakthi (JIRA)" <ji...@apache.org> on 2019/03/05 02:02:00 UTC

[jira] [Commented] (HBASE-21991) Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements

    [ https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783976#comment-16783976 ] 

Sakthi commented on HBASE-21991:
--------------------------------

Regarding the faulty remove logic:
 * According to the lossy counting algorithm, non-eligible meters are swept off on every (1/e)th access [e = error rate; default = 0.02]. Hence, under default settings, the non-eligible meters are pruned at every 50th access in a stream of accesses.
 * But with the current (buggy) implementation, if every 50th (i.e., every (1/e)th) access is to an already existing clientRequestMeter, the non-eligible meters might never be pruned, and we might end up storing/exposing all the meters rather than only the top-k-ish ones.
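A minimal Java sketch of the sweep described above, assuming a textbook lossy-counting structure: a per-key count f, a maximum undercount Δ, and a sweep at every w = ceil(1/e)-th access. `LossyCounter`, `sweepOnlyOnNewKey` and `sizeAfterRounds` are hypothetical names, not HBase classes. The `sweepOnlyOnNewKey` flag is an assumed reconstruction of the bug, in which the boundary check only runs after a new key is inserted, so a boundary access that hits an existing meter skips the sweep.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal lossy-counting sketch for illustration only; not the HBase implementation.
class LossyCounter {
  private final long bucketSize;                             // w = ceil(1 / e); 50 when e = 0.02
  private final boolean sweepOnlyOnNewKey;                   // true simulates the assumed bug
  private final Map<String, Long> counts = new HashMap<>();  // f: observed count per key
  private final Map<String, Long> deltas = new HashMap<>();  // delta: max undercount per key
  private long totalAccesses = 0;                            // N: accesses seen so far

  LossyCounter(double errorRate, boolean sweepOnlyOnNewKey) {
    this.bucketSize = (long) Math.ceil(1.0 / errorRate);
    this.sweepOnlyOnNewKey = sweepOnlyOnNewKey;
  }

  // Current bucket id b = ceil(N / w).
  private long currentBucket() {
    return (totalAccesses + bucketSize - 1) / bucketSize;
  }

  void add(String key) {
    totalAccesses++;
    boolean isNew = !counts.containsKey(key);
    if (isNew) {
      counts.put(key, 1L);
      deltas.put(key, currentBucket() - 1);
    } else {
      counts.merge(key, 1L, Long::sum);
    }
    // Correct behaviour: test the bucket boundary on EVERY access.
    // Buggy variant: only test it after inserting a new key.
    boolean checkBoundary = !sweepOnlyOnNewKey || isNew;
    if (checkBoundary && totalAccesses % bucketSize == 0) {
      sweep();
    }
  }

  // Drop every key whose upper bound f + delta does not exceed the current bucket id.
  private void sweep() {
    long bucket = currentBucket();
    counts.entrySet().removeIf(e -> e.getValue() + deltas.get(e.getKey()) <= bucket);
    deltas.keySet().retainAll(counts.keySet());
  }

  int size() {
    return counts.size();
  }

  // Each round: "hot" first, then 48 one-off keys, then "hot" again as the 50th access.
  static int sizeAfterRounds(boolean buggy, int rounds) {
    LossyCounter counter = new LossyCounter(0.02, buggy);
    int cold = 0;
    for (int r = 0; r < rounds; r++) {
      counter.add("hot");
      for (int i = 0; i < 48; i++) {
        counter.add("cold-" + (cold++));
      }
      counter.add("hot");
    }
    return counter.size();
  }

  public static void main(String[] args) {
    System.out.println("fixed: " + sizeAfterRounds(false, 10)); // fixed: 1
    System.out.println("buggy: " + sizeAfterRounds(true, 10));  // buggy: 481
  }
}
```

In this stream every 50th access is the already-tracked "hot" key. The fixed counter prunes the one-off keys at each boundary and ends with only "hot", while the buggy variant never reaches `sweep()` and keeps all 1 + 48 × 10 = 481 keys.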

I have verified this with a unit test.
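For reference, the textbook lossy-counting guarantees (Manku and Motwani's formulation; this thread does not say whether the HBase code matches it exactly) make "non-eligible" precise. With w = ceil(1/e) and N accesses seen so far, each retained key k keeps a count f_k and a maximum undercount Δ_k, and a sweep at bucket b drops every key with f_k + Δ_k ≤ b:

```latex
\begin{aligned}
w &= \lceil 1/e \rceil, \qquad b = \lceil N / w \rceil \qquad (w = 50 \text{ when } e = 0.02) \\
\text{prune } k &\iff f_k + \Delta_k \le b \\
f_k &\le f_k^{\text{true}} \le f_k + \Delta_k \le f_k + eN \\
f_k^{\text{true}} > eN &\implies k \text{ is retained}
\end{aligned}
```

So a retained meter's value can undercount its true request count by up to eN. If sweeps are skipped, the space bound from the original analysis (at most (1/e)·log(eN) entries) no longer applies.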

> Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-21991
>                 URL: https://issues.apache.org/jira/browse/HBASE-21991
>             Project: HBase
>          Issue Type: Bug
>          Components: Coprocessors, metrics
>            Reporter: Sakthi
>            Assignee: Sakthi
>            Priority: Major
>
> Here is a list of the issues related to the MetaMetrics implementation:
> +*Bugs*+:
>  # [_Lossy counting for top-k_] *Faulty remove logic of non-eligible meters*: Under certain conditions, we might end up storing/exposing all the meters rather than only the top-k-ish ones.
>  # MetaMetrics can throw an NPE because of a *Race Condition*, causing the RS to abort.
> +*Improvements*+:
>  # With a high number of regions in the cluster, exposing metrics for each region blows up the JMX output from ~140 KB to 100+ MB, depending on the number of regions. It's better to use *lossy counting to maintain top-k for region metrics* as well.
>  # As the lossy meters do not represent actual counts, I think it would be better to *rename the meters to include "lossy" in the name*. This would be more informative when monitoring the metrics and would reduce confusion between actual counts and lossy counts.
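The ticket does not show where the NPE in bug 2 is thrown. As a hedged illustration only, assume the meters live in a shared map that one thread updates while another (for example, a lossy-counting sweep) removes entries. A check-then-act sequence across two map calls then produces exactly this kind of NPE. `MeterRegistrySketch`, `markRacy`, `mark` and `survivesChurn` are hypothetical names, not HBase APIs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical meter registry; illustrates the race pattern, not HBase's MetaMetrics code.
class MeterRegistrySketch {
  private final ConcurrentMap<String, LongAdder> meters = new ConcurrentHashMap<>();

  // Racy: another thread may remove the key between containsKey()/put() and get(),
  // so get() returns null and increment() throws NullPointerException.
  void markRacy(String name) {
    if (!meters.containsKey(name)) {
      meters.put(name, new LongAdder());
    }
    meters.get(name).increment();
  }

  // Safer: computeIfAbsent atomically returns a non-null counter, and we increment
  // the reference we hold instead of re-reading the map.
  void mark(String name) {
    meters.computeIfAbsent(name, k -> new LongAdder()).increment();
  }

  void remove(String name) {
    meters.remove(name);
  }

  long count(String name) {
    LongAdder adder = meters.get(name);
    return adder == null ? 0 : adder.sum();
  }

  // Hammer mark() and remove() on the same key from two threads;
  // returns true if the marking thread never threw.
  static boolean survivesChurn(int iterations) {
    MeterRegistrySketch registry = new MeterRegistrySketch();
    Throwable[] failure = new Throwable[1];
    Thread writer = new Thread(() -> {
      try {
        for (int i = 0; i < iterations; i++) registry.mark("region-1");
      } catch (Throwable t) {
        failure[0] = t;
      }
    });
    Thread remover = new Thread(() -> {
      for (int i = 0; i < iterations; i++) registry.remove("region-1");
    });
    writer.start();
    remover.start();
    try {
      writer.join();
      remover.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return failure[0] == null;
  }

  public static void main(String[] args) {
    System.out.println("survived churn: " + survivesChurn(100_000));
  }
}
```

The trade-off in this sketch is that an increment can still land on a counter that a concurrent sweep has just removed, so that one access is lost. For an approximate (lossy) metric that is usually acceptable, whereas an NPE on the request path can abort the RS.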



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)