Posted to issues@ambari.apache.org by "Hari Sekhon (JIRA)" <ji...@apache.org> on 2018/07/04 16:34:00 UTC

[jira] [Updated] (AMBARI-24244) Grafana HBase GC Time graph showing very wrong GC times (off by many secs)

     [ https://issues.apache.org/jira/browse/AMBARI-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated AMBARI-24244:
---------------------------------
    Summary: Grafana HBase GC Time graph showing very wrong GC times (off by many secs)  (was: Ambari)

> Grafana HBase GC Time graph showing very wrong GC times (off by many secs)
> --------------------------------------------------------------------------
>
>                 Key: AMBARI-24244
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24244
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics, metrics
>    Affects Versions: 2.5.2
>            Reporter: Hari Sekhon
>            Priority: Major
>
> Ambari's built-in Grafana "JVM GC Times" graph in the HBase - RegionServers dashboard is very wrong: it doesn't reflect the pause times I've grepped across the HBase RegionServer logs for util.JvmPauseMonitor.
> I've inherited a very heavily loaded HBase + OpenTSDB cluster where RegionServers are being lost to GC pauses of around 30 seconds(!), which cause ZK + HMaster to declare them dead. The Grafana graphs show peaks of only around 70ms due to avg downsampling. Editing the graph and changing the Aggregator from 'avg' to 'none' and the Transform from 'rate' to 'diff' seems to bring the graph closer to what the actual logs and behaviour are showing, but this may need more fiddling and investigation.
> Right now the GC Times graph, which is incredibly important, is worse than useless: it's misleading, because it shows no GC issues when there are actually very large, very severe GC issues on this cluster.
> This is a vanilla Ambari deployed Grafana with Ambari Metrics.
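To cross-check the dashboard against the logs as described above, the pause durations can be pulled out of the RegionServer logs with a short script. This is a minimal Python sketch, assuming JvmPauseMonitor warnings contain text of the form "pause of approximately <N>ms"; the regex, the function name pause_times_ms and the example log path are illustrative assumptions, not part of Ambari or HBase.

```python
import re
import sys
from typing import Iterable, List

# Assumed message shape (illustrative), e.g.:
# "... util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 30512ms"
PAUSE_RE = re.compile(r"util\.JvmPauseMonitor.*pause of approximately (\d+)ms")


def pause_times_ms(lines: Iterable[str]) -> List[int]:
    """Return every pause duration (ms) reported by JvmPauseMonitor in the given lines."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append(int(match.group(1)))
    return pauses


if __name__ == "__main__":
    # Hypothetical usage: python pauses.py /var/log/hbase/*regionserver*.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            pauses = pause_times_ms(log)
        if pauses:
            print(f"{path}: {len(pauses)} pauses, longest {max(pauses)} ms")
```

Comparing the longest pause per RegionServer with the graph's peak for the same time window shows directly how far the dashboard is off.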

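The effect of the Aggregator and Transform settings described above can be illustrated with made-up numbers. This rough Python sketch models a cumulative GC-time counter sampled every 10 seconds on one RegionServer that suffers a single 30-second pause. It assumes 'diff' plots the raw per-sample change, 'rate' divides that change by the sample interval, and 'avg' averages points either within a time bucket or across hosts; Ambari Metrics' exact semantics may differ, and all function names and values are hypothetical.

```python
from statistics import mean

INTERVAL_S = 10  # hypothetical metrics collection interval, in seconds


def cumulative(increments):
    """Build a monotonically increasing counter, like a JVM GC-time metric."""
    counter = [0]
    for inc in increments:
        counter.append(counter[-1] + inc)
    return counter


def diff(counter):
    """Per-sample change of a cumulative counter (assumed 'diff' transform)."""
    return [b - a for a, b in zip(counter, counter[1:])]


def rate(counter, interval_s=INTERVAL_S):
    """Per-second change of a cumulative counter (assumed 'rate' transform)."""
    return [d / interval_s for d in diff(counter)]


def avg_downsample(values, bucket):
    """Replace each run of `bucket` consecutive points with their mean."""
    return [mean(values[i:i + bucket]) for i in range(0, len(values), bucket)]


def avg_across_hosts(series_per_host):
    """Average the same time index across hosts (assumed 'avg' aggregator)."""
    return [mean(point) for point in zip(*series_per_host)]


# Hypothetical RegionServer: 20 ms of GC per 10 s interval, plus one 30 s pause.
increments = [20] * 60
increments[30] = 30_000
gc_time_ms = cumulative(increments)

# 39 hypothetical quiet RegionServers with no long pauses.
quiet = diff(cumulative([20] * 60))
hosts = [diff(gc_time_ms)] + [quiet] * 39

print("max diff (ms per interval):", max(diff(gc_time_ms)))
print("max rate (ms per second):", max(rate(gc_time_ms)))
print("max of 10-minute avg of diff:", max(avg_downsample(diff(gc_time_ms), 60)))
print("max of avg across 40 hosts:", max(avg_across_hosts(hosts)))
```

With these numbers the 30-second pause survives intact under diff, shows as 3000 ms/s under rate, is diluted to roughly 520 ms by a single ten-minute average bucket, and drops to about 770 ms when averaged across 40 RegionServers where only one pauses.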


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)