
Posted to issues@ambari.apache.org by "Hari Sekhon (JIRA)" <ji...@apache.org> on 2018/07/04 16:22:00 UTC

[jira] [Created] (AMBARI-24244) Ambari

Hari Sekhon created AMBARI-24244:
------------------------------------

             Summary: Ambari
                 Key: AMBARI-24244
                 URL: https://issues.apache.org/jira/browse/AMBARI-24244
             Project: Ambari
          Issue Type: Bug
          Components: ambari-metrics, metrics
    Affects Versions: 2.5.2
            Reporter: Hari Sekhon


The built-in Ambari Grafana "GC Time" graph in the HBase - RegionServers dashboard is very wrong: it doesn't reflect the pause times I've found by grepping the HBase RegionServer logs for util.JvmPauseMonitor.
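
For anyone wanting to reproduce the comparison against the logs, here is a minimal Python sketch of pulling pause durations out of RegionServer log lines. The sample lines are invented for illustration, and the "pause of approximately <N>ms" wording is an assumption about how JvmPauseMonitor phrases its message; adjust the pattern to whatever your logs actually contain.

```python
import re

# Assumed message shape: "...JvmPauseMonitor: ... pause of approximately 1234ms"
PAUSE_RE = re.compile(r"JvmPauseMonitor.*pause of approximately (\d+)ms")

def pause_millis(lines):
    """Return the pause durations (ms) found in the given log lines."""
    found = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            found.append(int(match.group(1)))
    return found

# Invented sample lines, not taken from the cluster in this report.
sample_log = [
    "2018-07-04 10:00:01,123 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 30412ms",
    "2018-07-04 10:00:02,000 INFO  [main] regionserver.HRegionServer: unrelated line",
    "2018-07-04 10:05:09,456 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 1875ms",
]

pauses = pause_millis(sample_log)
longest = max(pauses)
print("pauses (ms):", pauses, "longest (ms):", longest)
```

The longest value found this way is the number to hold the dashboard peak against.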

I've inherited a very heavily loaded HBase + OpenTSDB cluster where RegionServers are being lost because GC pauses of around 30 seconds(!) cause ZK + HMaster to declare them dead. The Grafana graphs show peaks of only around 70ms due to avg downsampling. Editing the graph and changing Aggregator from 'avg' to 'none' and Transform from 'rate' to 'diff' seems to bring the graph closer to what the actual logs and behaviour are showing, but this might need more fiddling and investigation.
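
To illustrate why averaging can hide a pause like this, here is a small Python sketch. It is not how Ambari Metrics or Grafana compute the graph; the host names, the 60-second step and the counter values are all invented, and the only assumption is that GC time is reported as a cumulative millisecond counter per RegionServer.

```python
from statistics import mean

STEP_SECONDS = 60  # hypothetical sampling interval

def diffs(counter):
    """Per-interval increase of a cumulative counter ('diff' transform)."""
    return [b - a for a, b in zip(counter, counter[1:])]

def rates(counter, step_seconds):
    """Per-second increase of a cumulative counter ('rate' transform)."""
    return [d / step_seconds for d in diffs(counter)]

# Invented cumulative GC-time counters in ms; rs1 has one ~30 s GC interval.
counters = {
    "rs1": [0, 40, 80, 30080, 30120],
    "rs2": [0, 50, 100, 150, 200],
    "rs3": [0, 30, 60, 90, 120],
}

# Aggregator 'none' + Transform 'diff': one series per host, raw ms per interval.
per_host_diff = {host: diffs(c) for host, c in counters.items()}
worst_interval_ms = max(max(d) for d in per_host_diff.values())

# Aggregator 'avg' + Transform 'rate': hosts averaged into one per-second series.
per_host_rate = [rates(c, STEP_SECONDS) for c in counters.values()]
avg_rate = [mean(column) for column in zip(*per_host_rate)]
peak_avg_rate = max(avg_rate)

print("worst per-host interval (ms):", worst_interval_ms)
print("peak of averaged rate (ms/s):", round(peak_avg_rate, 2))
```

With these made-up numbers the per-host diff view peaks at 30000 ms while the averaged rate peaks at roughly 167 ms/s, and the gap widens with more hosts or a longer step. Note that 'diff' gives total GC time per interval rather than the length of a single pause, so it is still only an approximation of what JvmPauseMonitor logs.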

Right now the GC Time graph, which is incredibly important, is worse than useless: it's misleading, as it shows no GC issues when there are actually very large, very severe GC issues on this cluster.

This is a vanilla Ambari-deployed Grafana with Ambari Metrics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)