Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/09/27 20:44:00 UTC
[jira] [Commented] (IMPALA-7596) Expose JvmPauseMonitor and GC Metrics to Impala's metrics infrastructure

    [ https://issues.apache.org/jira/browse/IMPALA-7596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631013#comment-16631013 ] 

ASF subversion and git services commented on IMPALA-7596:
---------------------------------------------------------

Commit abd230647fa92db29ac3719096eb4ebc7c151069 in impala's branch refs/heads/master from [~philip]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=abd2306 ]

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the existing Impala metric implementation called into the
frontend to read through the JMX memory beans 72 times, because each
call to GetValue() fetched all the data for the pool. This structure
made it hard to add additional, non-pool metrics, and it felt wasteful.
To combat this, I added a 10-second cache for the metrics fetched from
the Frontend, so the counters will typically re-use the same data.
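
The caching idea above can be sketched roughly as follows. This is a
minimal illustration and not the code from the commit: the names
JvmMetricsSnapshot and CachedJvmMetrics, the snapshot fields, and the
injected fetch callback are all hypothetical stand-ins for the Thrift
object and the call into the frontend.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the Thrift struct returned by the frontend.
struct JvmMetricsSnapshot {
  long gc_count = 0;
  long gc_time_millis = 0;
};

// Caches the result of an expensive fetch for a fixed time-to-live, so that
// many metric reads within the TTL share a single call into the frontend.
class CachedJvmMetrics {
 public:
  using Clock = std::chrono::steady_clock;
  using Fetcher = std::function<JvmMetricsSnapshot()>;

  CachedJvmMetrics(Fetcher fetch, std::chrono::seconds ttl)
      : fetch_(std::move(fetch)), ttl_(ttl) {
    assert(ttl_.count() >= 0);
  }

  // Returns the cached snapshot, refetching only when the cache is empty or
  // at least ttl old. 'now' is a parameter so that tests can control time.
  JvmMetricsSnapshot Get(Clock::time_point now = Clock::now()) {
    std::lock_guard<std::mutex> l(lock_);
    if (!valid_ || now - fetched_at_ >= ttl_) {
      cached_ = fetch_();
      fetched_at_ = now;
      valid_ = true;
    }
    return cached_;
  }

 private:
  Fetcher fetch_;
  std::chrono::seconds ttl_;
  std::mutex lock_;
  JvmMetricsSnapshot cached_;
  Clock::time_point fetched_at_;
  bool valid_ = false;
};
```

With a 10-second TTL, every metric whose GetValue() reads through one
shared CachedJvmMetrics instance triggers at most one fetch per 10-second
window, at the price of values that may be up to 10 seconds stale.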

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify away the enums for the
pool metrics as well.
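
To illustrate the lambda approach described above, here is a small
sketch. It is not the code from the commit: the struct, its fields, the
metric names, and MakeJvmMetrics() are hypothetical, and only three
illustrative fields are shown rather than the five real metrics. The
point is that each metric carries a lambda that selects its own field,
so no enum and no switch statement are needed.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the Thrift response; field names are illustrative.
struct JvmMetricsSnapshot {
  int64_t gc_count = 0;
  int64_t gc_time_millis = 0;
  int64_t pause_warn_count = 0;
};

// A metric is a name plus a lambda that picks one field out of a snapshot.
struct JvmFieldMetric {
  std::string name;
  std::function<int64_t(const JvmMetricsSnapshot&)> extract;
};

// Builds the metric list. Adding a metric is one more line here, rather than
// a new enum value plus a new case in a switch. Metric names are made up.
std::vector<JvmFieldMetric> MakeJvmMetrics() {
  return {
      {"example.gc-count",
       [](const JvmMetricsSnapshot& s) { return s.gc_count; }},
      {"example.gc-time-millis",
       [](const JvmMetricsSnapshot& s) { return s.gc_time_millis; }},
      {"example.pause-warn-count",
       [](const JvmMetricsSnapshot& s) { return s.pause_warn_count; }},
  };
}
```

Each metric's value is then metric.extract(snapshot), where the snapshot
would come from whatever shared fetch the metrics use.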

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  # Find the PID of the process listening on port 25000.
  PID=$(fuser 25000/tcp 2> /dev/null | tr -d ' ')
  # Fields 14 and 15 of /proc/PID/stat are utime and stime, in clock ticks.
  START_CPU=$(awk '{ print $14 + $15 }' "/proc/$PID/stat")
  for i in $(seq 100); do
    # Quote the URL so the shell does not treat '?' as a glob character.
    curl -s 'http://localhost:25000/jsonmetrics?json' > /dev/null
  done
  END_CPU=$(awk '{ print $14 + $15 }' "/proc/$PID/stat")
  echo "$START_CPU $END_CPU $((END_CPU - START_CPU))"

On a release build on my development machine, gathering metrics 100
times took 0.16 CPU seconds without this change and 0.07 CPU seconds
with this change. The measurement accuracy here is 0.01 CPU seconds (I
spot-checked this using the cpuacct cgroup infrastructure, which reports
nanoseconds but was more painful to script), but this convinces me that
this is a net improvement.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Reviewed-on: http://gerrit.cloudera.org:8080/11468
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Expose JvmPauseMonitor and GC Metrics to Impala's metrics infrastructure
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-7596
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7596
>             Project: IMPALA
>          Issue Type: Task
>          Components: Infrastructure
>            Reporter: Philip Zeyliger
>            Assignee: Philip Zeyliger
>            Priority: Major
>
> In IMPALA-6857 we added a thread that checks for GC pauses. To allow monitoring tools to pick up on the fact that pauses are happening, it's useful to promote those as full-fledged metrics.
> It turns out we were also collecting those metrics by doing a lot of round trips to the Java side of the house. This JIRA may choose to address that as well.


