You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/08/09 21:49:01 UTC

[jira] [Commented] (IMPALA-6857) Add JVM Pause Monitor to Impala Processes

    [ https://issues.apache.org/jira/browse/IMPALA-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575434#comment-16575434 ] 

ASF subversion and git services commented on IMPALA-6857:
---------------------------------------------------------

Commit 4976ff3c07f465915ac31312ca67519a600212e6 in impala's branch refs/heads/master from Bharath Vissapragada
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=4976ff3 ]

IMPALA-6857: Add Jvm pause/GC Monitor utility and expose JMX metrics

Pause monitor:
=============

This commit adds a stripped down version of Hadoop's JvmPauseMonitor
class (https://bit.ly/2O6qSwm) . The core implementaion is borrowed
from hadoop-common project and the hadoop dependencies are removed.

- Removed dependency on AbstractService.
- Not relying on Hadoop's Configuration object for reading confs.
- Switched to Guava's implementation of Stopwatch.

This utility class can detect both GC/non-GC pauses. In case of GC
pauses, the GC metrics during the pause period are logged.

Sample Output:
=============
Detected pause in JVM or host machine (eg GC): pause of approximately
2356ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=2241ms
GC pool 'PS Scavenge' had collection(s): count=3 time=352ms
Detected pause in JVM or host machine (eg GC): pause of approximately
1964ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=2082ms
GC pool 'PS Scavenge' had collection(s): count=1 time=251ms
Detected pause in JVM or host machine (eg GC): pause of approximately
2120ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=2454ms
Detected pause in JVM or host machine (eg GC): pause of approximately
2238ms
GC pool 'PS MarkSweep' had collection(s): count=5 time=13464ms
Detected pause in JVM or host machine (eg GC): pause of approximately
2233ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=2733ms

JMX Metrics:
============

JMX metrics are now emmitted for Impala and Catalog JVMs at the web end
point /jmx.

- Impalad: http(s)://<impalad-host>:25000/jmx
- Catalogd: http(s)://<catalogd-host>:25020/jmx

Misc:
====

Renamed JvmMetric -> JvmMemoryMetric to make the intent more clear. It
doesn't relate to the functionality of the patch in anyway.

Testing:
=======
- Tested it manually with kill -SIGSTOP/-SIGCONT <pid>. Made sure that
  the non-GC JVM pauses are logged.
- This class' functionality is tested manually by invoking it's main()
- Injected a memory leak into the Catalog server code and made sure the
  GC is detected.

Change-Id: I30d897b7e063846ad6d8f88243e2c04264da0341
Reviewed-on: http://gerrit.cloudera.org:8080/10998
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Add JVM Pause Monitor to Impala Processes
> -----------------------------------------
>
>                 Key: IMPALA-6857
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6857
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>            Reporter: Philip Zeyliger
>            Priority: Major
>              Labels: ramp-up, supportability
>             Fix For: Impala 3.1.0
>
>
> In IMPALA-3114, we added a pause monitor for Impala. In addition to that, we should port/borrow Hadoop's JvmPauseMonitor [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java.] I believe that when the JVM is aggressively GCing, the C++ threads will continue to get scheduled (and won't log), but the Java ones will log. (I've definitely seen JvmPauseMonitor be accurate many times.)
> [~bharathv], when you were testing this, were you able to reproduce it triggering when the JVM half was in "GC hell"?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org