Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/10/14 14:52:51 UTC

[GitHub] [flink] XComp opened a new pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

XComp opened a new pull request #13640:
URL: https://github.com/apache/flink/pull/13640


   ## What is the purpose of the change
   A new metric for monitoring the Metaspace usage needs to be introduced to finish [FLIP-102](https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager).
   
   ## Brief change log
   * Added a new `Metaspace` metric analogous to the `Heap` and `NonHeap` metrics.
   * Extended documentation accordingly.
   * Added a note to the memory metrics documentation after we ran into problems with IBM's J9 OpenJDK implementation.
   * Added a missing test case for the `NonHeap` metric that checks that the metric's value is not static.
   
   ## Verifying this change
   
   * Added new tests to `MetricUtilsTest` to cover the new metric
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? docs
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   ## CI report:
   
   * c295945375d779e056cd2503b878c5ea6544a7a7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624) 
   * f5e746e18b4799208db4ec2954719c27d74df5f5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   ## CI report:
   
   * adb8530ae6bd18071a82319fe9d43aca7b23f5d4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7681) 
   * e957d897105a0bff335d8db266b9bc518942eae1 UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   ## CI report:
   
   * c295945375d779e056cd2503b878c5ea6544a7a7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624) 
   * f5e746e18b4799208db4ec2954719c27d74df5f5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7656) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   ## CI report:
   
   * adb8530ae6bd18071a82319fe9d43aca7b23f5d4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7681) 
   * e957d897105a0bff335d8db266b9bc518942eae1 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7753) 
   



[GitHub] [flink] zentol merged pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
zentol merged pull request #13640:
URL: https://github.com/apache/flink/pull/13640


   





[GitHub] [flink] XComp commented on a change in pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
XComp commented on a change in pull request #13640:
URL: https://github.com/apache/flink/pull/13640#discussion_r505472376



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/util/MetricUtils.java
##########
@@ -214,6 +219,29 @@ static void instantiateNonHeapMemoryMetrics(final MetricGroup metricGroup) {
 		instantiateMemoryUsageMetrics(metricGroup, () -> ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage());
 	}
 
+	@VisibleForTesting
+	static void instantiateMetaspaceMemoryMetrics(final MetricGroup parentMetricGroup) {
+		final List<MemoryPoolMXBean> memoryPoolMXBeans = ManagementFactory.getMemoryPoolMXBeans()
+			.stream()
+			.filter(bean -> "Metaspace".equals(bean.getName()))
+			.collect(Collectors.toList());
+
+		if (memoryPoolMXBeans.isEmpty()) {
+			LOG.warn("No memory pool named 'Metaspace' is present. The '{}' metric group is not going to be instantiated.", METRIC_GROUP_METASPACE_NAME);
+			return;
+		}
+
+		final MetricGroup metricGroup = parentMetricGroup.addGroup(METRIC_GROUP_METASPACE_NAME);
+		final Iterator<MemoryPoolMXBean> beanIterator = memoryPoolMXBeans.iterator();
+
+		final MemoryPoolMXBean firstPool = beanIterator.next();
+		instantiateMemoryUsageMetrics(metricGroup, firstPool::getUsage);
+
+		if (beanIterator.hasNext()) {
+			LOG.warn("More than one memory pool named '{}' are present. Only the first pool was used for instantiating the metric.", METRIC_GROUP_METASPACE_NAME);

Review comment:
       Yup, I bet it's quite theoretical. I just felt a bit uneasy about ignoring this one.
   
   The pool name issue was actually not what I intended and, therefore, a "bug". Thanks for pointing that out.







[GitHub] [flink] zentol commented on a change in pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #13640:
URL: https://github.com/apache/flink/pull/13640#discussion_r505409970



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/util/MetricUtils.java
##########
@@ -214,6 +219,29 @@ static void instantiateNonHeapMemoryMetrics(final MetricGroup metricGroup) {
 		instantiateMemoryUsageMetrics(metricGroup, () -> ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage());
 	}
 
+	@VisibleForTesting
+	static void instantiateMetaspaceMemoryMetrics(final MetricGroup parentMetricGroup) {
+		final List<MemoryPoolMXBean> memoryPoolMXBeans = ManagementFactory.getMemoryPoolMXBeans()
+			.stream()
+			.filter(bean -> "Metaspace".equals(bean.getName()))
+			.collect(Collectors.toList());
+
+		if (memoryPoolMXBeans.isEmpty()) {
+			LOG.warn("No memory pool named 'Metaspace' is present. The '{}' metric group is not going to be instantiated.", METRIC_GROUP_METASPACE_NAME);

Review comment:
       ```suggestion
   			LOG.info("Metaspace metrics will not be exposed because no pool named 'Metaspace' could be found. This could be due to the used JVM.");
   ```
   Users typically don't like warnings they can't do anything about, and the message is relatively cryptic ("what is a metric group? what is the consequence of it not being instantiated?").

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/util/MetricUtils.java
##########
@@ -214,6 +219,29 @@ static void instantiateNonHeapMemoryMetrics(final MetricGroup metricGroup) {
 		instantiateMemoryUsageMetrics(metricGroup, () -> ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage());
 	}
 
+	@VisibleForTesting
+	static void instantiateMetaspaceMemoryMetrics(final MetricGroup parentMetricGroup) {
+		final List<MemoryPoolMXBean> memoryPoolMXBeans = ManagementFactory.getMemoryPoolMXBeans()
+			.stream()
+			.filter(bean -> "Metaspace".equals(bean.getName()))
+			.collect(Collectors.toList());
+
+		if (memoryPoolMXBeans.isEmpty()) {
+			LOG.warn("No memory pool named 'Metaspace' is present. The '{}' metric group is not going to be instantiated.", METRIC_GROUP_METASPACE_NAME);
+			return;
+		}
+
+		final MetricGroup metricGroup = parentMetricGroup.addGroup(METRIC_GROUP_METASPACE_NAME);
+		final Iterator<MemoryPoolMXBean> beanIterator = memoryPoolMXBeans.iterator();
+
+		final MemoryPoolMXBean firstPool = beanIterator.next();
+		instantiateMemoryUsageMetrics(metricGroup, firstPool::getUsage);
+
+		if (beanIterator.hasNext()) {
+			LOG.warn("More than one memory pool named '{}' are present. Only the first pool was used for instantiating the metric.", METRIC_GROUP_METASPACE_NAME);

Review comment:
       I'm fairly sure that this is mostly a theoretical issue; for it to be usable via JMX the MBeans require some unique identification, and according to the MemoryPoolMXBean javadocs the only unique part of the identifier is in fact the name. In other words, multiple pools of this name being present would likely be a bug in the JVM.
   We can keep this check, but it should be on debug imo.
   
   It is also a bit odd to use the metric group name as the pool name, when we used another string for the actual filtering (as in, move "Metaspace" into a constant, use that here, and maybe make METRIC_GROUP_METASPACE_NAME an alias for this constant.)
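
A minimal sketch of the constant-based lookup suggested above, assuming only the standard `java.lang.management` API. The class name `MetaspacePoolLookup`, the constant `METASPACE_POOL_NAME`, and the helper `findPool` are hypothetical names chosen for illustration; this is not the code that was merged.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.Optional;

class MetaspacePoolLookup {
    // Hypothetical constant: one name shared by the filter and any log message.
    static final String METASPACE_POOL_NAME = "Metaspace";

    // Returns the first pool with the given name, if the JVM exposes one.
    static Optional<MemoryPoolMXBean> findPool(List<MemoryPoolMXBean> pools, String name) {
        return pools.stream()
            .filter(pool -> name.equals(pool.getName()))
            .findFirst();
    }

    public static void main(String[] args) {
        Optional<MemoryPoolMXBean> pool =
            findPool(ManagementFactory.getMemoryPoolMXBeans(), METASPACE_POOL_NAME);

        if (!pool.isPresent()) {
            // Not an error: some JVMs may simply not expose a pool with this name.
            System.out.println("No pool named '" + METASPACE_POOL_NAME + "' found.");
            return;
        }

        // getUsage() may return null if the pool is no longer valid.
        MemoryUsage usage = pool.get().getUsage();
        if (usage != null) {
            System.out.println("used=" + usage.getUsed()
                + " committed=" + usage.getCommitted()
                + " max=" + usage.getMax());
        }
    }
}
```

Taking the first match with `findFirst()` sidesteps the "more than one pool" branch entirely, which fits the view that duplicate names would be a JVM bug; whether to still log that case at debug level is a separate choice.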







[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   ## CI report:
   
   * e957d897105a0bff335d8db266b9bc518942eae1 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7753) 
   



[GitHub] [flink] XComp commented on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-709002551


   I resolved your remarks. I left the commits separated for now to make reviewing the changes easier. I'm gonna squash all of it in one commit after the review is done.





[GitHub] [flink] zentol commented on a change in pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #13640:
URL: https://github.com/apache/flink/pull/13640#discussion_r506294172



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/metrics/util/MetricUtilsTest.java
##########
@@ -79,6 +84,23 @@ public void testNonHeapMetricsCompleteness() {
 		Assert.assertNotNull(nonHeapMetrics.get(MetricNames.MEMORY_MAX));
 	}
 
+	@Test
+	public void testMetaspaceCompleteness() {
+		final InterceptingOperatorMetricGroup metaspaceMetrics = new InterceptingOperatorMetricGroup();
+		final InterceptingOperatorMetricGroup parentMetrics = new InterceptingOperatorMetricGroup() {
+			@Override
+			public MetricGroup addGroup(String name) {
+				return metaspaceMetrics;

Review comment:
       Could we not return `this`?
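
A rough sketch of what returning `this` could look like. It uses a hypothetical stand-in interface (`Group`) and test double (`InterceptingGroup`) rather than Flink's actual `MetricGroup` and `InterceptingOperatorMetricGroup`, and the gauge names `Used`, `Committed`, and `Max` are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class ReturnThisGroupSketch {
    // Hypothetical, heavily reduced stand-in for a metric group.
    interface Group {
        Group addGroup(String name);
        void gauge(String name, Supplier<Long> gauge);
    }

    // Test double: records gauges and flattens sub-groups onto itself.
    static class InterceptingGroup implements Group {
        final Map<String, Supplier<Long>> gauges = new HashMap<>();

        @Override
        public Group addGroup(String name) {
            // Returning this means no second group instance is needed in the test.
            return this;
        }

        @Override
        public void gauge(String name, Supplier<Long> gauge) {
            gauges.put(name, gauge);
        }
    }

    // Stand-in for production code that registers gauges under a sub-group.
    static void registerUnderSubGroup(Group parent) {
        Group metaspace = parent.addGroup("Metaspace");
        metaspace.gauge("Used", () -> 1L);
        metaspace.gauge("Committed", () -> 2L);
        metaspace.gauge("Max", () -> 3L);
    }

    public static void main(String[] args) {
        InterceptingGroup group = new InterceptingGroup();
        registerUnderSubGroup(group);
        // The gauges registered on the sub-group are visible on the parent itself.
        System.out.println(group.gauges.keySet());
    }
}
```

The trade-off is that the test can no longer tell which sub-group name was used; if that matters, the name passed to `addGroup` could be recorded as well.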







[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   ## CI report:
   
   * f5e746e18b4799208db4ec2954719c27d74df5f5 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7656) 
   * aa80dacfde367a161ab467905de79dbbeee645ac Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7669) 
   * adb8530ae6bd18071a82319fe9d43aca7b23f5d4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7681) 
   



[GitHub] [flink] XComp commented on a change in pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
XComp commented on a change in pull request #13640:
URL: https://github.com/apache/flink/pull/13640#discussion_r506330191



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/metrics/util/MetricUtilsTest.java
##########
@@ -79,6 +84,23 @@ public void testNonHeapMetricsCompleteness() {
 		Assert.assertNotNull(nonHeapMetrics.get(MetricNames.MEMORY_MAX));
 	}
 
+	@Test
+	public void testMetaspaceCompleteness() {
+		final InterceptingOperatorMetricGroup metaspaceMetrics = new InterceptingOperatorMetricGroup();
+		final InterceptingOperatorMetricGroup parentMetrics = new InterceptingOperatorMetricGroup() {
+			@Override
+			public MetricGroup addGroup(String name) {
+				return metaspaceMetrics;

Review comment:
       Good idea. I addressed it in the code, squashed all the changes into two commits and rebased the PR onto the most recent master. Thanks for the review, @zentol.







[GitHub] [flink] XComp commented on a change in pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
XComp commented on a change in pull request #13640:
URL: https://github.com/apache/flink/pull/13640#discussion_r506330671



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/metrics/util/MetricUtilsTest.java
##########
@@ -119,4 +141,62 @@ public void testHeapMetrics() throws Exception {
 		}
 		Assert.fail("Heap usage metric never changed it's value.");
 	}
+
+	@Test
+	public void testNonHeapMetricUsageNotStatic() throws InterruptedException {
+		final InterceptingOperatorMetricGroup nonHeapMetrics = new InterceptingOperatorMetricGroup();
+
+		MetricUtils.instantiateNonHeapMemoryMetrics(nonHeapMetrics);
+
+		@SuppressWarnings("unchecked")
+		final Gauge<Long> used = (Gauge<Long>) nonHeapMetrics.get(MetricNames.MEMORY_USED);
+
+		final long usedNonHeapInitially = used.getValue();
+
+		// check memory usage difference multiple times since other tests may affect memory usage as well
+		for (int x = 0; x < 10; x++) {
+			final ByteBuffer tmpByteBuffer = ByteBuffer.allocateDirect(1024 * 1024 * 8);
+			final long usedNonHeapAfterAllocation = used.getValue();
+
+			if (usedNonHeapInitially != usedNonHeapAfterAllocation) {
+				return;
+			}
+			Thread.sleep(50);
+		}
+		Assert.fail("Non-Heap usage metric never changed it's value.");
+	}
+
+	@Test
+	public void testMetaspaceMetricUsageNotStatic() throws InterruptedException {
+		final InterceptingOperatorMetricGroup metaspaceMetrics = new InterceptingOperatorMetricGroup();
+		final InterceptingOperatorMetricGroup parentMetrics = new InterceptingOperatorMetricGroup() {
+			@Override
+			public MetricGroup addGroup(String name) {
+				return metaspaceMetrics;

Review comment:
       It was addressed. See [comment above](https://github.com/apache/flink/pull/13640#discussion_r506330191).







[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   ## CI report:
   
   * aa80dacfde367a161ab467905de79dbbeee645ac Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7669) 
   * adb8530ae6bd18071a82319fe9d43aca7b23f5d4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7681) 
   



[GitHub] [flink] flinkbot commented on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708458328


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit c295945375d779e056cd2503b878c5ea6544a7a7 (Wed Oct 14 14:56:04 UTC 2020)
   
   **Warnings:**
    * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-19617).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work.
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624",
       "triggerID" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5e746e18b4799208db4ec2954719c27d74df5f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7656",
       "triggerID" : "f5e746e18b4799208db4ec2954719c27d74df5f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa80dacfde367a161ab467905de79dbbeee645ac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7669",
       "triggerID" : "aa80dacfde367a161ab467905de79dbbeee645ac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "adb8530ae6bd18071a82319fe9d43aca7b23f5d4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7681",
       "triggerID" : "adb8530ae6bd18071a82319fe9d43aca7b23f5d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * adb8530ae6bd18071a82319fe9d43aca7b23f5d4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7681) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot commented on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c295945375d779e056cd2503b878c5ea6544a7a7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624",
       "triggerID" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c295945375d779e056cd2503b878c5ea6544a7a7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624",
       "triggerID" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5e746e18b4799208db4ec2954719c27d74df5f5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7656",
       "triggerID" : "f5e746e18b4799208db4ec2954719c27d74df5f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa80dacfde367a161ab467905de79dbbeee645ac",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7669",
       "triggerID" : "aa80dacfde367a161ab467905de79dbbeee645ac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c295945375d779e056cd2503b878c5ea6544a7a7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624) 
   * f5e746e18b4799208db4ec2954719c27d74df5f5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7656) 
   * aa80dacfde367a161ab467905de79dbbeee645ac Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7669) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624",
       "triggerID" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c295945375d779e056cd2503b878c5ea6544a7a7 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624",
       "triggerID" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5e746e18b4799208db4ec2954719c27d74df5f5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7656",
       "triggerID" : "f5e746e18b4799208db4ec2954719c27d74df5f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa80dacfde367a161ab467905de79dbbeee645ac",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "aa80dacfde367a161ab467905de79dbbeee645ac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c295945375d779e056cd2503b878c5ea6544a7a7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624) 
   * f5e746e18b4799208db4ec2954719c27d74df5f5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7656) 
   * aa80dacfde367a161ab467905de79dbbeee645ac UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] XComp commented on a change in pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
XComp commented on a change in pull request #13640:
URL: https://github.com/apache/flink/pull/13640#discussion_r505472511



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/util/MetricUtils.java
##########
@@ -214,6 +219,29 @@ static void instantiateNonHeapMemoryMetrics(final MetricGroup metricGroup) {
 		instantiateMemoryUsageMetrics(metricGroup, () -> ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage());
 	}
 
+	@VisibleForTesting
+	static void instantiateMetaspaceMemoryMetrics(final MetricGroup parentMetricGroup) {
+		final List<MemoryPoolMXBean> memoryPoolMXBeans = ManagementFactory.getMemoryPoolMXBeans()
+			.stream()
+			.filter(bean -> "Metaspace".equals(bean.getName()))
+			.collect(Collectors.toList());
+
+		if (memoryPoolMXBeans.isEmpty()) {
+			LOG.warn("No memory pool named 'Metaspace' is present. The '{}' metric group is not going to be instantiated.", METRIC_GROUP_METASPACE_NAME);

Review comment:
       Thanks for the clarification. Makes sense!







[GitHub] [flink] zentol commented on a change in pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #13640:
URL: https://github.com/apache/flink/pull/13640#discussion_r505038358



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/util/MetricUtils.java
##########
@@ -214,6 +216,15 @@ static void instantiateNonHeapMemoryMetrics(final MetricGroup metricGroup) {
 		instantiateMemoryUsageMetrics(metricGroup, () -> ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage());
 	}
 
+	@VisibleForTesting
+	static void instantiateMetaspaceMemoryMetrics(final MetricGroup metricGroup) {
+		ManagementFactory.getMemoryPoolMXBeans()
+			.stream()
+			.filter(memoryPoolMXBean -> "Metaspace".equals(memoryPoolMXBean.getName()))
+			.findFirst()

Review comment:
       We should log a debug message if nothing is found.
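
The suggestion above could look roughly like the following. This is only a minimal sketch, not the code from the pull request: the class name `MetaspacePoolLookup` and the helper `findPool` are hypothetical, and `java.util.logging` stands in for whatever logger the project uses so that the example stays self-contained.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

public class MetaspacePoolLookup {

    // Assumption: java.util.logging is used here only to keep the sketch dependency-free.
    private static final Logger LOG = Logger.getLogger(MetaspacePoolLookup.class.getName());

    /** Returns the first memory pool with the given name, logging at debug level if none exists. */
    static Optional<MemoryPoolMXBean> findPool(List<MemoryPoolMXBean> pools, String poolName) {
        Optional<MemoryPoolMXBean> pool = pools.stream()
            .filter(bean -> poolName.equals(bean.getName()))
            .findFirst();

        if (!pool.isPresent()) {
            // FINE is the closest java.util.logging equivalent of a debug message.
            LOG.fine("No memory pool named '" + poolName + "' is present; skipping its metrics.");
        }
        return pool;
    }

    public static void main(String[] args) {
        findPool(ManagementFactory.getMemoryPoolMXBeans(), "Metaspace").ifPresent(bean -> {
            // getUsage() may return null if the pool is no longer valid.
            MemoryUsage usage = bean.getUsage();
            if (usage != null) {
                System.out.println(bean.getName() + " used bytes: " + usage.getUsed());
            }
        });
    }
}
```

Returning an `Optional` keeps the "pool missing" case explicit for the caller, which can then decide whether to register the metric group at all.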







[GitHub] [flink] flinkbot edited a comment on pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13640:
URL: https://github.com/apache/flink/pull/13640#issuecomment-708477385


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624",
       "triggerID" : "c295945375d779e056cd2503b878c5ea6544a7a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f5e746e18b4799208db4ec2954719c27d74df5f5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7656",
       "triggerID" : "f5e746e18b4799208db4ec2954719c27d74df5f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "aa80dacfde367a161ab467905de79dbbeee645ac",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7669",
       "triggerID" : "aa80dacfde367a161ab467905de79dbbeee645ac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "adb8530ae6bd18071a82319fe9d43aca7b23f5d4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "adb8530ae6bd18071a82319fe9d43aca7b23f5d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c295945375d779e056cd2503b878c5ea6544a7a7 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7624) 
   * f5e746e18b4799208db4ec2954719c27d74df5f5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7656) 
   * aa80dacfde367a161ab467905de79dbbeee645ac Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7669) 
   * adb8530ae6bd18071a82319fe9d43aca7b23f5d4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zentol commented on a change in pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #13640:
URL: https://github.com/apache/flink/pull/13640#discussion_r506295440



##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/metrics/util/MetricUtilsTest.java
##########
@@ -119,4 +141,62 @@ public void testHeapMetrics() throws Exception {
 		}
 		Assert.fail("Heap usage metric never changed its value.");
 	}
+
+	@Test
+	public void testNonHeapMetricUsageNotStatic() throws InterruptedException {
+		final InterceptingOperatorMetricGroup nonHeapMetrics = new InterceptingOperatorMetricGroup();
+
+		MetricUtils.instantiateNonHeapMemoryMetrics(nonHeapMetrics);
+
+		@SuppressWarnings("unchecked")
+		final Gauge<Long> used = (Gauge<Long>) nonHeapMetrics.get(MetricNames.MEMORY_USED);
+
+		final long usedNonHeapInitially = used.getValue();
+
+		// check memory usage difference multiple times since other tests may affect memory usage as well
+		for (int x = 0; x < 10; x++) {
+			final ByteBuffer tmpByteBuffer = ByteBuffer.allocateDirect(1024 * 1024 * 8);
+			final long usedNonHeapAfterAllocation = used.getValue();
+
+			if (usedNonHeapInitially != usedNonHeapAfterAllocation) {
+				return;
+			}
+			Thread.sleep(50);
+		}
+		Assert.fail("Non-Heap usage metric never changed its value.");
+	}
+
+	@Test
+	public void testMetaspaceMetricUsageNotStatic() throws InterruptedException {
+		final InterceptingOperatorMetricGroup metaspaceMetrics = new InterceptingOperatorMetricGroup();
+		final InterceptingOperatorMetricGroup parentMetrics = new InterceptingOperatorMetricGroup() {
+			@Override
+			public MetricGroup addGroup(String name) {
+				return metaspaceMetrics;

Review comment:
       same as above







[GitHub] [flink] zentol commented on a change in pull request #13640: [FLINK-19617] Added new metric for monitoring the JVM's Metaspace memory pool usage.

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #13640:
URL: https://github.com/apache/flink/pull/13640#discussion_r505036909



##########
File path: docs/monitoring/metrics.md
##########
@@ -847,6 +847,8 @@ Thus, in order to infer the metric identifier:
 </table>
 
 ### Memory
+The memory-related metrics require Oracle's memory management (also included in OpenJDK's Hotspot implementation) to be in place. 
+Other JVM implementations (like IBM's J9) might lead to unexpected behavior.

Review comment:
       Why not just say that in these cases metrics may not be exposed?
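
To see which case applies on a given JVM, one could list the memory pools it actually exposes. The snippet below is a hedged sketch for local experimentation; the class name `MemoryPoolNames` and the method `poolNames` are hypothetical and not part of Flink. Pool names are implementation-specific, so the output will differ between JVM implementations.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.List;
import java.util.stream.Collectors;

public class MemoryPoolNames {

    /** Collects the names of all memory pools reported by the running JVM. */
    static List<String> poolNames() {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
            .map(MemoryPoolMXBean::getName)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = poolNames();
        System.out.println("Memory pools: " + names);
        // The diff under review filters on this exact pool name.
        System.out.println("Metaspace pool exposed: " + names.contains("Metaspace"));
    }
}
```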



