You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/07 14:57:03 UTC

[GitHub] [flink-kubernetes-operator] gyfora opened a new pull request, #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

gyfora opened a new pull request, #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302

   Clean up and formalize operator metric scopes:
    - `system`: Operator level metrics
    - `namespace`: Resource Namespace level metrics (belonging to one of the watched namespaces)
    - `resource` : Resource level metrics
   
   Before:
   ```
   hostname.k8soperator.default.flink-kubernetes-operator.Status.JVM.Memory.Mapped.MemoryUsed: 0
   hostname.k8soperator.default.flink-kubernetes-operator.resourcens.default.FlinkDeployment.DEPLOYING.Count: 0
   ```
   
   After:
   ```
   hostname.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Mapped.MemoryUsed: 0
   hostname.k8soperator.default.flink-kubernetes-operator.namespace.default.FlinkDeployment.DEPLOYING.Count: 0
   ```
   
   One given metric called `Metric` in the 3 scopes:
   ```
   <prefix>.system.Metric
   <prefix>.namespace.resourceNs.Metric
   <prefix>.resource.resoureNs.resourceName.Metric
   ```
   
   As an addition based on this work, this PR also includes a minimalistic integration with the JOSDK built in metrics:
   
   sample output
   ```
   hostname.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Reconciliation.finished.Count: 4
   hostname.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Reconciliation.started.Count: 4
   
   hostname.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Resource.Event.Count: 5
   hostname.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Resource.Event.ADDED.Count: 2
   hostname.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Resource.Event.DELETED.Count: 1
   hostname.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Resource.Event.UPDATED.Count: 2
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
gyfora commented on PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#issuecomment-1177748199

   cc @SteNicholas @morhidi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] SteNicholas commented on pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#issuecomment-1178475779

   @gyfora, thanks for the improvement for JOSDK metrics. I have verified the  JOSDK metrics and the metrics look good to me.
   ```
   -- Counters -------------------------------------------------------------------
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Resource.Event.Count: 1
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Reconciliation.finished.Count: 20
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Resource.Event.ADDED.Count: 1
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.resource.default.basic-example.JOSDK.Reconciliation.Count: 20
   
   -- Gauges ---------------------------------------------------------------------
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.namespace.default.FlinkDeployment.Count: 1
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.GarbageCollector.G1 Young Generation.Count: 4
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Direct.MemoryUsed: 16777225
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.namespace.default.FlinkDeployment.READY.Count: 1
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Mapped.TotalCapacity: 0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Mapped.MemoryUsed: 0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.GarbageCollector.G1 Old Generation.Count: 0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Metaspace.Max: -1
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.namespace.default.FlinkDeployment.DEPLOYING.Count: 0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Direct.TotalCapacity: 16777224
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.namespace.default.FlinkDeployment.ERROR.Count: 0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Heap.Max: 2061500416
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.NonHeap.Max: -1
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Direct.Count: 5
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.namespace.default.FlinkDeployment.DEPLOYED_NOT_READY.Count: 0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.GarbageCollector.G1 Old Generation.Time: 0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.NonHeap.Committed: 63680512
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Mapped.Count: 0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.CPU.Load: 0.0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.GarbageCollector.G1 Young Generation.Time: 89
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.namespace.default.FlinkDeployment.MISSING.Count: 0
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.NonHeap.Used: 59893976
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.ClassLoader.ClassesUnloaded: 6
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.CPU.Time: 13130000000
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Heap.Committed: 132120576
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Metaspace.Committed: 43536384
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.ClassLoader.ClassesLoaded: 7894
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Threads.Count: 27
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Heap.Used: 71449296
   flink-kubernetes-operator-59fb9cfd78-2wwfm.k8soperator.default.flink-kubernetes-operator.system.Status.JVM.Memory.Metaspace.Used: 42010360
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] SteNicholas commented on a diff in pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on code in PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#discussion_r916401412


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/metrics/KubernetesOperatorMetricOptions.java:
##########
@@ -23,8 +23,22 @@
 /** Configuration options for metrics. */
 public class KubernetesOperatorMetricOptions {
     public static final ConfigOption<String> SCOPE_NAMING_KUBERNETES_OPERATOR =
-            ConfigOptions.key("metrics.scope.k8soperator")
-                    .defaultValue("<host>.k8soperator.<namespace>.<name>")
+            ConfigOptions.key("metrics.scope.k8soperator.system")

Review Comment:
   Does the `KubernetesOperatorMetricOptions` need to generate the configuration document in `flink-kubernetes-docs`?



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/metrics/OperatorJosdkMetrics.java:
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.kubernetes.operator.metrics;
+
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.runtime.metrics.MetricRegistry;
+
+import io.javaoperatorsdk.operator.api.monitoring.Metrics;
+import io.javaoperatorsdk.operator.api.reconciler.RetryInfo;
+import io.javaoperatorsdk.operator.processing.event.Event;
+import io.javaoperatorsdk.operator.processing.event.ResourceID;
+import io.javaoperatorsdk.operator.processing.event.source.controller.ResourceEvent;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * Implementation of {@link Metrics} to monitor and forward JOSDK metrics to {@link MetricRegistry}.
+ */
+public class OperatorJosdkMetrics implements Metrics {

Review Comment:
   Why not the implementation override the `timeControllerExecution` interface? Doesn't we need to support the metrics for the exection?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
gyfora commented on PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#issuecomment-1178606935

   Thank you @SteNicholas for checking this, I have fixed the config reference docs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] morhidi commented on a diff in pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
morhidi commented on code in PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#discussion_r916539636


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkOperatorConfiguration.java:
##########
@@ -109,6 +111,9 @@ public static FlinkOperatorConfiguration fromConfiguration(Configuration operato
                 operatorConfig.get(
                         KubernetesOperatorConfigOptions.OPERATOR_DYNAMIC_NAMESPACES_ENABLED);
 
+        boolean josdkMetricsEnabled =

Review Comment:
   nit: Boolean vs boolean



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora merged pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
gyfora merged PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#discussion_r916555598


##########
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/TestUtils.java:
##########
@@ -421,6 +427,28 @@ public static void setEnv(Map<String, String> newEnv) {
         }
     }
 
+    public static <T extends AbstractFlinkResource<?, ?>> MetricManager<T> createTestMetricManager(

Review Comment:
   The MetricListener doesn't expose the registry and also it hardcodes a base group, so I had to remove it from our tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#discussion_r916502542


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/metrics/KubernetesOperatorMetricOptions.java:
##########
@@ -23,8 +23,22 @@
 /** Configuration options for metrics. */
 public class KubernetesOperatorMetricOptions {
     public static final ConfigOption<String> SCOPE_NAMING_KUBERNETES_OPERATOR =
-            ConfigOptions.key("metrics.scope.k8soperator")
-                    .defaultValue("<host>.k8soperator.<namespace>.<name>")
+            ConfigOptions.key("metrics.scope.k8soperator.system")

Review Comment:
   I fixed this, also introduced sections in the general doc to make everything clearer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#discussion_r916467085


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/metrics/OperatorJosdkMetrics.java:
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.kubernetes.operator.metrics;
+
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.runtime.metrics.MetricRegistry;
+
+import io.javaoperatorsdk.operator.api.monitoring.Metrics;
+import io.javaoperatorsdk.operator.api.reconciler.RetryInfo;
+import io.javaoperatorsdk.operator.processing.event.Event;
+import io.javaoperatorsdk.operator.processing.event.ResourceID;
+import io.javaoperatorsdk.operator.processing.event.source.controller.ResourceEvent;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * Implementation of {@link Metrics} to monitor and forward JOSDK metrics to {@link MetricRegistry}.
+ */
+public class OperatorJosdkMetrics implements Metrics {

Review Comment:
   It's not clear what that measures and what benefit it would bring. So I decided to simply not do it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] morhidi commented on a diff in pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
morhidi commented on code in PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#discussion_r916562287


##########
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/utils/StatusRecorderTest.java:
##########
@@ -70,49 +74,64 @@ public void testPatchOnlyWhenChanged() throws InterruptedException {
 
     @Test
     public void testFlinkDeploymentMetrics() throws InterruptedException {
-        var metricListener = new MetricListener();
+        var metrics = new HashMap<String, Metric>();

Review Comment:
   This is boilerplate, do not understand why you prefer this approach over the `MetricListener`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] morhidi commented on a diff in pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
morhidi commented on code in PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#discussion_r916553219


##########
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/TestUtils.java:
##########
@@ -421,6 +427,28 @@ public static void setEnv(Map<String, String> newEnv) {
         }
     }
 
+    public static <T extends AbstractFlinkResource<?, ?>> MetricManager<T> createTestMetricManager(

Review Comment:
   nit: There's a MetricListener utility class for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #302: [FLINK-28297][FLINK-27914] Improve operator metric groups + Add JOSDK metric integration

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #302:
URL: https://github.com/apache/flink-kubernetes-operator/pull/302#discussion_r916566646


##########
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/utils/StatusRecorderTest.java:
##########
@@ -70,49 +74,64 @@ public void testPatchOnlyWhenChanged() throws InterruptedException {
 
     @Test
     public void testFlinkDeploymentMetrics() throws InterruptedException {
-        var metricListener = new MetricListener();
+        var metrics = new HashMap<String, Metric>();

Review Comment:
   It's not very easy to use for these simple tests due to some hardcoded/not exposed logic. I added a more details in the other comment :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org