Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/12 07:12:31 UTC

[GitHub] [flink-kubernetes-operator] SteNicholas opened a new pull request, #312: [FLINK-28480] Forward timeControllerExecution time as histogram for JOSDK Metrics interface

SteNicholas opened a new pull request, #312:
URL: https://github.com/apache/flink-kubernetes-operator/pull/312

   ## What is the purpose of the change
   
   Currently the JOSDK metrics forwarder logic doesn't implement the `timeControllerExecution` function.  We should implement it with the following logic.
   
   1. Measure execution time for both successful and failed executions
   2. Based on the name of the ControllerExecution (reconcile/cleanup) and the controller name, track the following histogram metrics
   
   `JOSDK.{ControllerExecution.controllerName}.{ControllerExecution.name}.{ControllerExecution.successTypeName}/failed`
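
A minimal sketch of the two steps above, assuming the last name component is either the success type or the literal `failed`. `ExecutionTimer`, `time`, and `recorded` are hypothetical names, and a plain list of durations stands in for Flink's `Histogram`; this is not the operator's actual implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

// Hypothetical stand-in for the metrics forwarder; not the operator's actual class.
class ExecutionTimer {
    // Recorded durations in nanoseconds, keyed by full metric name.
    private final Map<String, List<Long>> samples = new ConcurrentHashMap<>();

    // Runs the action and records its duration whether it succeeds or throws.
    <T> T time(String controller, String execution,
               Function<T, String> successTypeName, Callable<T> action) throws Exception {
        long start = System.nanoTime();
        try {
            T result = action.call();
            record(controller, execution, successTypeName.apply(result), System.nanoTime() - start);
            return result;
        } catch (Exception e) {
            // Failed executions are tracked under ".failed" (assumed reading of the format).
            record(controller, execution, "failed", System.nanoTime() - start);
            throw e;
        }
    }

    private void record(String controller, String execution, String outcome, long nanos) {
        String name = String.join(".", "JOSDK", controller, execution, outcome);
        samples.computeIfAbsent(name, k -> new CopyOnWriteArrayList<>()).add(nanos);
    }

    // Returns the durations recorded under the given metric name (empty if none).
    List<Long> recorded(String name) {
        return samples.getOrDefault(name, List.of());
    }
}
```

The exception is rethrown after recording, so callers see the same behavior as without timing.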
   
   ## Brief change log
   
     - Introduces a `Histogram` metric in `timeControllerExecution` with the metric name format `JOSDK.{ControllerExecution.controllerName}.{ControllerExecution.name}.{ControllerExecution.successTypeName}/failed`.
   
   ## Verifying this change
   
     - *`OperatorJosdkMetricsTest` adds `testTimeControllerExecution` to verify that the `Histogram` metric is correct.*
     - Sample `Histogram` output from `timeControllerExecution`:
     
   ```
   -- Histograms -----------------------------------------------------------------
   flink-kubernetes-operator-7679966cd6-d4p2g.k8soperator.default.flink-kubernetes-operator.system.JOSDK.flinkdeploymentcontroller.reconcile.resource.Nanos: count=6, min=6517100, max=1198002100, mean=3.375813833333334E8, stddev=4.5900281423332655E8, p50=1.997605E8, p75=6.04624825E8, p95=1.1980021E9, p98=1.1980021E9, p99=1.1980021E9, p999=1.1980021E9
   ```
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: (yes / **no**)
     - Core observer or reconciler logic that is regularly executed: (yes / **no**)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] SteNicholas commented on a diff in pull request #312: [FLINK-28480] Forward timeControllerExecution time as histogram for JOSDK Metrics interface

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on code in PR #312:
URL: https://github.com/apache/flink-kubernetes-operator/pull/312#discussion_r918682179


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/metrics/OperatorJosdkMetrics.java:
##########
@@ -113,6 +137,19 @@ public void failedReconciliation(ResourceID resourceID, Exception exception) {
         return map;
     }
 
+    private Histogram histogram(String... names) {
+        MetricGroup group = operatorMetricGroup.addGroup(OPERATOR_SDK_GROUP);
+        for (String name : names) {
+            group = group.addGroup(name);
+        }
+        var finalGroup = group;
+        return histograms.computeIfAbsent(
+                String.join(".", group.getScopeComponents()),
+                s ->
+                        finalGroup.histogram(

Review Comment:
   @gyfora, IMO `TimeNanos` makes sense because this uses `clock.relativeTimeNanos`, not seconds. WDYT?





[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #312: [FLINK-28480] Forward timeControllerExecution time as histogram for JOSDK Metrics interface

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #312:
URL: https://github.com/apache/flink-kubernetes-operator/pull/312#discussion_r918673937


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/metrics/OperatorJosdkMetrics.java:
##########
@@ -113,6 +137,19 @@ public void failedReconciliation(ResourceID resourceID, Exception exception) {
         return map;
     }
 
+    private Histogram histogram(String... names) {
+        MetricGroup group = operatorMetricGroup.addGroup(OPERATOR_SDK_GROUP);
+        for (String name : names) {
+            group = group.addGroup(name);
+        }
+        var finalGroup = group;
+        return histograms.computeIfAbsent(
+                String.join(".", group.getScopeComponents()),
+                s ->
+                        finalGroup.histogram(

Review Comment:
   Instead of `Nanos`, I suggest we use seconds and name it `TimeSeconds` to make it consistent with other metrics.



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/metrics/OperatorJosdkMetrics.java:
##########
@@ -43,21 +47,41 @@ public class OperatorJosdkMetrics implements Metrics {
     private static final String RECONCILIATION = "Reconciliation";
     private static final String RESOURCE = "Resource";
     private static final String EVENT = "Event";
+    private static final int WINDOW_SIZE = 1000;
 
     private final KubernetesOperatorMetricGroup operatorMetricGroup;
     private final Configuration conf;
+    private final Clock clock;
 
     private final Map<ResourceID, KubernetesResourceNamespaceMetricGroup> resourceNsMetricGroups =
             new ConcurrentHashMap<>();
     private final Map<ResourceID, KubernetesResourceMetricGroup> resourceMetricGroups =
             new ConcurrentHashMap<>();
 
+    private final Map<String, Histogram> histograms = new ConcurrentHashMap<>();
     private final Map<String, Counter> counters = new ConcurrentHashMap<>();
 
     public OperatorJosdkMetrics(
             KubernetesOperatorMetricGroup operatorMetricGroup, Configuration conf) {
         this.operatorMetricGroup = operatorMetricGroup;
         this.conf = conf;
+        this.clock = SystemClock.getInstance();
+    }
+
+    @Override
+    public <T> T timeControllerExecution(ControllerExecution<T> execution) throws Exception {
+        long startTime = clock.relativeTimeNanos();
+        try {
+            T result = execution.execute();
+            String successType = execution.successTypeName(result);
+            histogram(execution.controllerName(), execution.name(), successType)

Review Comment:
   Instead of using the controllerName directly, could we please map this to `FlinkDeployment` / `FlinkSessionJob` based on the name? This would make it consistent with other metrics.
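
One possible shape for such a mapping, as a sketch only. `ControllerNames` and `resourceKind` are hypothetical names; `flinkdeploymentcontroller` appears in the sample output of the PR description, while the session job controller name is an assumption.

```java
import java.util.Locale;

// Hypothetical helper; the PR may implement the mapping differently.
class ControllerNames {
    // Maps a JOSDK controller name to the resource kind used in metric names.
    static String resourceKind(String controllerName) {
        String name = controllerName.toLowerCase(Locale.ROOT);
        if (name.startsWith("flinkdeployment")) {
            return "FlinkDeployment";
        }
        // Assumed: the session job controller name starts with "flinksessionjob".
        if (name.startsWith("flinksessionjob")) {
            return "FlinkSessionJob";
        }
        // Unknown controllers keep their original name.
        return controllerName;
    }
}
```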





[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #312: [FLINK-28480] Forward timeControllerExecution time as histogram for JOSDK Metrics interface

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #312:
URL: https://github.com/apache/flink-kubernetes-operator/pull/312#discussion_r918684467


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/metrics/OperatorJosdkMetrics.java:
##########
@@ -113,6 +137,19 @@ public void failedReconciliation(ResourceID resourceID, Exception exception) {
         return map;
     }
 
+    private Histogram histogram(String... names) {
+        MetricGroup group = operatorMetricGroup.addGroup(OPERATOR_SDK_GROUP);
+        for (String name : names) {
+            group = group.addGroup(name);
+        }
+        var finalGroup = group;
+        return histograms.computeIfAbsent(
+                String.join(".", group.getScopeComponents()),
+                s ->
+                        finalGroup.histogram(

Review Comment:
   Please convert the nano time to seconds before calling it `TimeSeconds` :D 
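
A small sketch of the conversion using `java.util.concurrent.TimeUnit`; `NanosToSeconds` and its methods are hypothetical names, and the PR may convert differently. Note that whole-second conversion truncates, so sub-second executions are recorded as 0.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating two ways to convert nanoseconds to seconds.
class NanosToSeconds {
    // Truncating conversion: any duration under one second becomes 0.
    static long wholeSeconds(long nanos) {
        return TimeUnit.NANOSECONDS.toSeconds(nanos);
    }

    // Fractional conversion: keeps sub-second precision as a double.
    static double fractionalSeconds(long nanos) {
        return nanos / 1_000_000_000.0;
    }
}
```

With the `min` and `max` values from the nanosecond output in the PR description, `wholeSeconds(6517100L)` yields 0 and `wholeSeconds(1198002100L)` yields 1.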





[GitHub] [flink-kubernetes-operator] SteNicholas commented on pull request #312: [FLINK-28480] Forward timeControllerExecution time as histogram for JOSDK Metrics interface

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #312:
URL: https://github.com/apache/flink-kubernetes-operator/pull/312#issuecomment-1181509340

   @gyfora, I have addressed the above comments and verified the change locally. The metric is as follows:
   ```
   -- Histograms -----------------------------------------------------------------
   flink-kubernetes-operator-7679966cd6-v4qlw.k8soperator.default.flink-kubernetes-operator.system.JOSDK.FlinkDeployment.reconcile.resource.TimeSeconds: count=6, min=0, max=0, mean=0.0, stddev=0.0, p50=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
   ```




[GitHub] [flink-kubernetes-operator] gyfora merged pull request #312: [FLINK-28480] Forward timeControllerExecution time as histogram for JOSDK Metrics interface

Posted by GitBox <gi...@apache.org>.
gyfora merged PR #312:
URL: https://github.com/apache/flink-kubernetes-operator/pull/312




[GitHub] [flink-kubernetes-operator] SteNicholas commented on pull request #312: [FLINK-28480] Forward timeControllerExecution time as histogram for JOSDK Metrics interface

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #312:
URL: https://github.com/apache/flink-kubernetes-operator/pull/312#issuecomment-1181402330

   cc @gyfora @morhidi , PTAL.

