Posted to issues@flink.apache.org by "mateczagany (via GitHub)" <gi...@apache.org> on 2023/03/30 16:26:10 UTC

[GitHub] [flink-kubernetes-operator] mateczagany commented on a diff in pull request #558: [FLINK-31303] Expose Flink application resource usage via metrics and status

mateczagany commented on code in PR #558:
URL: https://github.com/apache/flink-kubernetes-operator/pull/558#discussion_r1153504166


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java:
##########
@@ -627,14 +637,42 @@ public Map<String, String> getClusterInfo(Configuration conf) throws Exception {
                                             .toSeconds(),
                                     TimeUnit.SECONDS);
 
-            runtimeVersion.put(
+            clusterInfo.put(
                     DashboardConfiguration.FIELD_NAME_FLINK_VERSION,
                     dashboardConfiguration.getFlinkVersion());
-            runtimeVersion.put(
+            clusterInfo.put(
                     DashboardConfiguration.FIELD_NAME_FLINK_REVISION,
                     dashboardConfiguration.getFlinkRevision());
         }
-        return runtimeVersion;
+
+        // JobManager resource usage can be deduced from the CR
+        var jmParameters =
+                new KubernetesJobManagerParameters(
+                        conf, new KubernetesClusterClientFactory().getClusterSpecification(conf));
+        var jmTotalCpu =
+                jmParameters.getJobManagerCPU()
+                        * jmParameters.getJobManagerCPULimitFactor()
+                        * jmParameters.getReplicas();
+        var jmTotalMemory =
+                Math.round(
+                        jmParameters.getJobManagerMemoryMB()
+                                * Math.pow(1024, 2)
+                                * jmParameters.getJobManagerMemoryLimitFactor()
+                                * jmParameters.getReplicas());
+
+        // TaskManager resource usage is best gathered from the REST API to get current replicas

Review Comment:
   If fractional CPU values are used, the total will differ depending on whether it is retrieved from the Flink REST API or from the Kubernetes CR. Flink uses `Hardware.getNumberCPUCores()` under the hood to retrieve this value. I'm not sure exactly how that works, but it's definitely an integer in the end :D
   
   This leads to weird scenarios: with 3 JM and 3 TM replicas, all with `.5` CPU shares, the reported total is `4.5` CPUs (`1.5` for the JMs from the CR plus `3` for the TMs from REST, presumably one whole core each) instead of the `3` that the CR defines.
   
   An easy solution might be to retrieve only the number of TMs from REST and multiply it by the CPU defined in the CR.
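   
   To make the arithmetic concrete, here is a minimal sketch of the two ways of totalling CPU. The class and method names are hypothetical and not part of the operator, and it assumes each TM ends up reported as one whole core when its share is `.5`.

```java
// Hypothetical sketch: names are illustrative and do not exist in the operator code base.
final class CpuTotalSketch {

    /** JM CPU taken from the CR, TM CPU taken as a whole number of cores per TM. */
    static double mixedTotal(double jmCpu, int jmReplicas, int coresReportedPerTm, int tmCount) {
        return jmCpu * jmReplicas + (double) coresReportedPerTm * tmCount;
    }

    /** JM and TM CPU both taken from the CR; only the TM count would come from REST. */
    static double crBasedTotal(double jmCpu, int jmReplicas, double tmCpu, int tmCount) {
        return jmCpu * jmReplicas + tmCpu * tmCount;
    }

    public static void main(String[] args) {
        // 3 JMs and 3 TMs, all configured with 0.5 CPU.
        // Assumption: each TM is reported as 1 core because the value is an integer.
        System.out.println(mixedTotal(0.5, 3, 1, 3));     // 4.5
        System.out.println(crBasedTotal(0.5, 3, 0.5, 3)); // 3.0
    }
}
```

   With the second variant the total stays consistent with what the CR defines, while the number of TMs still reflects the current replicas.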


