Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/11 18:13:16 UTC

[GitHub] [flink-kubernetes-operator] morhidi opened a new pull request, #310: [FLINK-28476] Add metrics for Kubernetes API server access

morhidi opened a new pull request, #310:
URL: https://github.com/apache/flink-kubernetes-operator/pull/310

   ## What is the purpose of the change
   
   This pull request adds metrics and KPIs related to Kubernetes API server access. Metrics can be enabled by `kubernetes.operator.kubernetes.client.metrics.enabled` (defaults to `true`).
   
   ## Brief change log
   - added various request/response counters
   ```
   -- Counters -------------------------------------------------------------------
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.Count: 94
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.POST.Count: 6
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.PATCH.Count: 10
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.DELETE.Count: 4
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.PUT.Count: 8
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.GET.Count: 66
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.Failed.Count: 3
   
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.Count: 91
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.101.Count: 5
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.409.Count: 1
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.201.Count: 6
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.404.Count: 10
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.200.Count: 69
   
   ```
   - added key request/response KPIs:
   
   ```
   -- Meters ---------------------------------------------------------------------
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.NumPerSecond: 0.08333333333333333
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.NumPerSecond: 0.03333333333333333
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.Failed.NumPerSecond: 0.05
   ```
   
   ```
   -- Histograms -----------------------------------------------------------------
   localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.LatencyNanos: count=91, min=2588875, max=273916959, mean=1.8684283417582415E7, stddev=4.088778006829815E7, p50=7575458.0, p75=1.3146208E7, p95=5.92533498E7, p98=2.7390890844E8, p99=2.73916959E8, p999=2.73916959E8
   ```
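
The sample output above can be cross-checked with a little arithmetic. The sketch below is not part of the PR. It copies the counter values from the dump and assumes, following the metric descriptions, that every request is either answered or counted as failed. The class `SampleMetricsCheck` and its method names are made up for illustration. It also converts a few `LatencyNanos` values to milliseconds for readability.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; all values are copied from the sample dump above. */
class SampleMetricsCheck {
    // Per-method request counters from the dump.
    static final Map<String, Long> REQUESTS_BY_METHOD =
            Map.of("POST", 6L, "PATCH", 10L, "DELETE", 4L, "PUT", 8L, "GET", 66L);
    // Per-status-code response counters from the dump.
    static final Map<String, Long> RESPONSES_BY_CODE =
            Map.of("101", 5L, "409", 1L, "201", 6L, "404", 10L, "200", 69L);

    /** Adds up all counters in the given map. */
    static long sum(Map<String, Long> counters) {
        return counters.values().stream().mapToLong(Long::longValue).sum();
    }

    /** Converts nanoseconds to fractional milliseconds. */
    static double nanosToMillis(double nanos) {
        return nanos / TimeUnit.MILLISECONDS.toNanos(1);
    }

    public static void main(String[] args) {
        long requests = sum(REQUESTS_BY_METHOD);  // compare with HttpRequest.Count
        long responses = sum(RESPONSES_BY_CODE);  // compare with HttpResponse.Count
        long failed = requests - responses;       // compare with HttpRequest.Failed.Count
        System.out.println(
                "requests=" + requests + " responses=" + responses + " failed=" + failed);
        // min, p50 and max taken from the histogram line of the dump.
        System.out.println("min ms: " + nanosToMillis(2588875));
        System.out.println("p50 ms: " + nanosToMillis(7575458.0));
        System.out.println("max ms: " + nanosToMillis(273916959));
    }
}
```

With the values from the dump, the per-method counters add up to 94 and the per-code counters to 91, and the difference of 3 matches `HttpRequest.Failed.Count`. The latencies work out to roughly 2.6 ms (min), 7.6 ms (p50) and 274 ms (max).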
   ## Verifying this change
   
   This change added tests that cover the functionality and can be verified as follows:
   
   Manually by enabling the `Slf4jReporterFactory` that dumps the metrics into the logs:
   ```
   kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory
   kubernetes.operator.metrics.reporter.slf4j.interval: 10 SECONDS
   ```
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: no
     - Core observer or reconciler logic that is regularly executed: no
   
   ## Documentation
     - Does this pull request introduce a new feature? (yes)
     - If yes, how is the feature documented? 
       - docs for the `kubernetes.operator.kubernetes.client.metrics.enabled` property are autogenerated
       - Metrics descriptions are added to the documentation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora merged pull request #310: [FLINK-28476] Add metrics for Kubernetes API server access

Posted by GitBox <gi...@apache.org>.
gyfora merged PR #310:
URL: https://github.com/apache/flink-kubernetes-operator/pull/310




[GitHub] [flink-kubernetes-operator] morhidi commented on a diff in pull request #310: [FLINK-28476] Add metrics for Kubernetes API server access

Posted by GitBox <gi...@apache.org>.
morhidi commented on code in PR #310:
URL: https://github.com/apache/flink-kubernetes-operator/pull/310#discussion_r918821370


##########
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/metrics/KubernetesClientMetricsTest.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.kubernetes.operator.metrics;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.kubernetes.operator.TestUtils;
+import org.apache.flink.kubernetes.operator.config.FlinkConfigManager;
+import org.apache.flink.kubernetes.operator.utils.KubernetesClientUtils;
+import org.apache.flink.metrics.testutils.MetricListener;
+
+import io.fabric8.kubernetes.client.KubernetesClient;
+import io.fabric8.kubernetes.client.KubernetesClientException;
+import io.fabric8.kubernetes.client.server.mock.EnableKubernetesMockClient;
+import io.fabric8.kubernetes.client.server.mock.KubernetesMockServer;
+import org.awaitility.Awaitility;
+import org.junit.jupiter.api.MethodOrderer.OrderAnnotation;
+import org.junit.jupiter.api.Order;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.TestMethodOrder;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/** {@link KubernetesClientMetrics} tests. */
+@EnableKubernetesMockClient(crud = true)
+@TestMethodOrder(OrderAnnotation.class)
+public class KubernetesClientMetricsTest {
+    private KubernetesMockServer mockServer;
+    private final MetricListener listener = new MetricListener();
+
+    private static final String REQUEST_COUNTER = "KubeClient.HttpRequest.Count";
+    private static final String REQUEST_METER = "KubeClient.HttpRequest.NumPerSecond";
+    private static final String REQUEST_FAILED_METER = "KubeClient.HttpRequest.Failed.NumPerSecond";
+    private static final String REQUEST_POST_COUNTER = "KubeClient.HttpRequest.POST.Count";
+    private static final String REQUEST_DELETE_COUNTER = "KubeClient.HttpRequest.DELETE.Count";
+    private static final String REQUEST_FAILED_COUNTER = "KubeClient.HttpRequest.Failed.Count";
+    private static final String RESPONSE_COUNTER = "KubeClient.HttpResponse.Count";
+    private static final String RESPONSE_METER = "KubeClient.HttpResponse.NumPerSecond";
+    private static final String RESPONSE_200_COUNTER = "KubeClient.HttpResponse.200.Count";
+    private static final String RESPONSE_404_COUNTER = "KubeClient.HttpResponse.404.Count";
+    private static final String RESPONSE_LATENCY = "KubeClient.HttpResponse.LatencyNanos";

Review Comment:
   Renamed it, and added `kubernetes.operator.metrics.histogram.sample.size` as a configurable global histogram sample size.
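
For context, a sample size bounds how many recent observations a histogram keeps when it computes statistics. The following is a minimal sketch of that idea, not the operator's or Flink's implementation. The class `SlidingWindowHistogram` and its methods are hypothetical, and the real option may behave differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window histogram that keeps only the newest sampleSize values. */
class SlidingWindowHistogram {
    private final int sampleSize;
    private final Deque<Long> window = new ArrayDeque<>();
    private long count; // total number of updates ever seen, including evicted ones

    SlidingWindowHistogram(int sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        this.sampleSize = sampleSize;
    }

    /** Records one observation, evicting the oldest one if the window is full. */
    synchronized void update(long value) {
        if (window.size() == sampleSize) {
            window.removeFirst();
        }
        window.addLast(value);
        count++;
    }

    synchronized long getCount() {
        return count;
    }

    /** Number of observations currently retained (at most sampleSize). */
    synchronized int size() {
        return window.size();
    }

    /** Nearest-rank quantile over the retained samples; q must be in (0, 1]. */
    synchronized long quantile(double q) {
        if (q <= 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        if (window.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        long[] sorted = window.stream().mapToLong(Long::longValue).sorted().toArray();
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

In this sketch a larger sample size gives smoother percentiles at the cost of more memory per histogram, which is the kind of trade-off a global setting would let users tune.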







[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #310: [FLINK-28476] Add metrics for Kubernetes API server access

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #310:
URL: https://github.com/apache/flink-kubernetes-operator/pull/310#discussion_r918932826


##########
docs/content/docs/operations/metrics-logging.md:
##########
@@ -37,6 +37,22 @@ The Operator gathers aggregates metrics about managed resources.
 | Namespace | FlinkDeployment.<Status>.Count | Number of managed FlinkDeployment resources per <Status> per namespace. <Status> can take values from: READY, DEPLOYED_NOT_READY, DEPLOYING, MISSING, ERROR | Gauge |
 | Namespace | FlinkSessionJob.Count          | Number of managed FlinkSessionJob instances per namespace                                                                                                   | Gauge |
 
+## Kubernetes Client Metrics
+
+The Operator gathers various metrics related to Kubernetes API server access. The Kubernetes client metrics can be enabled by the configuration `kubernetes.operator.kubernetes.client.metrics.enabled` (default: `true`).
+
+| Scope  | Metrics                                      | Description                                                                                                                                            | Type      |
+|--------|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|
+| System | KubeClient.HttpRequest.Count                 | Number of HTTP request sent to the Kubernetes API Server                                                                                               | Counter   |
+| System | KubeClient.HttpRequest.<RequestMethod>.Count | Number of HTTP request sent to the Kubernetes API Server per request method. <RequestMethod> can take values from: GET, POST, PUT, PATCH, DELETE, etc. | Counter   |
+| System | KubeClient.HttpRequest.Failed.Count          | Number of failed HTTP requests that has no response from the Kubernetes API Server                                                                     | Counter   |
+| System | KubeClient.HttpResponse.Count                | Number of HTTP responses received from the Kubernetes API Server                                                                                       | Counter   |
+| System | KubeClient.HttpResponse.<ResponseCode>.Count | Number of HTTP responses received from the Kubernetes API Server per response code. <ResponseCode> can take values from: 200, 404, 503, etc.           | Counter   |
+| System | KubeClient.HttpRequest.NumPerSecond          | Number of HTTP requests sent to the Kubernetes API Server per second                                                                                   | Meter     |
+| System | KubeClient.HttpRequest.Failed.NumPerSecond   | Number of failed HTTP requests sent to the Kubernetes API Server per second                                                                            | Meter     |
+| System | KubeClient.HttpResponse.NumPerSecond         | Number of HTTP responses received from the Kubernetes API Server per second                                                                            | Meter     |
+| System | KubeClient.HttpResponse.LatencyNanos         | Latency statistics obtained from the HTTP responses received from the Kubernetes API Server                                                            | Histogram |

Review Comment:
   Please correct the name to TimeNanos





[GitHub] [flink-kubernetes-operator] morhidi commented on a diff in pull request #310: [FLINK-28476] Add metrics for Kubernetes API server access

Posted by GitBox <gi...@apache.org>.
morhidi commented on code in PR #310:
URL: https://github.com/apache/flink-kubernetes-operator/pull/310#discussion_r918934502


##########
docs/content/docs/operations/metrics-logging.md:
##########
@@ -37,6 +37,22 @@ The Operator gathers aggregates metrics about managed resources.
 | Namespace | FlinkDeployment.<Status>.Count | Number of managed FlinkDeployment resources per <Status> per namespace. <Status> can take values from: READY, DEPLOYED_NOT_READY, DEPLOYING, MISSING, ERROR | Gauge |
 | Namespace | FlinkSessionJob.Count          | Number of managed FlinkSessionJob instances per namespace                                                                                                   | Gauge |
 
+## Kubernetes Client Metrics
+
+The Operator gathers various metrics related to Kubernetes API server access. The Kubernetes client metrics can be enabled by the configuration `kubernetes.operator.kubernetes.client.metrics.enabled` (default: `true`).
+
+| Scope  | Metrics                                      | Description                                                                                                                                            | Type      |
+|--------|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|
+| System | KubeClient.HttpRequest.Count                 | Number of HTTP request sent to the Kubernetes API Server                                                                                               | Counter   |
+| System | KubeClient.HttpRequest.<RequestMethod>.Count | Number of HTTP request sent to the Kubernetes API Server per request method. <RequestMethod> can take values from: GET, POST, PUT, PATCH, DELETE, etc. | Counter   |
+| System | KubeClient.HttpRequest.Failed.Count          | Number of failed HTTP requests that has no response from the Kubernetes API Server                                                                     | Counter   |
+| System | KubeClient.HttpResponse.Count                | Number of HTTP responses received from the Kubernetes API Server                                                                                       | Counter   |
+| System | KubeClient.HttpResponse.<ResponseCode>.Count | Number of HTTP responses received from the Kubernetes API Server per response code. <ResponseCode> can take values from: 200, 404, 503, etc.           | Counter   |
+| System | KubeClient.HttpRequest.NumPerSecond          | Number of HTTP requests sent to the Kubernetes API Server per second                                                                                   | Meter     |
+| System | KubeClient.HttpRequest.Failed.NumPerSecond   | Number of failed HTTP requests sent to the Kubernetes API Server per second                                                                            | Meter     |
+| System | KubeClient.HttpResponse.NumPerSecond         | Number of HTTP responses received from the Kubernetes API Server per second                                                                            | Meter     |
+| System | KubeClient.HttpResponse.LatencyNanos         | Latency statistics obtained from the HTTP responses received from the Kubernetes API Server                                                            | Histogram |

Review Comment:
   aww, thanks!





[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #310: [FLINK-28476] Add metrics for Kubernetes API server access

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #310:
URL: https://github.com/apache/flink-kubernetes-operator/pull/310#discussion_r918670067


##########
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/metrics/KubernetesClientMetricsTest.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.kubernetes.operator.metrics;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.kubernetes.operator.TestUtils;
+import org.apache.flink.kubernetes.operator.config.FlinkConfigManager;
+import org.apache.flink.kubernetes.operator.utils.KubernetesClientUtils;
+import org.apache.flink.metrics.testutils.MetricListener;
+
+import io.fabric8.kubernetes.client.KubernetesClient;
+import io.fabric8.kubernetes.client.KubernetesClientException;
+import io.fabric8.kubernetes.client.server.mock.EnableKubernetesMockClient;
+import io.fabric8.kubernetes.client.server.mock.KubernetesMockServer;
+import org.awaitility.Awaitility;
+import org.junit.jupiter.api.MethodOrderer.OrderAnnotation;
+import org.junit.jupiter.api.Order;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.TestMethodOrder;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/** {@link KubernetesClientMetrics} tests. */
+@EnableKubernetesMockClient(crud = true)
+@TestMethodOrder(OrderAnnotation.class)
+public class KubernetesClientMetricsTest {
+    private KubernetesMockServer mockServer;
+    private final MetricListener listener = new MetricListener();
+
+    private static final String REQUEST_COUNTER = "KubeClient.HttpRequest.Count";
+    private static final String REQUEST_METER = "KubeClient.HttpRequest.NumPerSecond";
+    private static final String REQUEST_FAILED_METER = "KubeClient.HttpRequest.Failed.NumPerSecond";
+    private static final String REQUEST_POST_COUNTER = "KubeClient.HttpRequest.POST.Count";
+    private static final String REQUEST_DELETE_COUNTER = "KubeClient.HttpRequest.DELETE.Count";
+    private static final String REQUEST_FAILED_COUNTER = "KubeClient.HttpRequest.Failed.Count";
+    private static final String RESPONSE_COUNTER = "KubeClient.HttpResponse.Count";
+    private static final String RESPONSE_METER = "KubeClient.HttpResponse.NumPerSecond";
+    private static final String RESPONSE_200_COUNTER = "KubeClient.HttpResponse.200.Count";
+    private static final String RESPONSE_404_COUNTER = "KubeClient.HttpResponse.404.Count";
+    private static final String RESPONSE_LATENCY = "KubeClient.HttpResponse.LatencyNanos";

Review Comment:
   Instead of `LatencyNanos`, could we simply call it `TimeNanos`? I suggest standardizing time-based metric naming to `TimeUnit` (for example, `TimeNanos`).
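
One way to read this suggestion is that a time-based metric name ends in `Time` followed by the unit, as in `TimeNanos`. The sketch below illustrates that reading only. `TimeMetricNames`, its methods, and the suffixes other than `Nanos` are assumptions, not names taken from the operator.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that builds metric names of the form <prefix>.Time<Unit>. */
class TimeMetricNames {
    /** Maps a time unit to the short suffix used in the metric name. */
    static String unitSuffix(TimeUnit unit) {
        switch (unit) {
            case NANOSECONDS:
                return "Nanos";
            case MICROSECONDS:
                return "Micros";
            case MILLISECONDS:
                return "Millis";
            case SECONDS:
                return "Seconds";
            default:
                throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }

    /** Joins the prefix and the unit suffix, e.g. "X" + NANOSECONDS gives "X.TimeNanos". */
    static String timeMetricName(String prefix, TimeUnit unit) {
        return prefix + ".Time" + unitSuffix(unit);
    }
}
```

Under this convention, calling `timeMetricName("KubeClient.HttpResponse", TimeUnit.NANOSECONDS)` yields `KubeClient.HttpResponse.TimeNanos`, so the unit can be read straight from the metric name.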


