You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/11/23 13:10:59 UTC

[GitHub] [flink] gyfora opened a new pull request, #21374: [FLINK-30090] Support timespan for TimerGauges

gyfora opened a new pull request, #21374:
URL: https://github.com/apache/flink/pull/21374

   ## What is the purpose of the change
   
   Allow TimerGauges to define a time window similar to how MeterViews work in Flink. This allows us to collect more relevant metrics without losing information based on the metrics collection interval
   
   ## Brief change log
   
    * Change logic to support custom timespan in TimerGauge
    * Add unit tests
   
   ## Verifying this change
   
   Extended existing unit tests to cover different timespan.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): yes
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] flinkbot commented on pull request #21374: [FLINK-30090] Support timespan for TimerGauges

Posted by GitBox <gi...@apache.org>.
flinkbot commented on PR #21374:
URL: https://github.com/apache/flink/pull/21374#issuecomment-1325049724

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "dfe16effdc6698b6cbd6bc2a7b91a7f4f9a3320c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "dfe16effdc6698b6cbd6bc2a7b91a7f4f9a3320c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * dfe16effdc6698b6cbd6bc2a7b91a7f4f9a3320c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] gyfora commented on pull request #21374: [FLINK-30090] Support timespan for TimerGauges

Posted by GitBox <gi...@apache.org>.
gyfora commented on PR #21374:
URL: https://github.com/apache/flink/pull/21374#issuecomment-1330868527

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] mxm commented on a diff in pull request #21374: [FLINK-30090] Support timespan for TimerGauges

Posted by GitBox <gi...@apache.org>.
mxm commented on code in PR #21374:
URL: https://github.com/apache/flink/pull/21374#discussion_r1030658450


##########
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/TimerGauge.java:
##########
@@ -50,11 +62,24 @@ public class TimerGauge implements Gauge<Long>, View {
     private long accumulatedCount;
 
     public TimerGauge() {
-        this(SystemClock.getInstance());
+        this(DEFAULT_TIME_SPAN_IN_SECONDS);
+    }
+
+    public TimerGauge(int timeSpanInSeconds) {
+        this(SystemClock.getInstance(), timeSpanInSeconds);
     }
 
     public TimerGauge(Clock clock) {
+        this(clock, DEFAULT_TIME_SPAN_IN_SECONDS);
+    }
+
+    public TimerGauge(Clock clock, int timeSpanInSeconds) {
         this.clock = clock;
+        this.timeSpanInSeconds =
+                Math.max(
+                        timeSpanInSeconds - (timeSpanInSeconds % UPDATE_INTERVAL_SECONDS),
+                        UPDATE_INTERVAL_SECONDS);

Review Comment:
   It would be kind of nice not having to adjust this but we are constrained by the UPDATE_INTERVAL_SECONDS which we can't easily change. So looks good.



##########
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/TimerGauge.java:
##########
@@ -32,9 +32,21 @@
  * happen in a couple of hours, the returned value will account for this ongoing measurement.
  */
 public class TimerGauge implements Gauge<Long>, View {
+
+    private static final int DEFAULT_TIME_SPAN_IN_SECONDS = 60;
+
     private final Clock clock;
 
-    private long previousCount;
+    /** The time-span over which the average is calculated. */
+    private final int timeSpanInSeconds;
+    /** Circular array containing the history of values. */
+    private final long[] values;
+    /** The index in the array for the current time. */
+    private int time = 0;

Review Comment:
   I think `idx` would be a better name for this. Time is confusing because this actually gets reset to zero when the array is full.



##########
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/TimerGauge.java:
##########
@@ -89,15 +114,32 @@ public synchronized void update() {
             currentMaxSingleMeasurement =
                     Math.max(currentMaxSingleMeasurement, now - currentMeasurementStartTS);
         }
-        previousCount = Math.max(Math.min(currentCount / UPDATE_INTERVAL_SECONDS, 1000), 0);
+        updateCurrentValue();
         previousMaxSingleMeasurement = currentMaxSingleMeasurement;
         currentCount = 0;
         currentMaxSingleMeasurement = 0;
     }
 
+    private void updateCurrentValue() {
+        if (time == values.length - 1) {
+            fullWindow = true;
+        }
+        values[time] = currentCount;
+        time = (time + 1) % values.length;
+
+        int maxIndex = fullWindow ? values.length : time;
+        long totalTime = 0;
+        for (int i = 0; i < maxIndex; i++) {
+            totalTime += values[i];
+        }

Review Comment:
   I appreciate the attention to detail here on properly calculating the totalTime based on the number of available observations :) Also, I very much prefer it over only releasing the Gauge once we have all observations. That way, this feature shouldn't cause any regressions because the Gauge as quickly as it does now but its accuracy will improve after a minute.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] gyfora merged pull request #21374: [FLINK-30090] Support timespan for TimerGauges

Posted by GitBox <gi...@apache.org>.
gyfora merged PR #21374:
URL: https://github.com/apache/flink/pull/21374


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org