Posted to issues@flink.apache.org by "Tan Kim (Jira)" <ji...@apache.org> on 2023/04/30 08:38:00 UTC

[jira] [Created] (FLINK-31976) Once marked as an inefficient scale-up, further scaling may not happen forever

Tan Kim created FLINK-31976:
-------------------------------

             Summary: Once marked as an inefficient scale-up, further scaling may not happen forever
                 Key: FLINK-31976
                 URL: https://issues.apache.org/jira/browse/FLINK-31976
             Project: Flink
          Issue Type: Improvement
          Components: Autoscaler
    Affects Versions: 1.17.0
            Reporter: Tan Kim


Whether a scale-up is considered inefficient is determined by the following calculation:


{code:java}
// Average true processing rate recorded with the last scaling decision
double lastProcRate = lastSummary.getMetrics().get(TRUE_PROCESSING_RATE).getAverage(); // 22569.315633422066
// Processing rate the last scaling decision expected to reach
double lastExpectedProcRate =
        lastSummary.getMetrics().get(EXPECTED_PROCESSING_RATE).getCurrent(); // 37340.0
// Average true processing rate in the current evaluation
var currentProcRate = evaluatedMetrics.get(TRUE_PROCESSING_RATE).getAverage();
double expectedIncrease = lastExpectedProcRate - lastProcRate;
double actualIncrease = currentProcRate - lastProcRate;

// Effective only if at least the threshold fraction of the expected gain was achieved
boolean withinEffectiveThreshold =
        (actualIncrease / expectedIncrease)
                >= conf.get(AutoScalerOptions.SCALING_EFFECTIVENESS_THRESHOLD);{code}
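The check above can be exercised in isolation with a small standalone class. This is a minimal sketch, not Flink code: the class ScalingEffectivenessCheck, its methods, and the currentProcRate value of 23500.0 are hypothetical, and the threshold of 0.1 is the default quoted in this issue. The two last-scaling values are the ones shown in the comments of the snippet.

```java
/**
 * Hypothetical standalone version of the effectiveness check quoted above.
 * Class and method names are illustrative only; they are not Flink APIs.
 */
public class ScalingEffectivenessCheck {

    // Assumed default of SCALING_EFFECTIVENESS_THRESHOLD, as stated in this issue.
    static final double DEFAULT_THRESHOLD = 0.1;

    /** Ratio of the observed processing-rate gain to the gain the last scale-up expected. */
    static double effectivenessRatio(double lastProcRate, double lastExpectedProcRate, double currentProcRate) {
        double expectedIncrease = lastExpectedProcRate - lastProcRate;
        double actualIncrease = currentProcRate - lastProcRate;
        return actualIncrease / expectedIncrease;
    }

    /** Mirrors withinEffectiveThreshold from the snippet: ratio >= threshold. */
    static boolean withinEffectiveThreshold(double lastProcRate, double lastExpectedProcRate,
                                            double currentProcRate, double threshold) {
        return effectivenessRatio(lastProcRate, lastExpectedProcRate, currentProcRate) >= threshold;
    }

    public static void main(String[] args) {
        double lastProcRate = 22569.315633422066; // value from the snippet
        double lastExpectedProcRate = 37340.0;    // value from the snippet
        double currentProcRate = 23500.0;         // hypothetical current TRUE_PROCESSING_RATE average

        double ratio = effectivenessRatio(lastProcRate, lastExpectedProcRate, currentProcRate);
        boolean effective = withinEffectiveThreshold(
                lastProcRate, lastExpectedProcRate, currentProcRate, DEFAULT_THRESHOLD);
        System.out.printf("ratio = %.4f, effective = %b%n", ratio, effective);
    }
}
```

With these inputs the ratio is roughly 0.063, which is below 0.1, so the last scale-up would be treated as ineffective.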

Because expectedIncrease is derived from the last scaling summary, it does not change unless another scale-up happens; only actualIncrease changes between evaluations.
actualIncrease is currentProcRate (the average of TRUE_PROCESSING_RATE) minus the fixed lastProcRate, and TRUE_PROCESSING_RATE is calculated as follows:
trueProcessingRate = busyTimeMultiplier * numRecordsInPerSecond.getSum()
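The formula can be illustrated with a minimal sketch. Beyond what this issue states, it assumes that busyTimeMultiplier is 1000 divided by the vertex's busy time in milliseconds per second (so an operator that is busy half of the time gets a multiplier of 2), and that numRecordsInPerSecond.getSum() is the sum of the per-subtask input rates. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of TRUE_PROCESSING_RATE = busyTimeMultiplier * sum(numRecordsInPerSecond). */
public class TrueProcessingRate {

    /**
     * Assumption: scales the observed input rate up to what the operator could
     * handle if it were busy for the full 1000 ms of every second.
     * A busy time of 0 would yield Infinity; real code would need to guard against that.
     */
    static double busyTimeMultiplier(double busyTimeMsPerSecond) {
        return 1000.0 / busyTimeMsPerSecond;
    }

    static double trueProcessingRate(double busyTimeMsPerSecond, double[] numRecordsInPerSecond) {
        double sum = Arrays.stream(numRecordsInPerSecond).sum();
        return busyTimeMultiplier(busyTimeMsPerSecond) * sum;
    }

    public static void main(String[] args) {
        // Hypothetical: three subtasks at 4000 records/s each, busy 500 ms per second.
        double[] rates = {4000.0, 4000.0, 4000.0};
        System.out.println(trueProcessingRate(500.0, rates)); // prints 24000.0
    }
}
```

As long as the parallelism stays the same, busy time and input rate settle, and so does this product.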

For example, suppose a scale-up has been marked as inefficient, but the lag keeps building up.
Another scale-up is needed to eliminate the growing lag, but because the previous one was marked as inefficient, it will not happen.
To clear the inefficient mark, the following condition must be met: actualIncrease / expectedIncrease >= SCALING_EFFECTIVENESS_THRESHOLD (default 0.1)

Here, expectedIncrease is fixed by lastSummary, so the condition can only be met if actualIncrease increases.
However, actualIncrease depends only on currentProcRate, which is the product of busyTimeMultiplier and numRecordsInPerSecond, and both of these converge to a stable value if no scaling occurs.
Therefore, actualIncrease also converges.
If the resulting ratio stays below the threshold, no further scale-up is possible, even though the lag continues to build up.
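The convergence argument can be made concrete with the numbers from the snippet. The minimum current rate that clears the mark is lastProcRate + threshold * expectedIncrease, which is about 24046.4 with the values above and a threshold of 0.1. The sketch below is hypothetical: the class name, the plateau of 23500.0, and the model in which each evaluation closes half of the remaining gap to that plateau are illustrative assumptions, not measured data.

```java
/** Hypothetical illustration of how a converged processing rate can block scaling permanently. */
public class ConvergedRateBlocksScaling {

    /** Smallest current processing rate for which actualIncrease / expectedIncrease >= threshold. */
    static double minRateToUnblock(double lastProcRate, double lastExpectedProcRate, double threshold) {
        double expectedIncrease = lastExpectedProcRate - lastProcRate;
        return lastProcRate + threshold * expectedIncrease;
    }

    public static void main(String[] args) {
        double lastProcRate = 22569.315633422066; // value from the snippet
        double lastExpectedProcRate = 37340.0;    // value from the snippet
        double threshold = 0.1;                   // default quoted in this issue

        double required = minRateToUnblock(lastProcRate, lastExpectedProcRate, threshold);

        // Hypothetical model: without further scaling, the measured rate approaches a
        // plateau of 23500 records/s, closing half of the remaining gap per evaluation.
        double plateau = 23500.0;
        double currentProcRate = lastProcRate;
        for (int evaluation = 1; evaluation <= 20; evaluation++) {
            currentProcRate += (plateau - currentProcRate) / 2;
        }

        System.out.printf("required >= %.1f, converged to %.1f, unblocked = %b%n",
                required, currentProcRate, currentProcRate >= required);
    }
}
```

It reports a required rate of about 24046.4 and a converged rate of about 23500.0, so the check never passes and any further scale-up stays blocked, no matter how many evaluations run.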


