You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "gyfora (via GitHub)" <gi...@apache.org> on 2023/09/08 11:33:15 UTC

[GitHub] [flink-kubernetes-operator] gyfora opened a new pull request, #666: [FLINK-33067] Expose rate limiter config and enable by default

gyfora opened a new pull request, #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666

   ## What is the purpose of the change
   
   There are certain cases currently when the operator can stuck in an infinite no wait retry loop under certain circumstances (see https://github.com/operator-framework/java-operator-sdk/issues/2046 for details).
   
   As a solution we introduce rate limiter configuration with a default that should already safeguard against prod problems that may arise without it.
   
   The PR also fixes and improves the default retry configuration and fixes a bug with exponential backoff.
   
   ## Brief change log
   
     - *Expose rate limiter config and enable by default*
     - *Update default retry config*
     - *Unit Tests*
   
   ## Verifying this change
   
   Unit tests, manual verification
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changes to the `CustomResourceDescriptors`: no
     - Core observer or reconciler logic that is regularly executed: yes
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "mxm (via GitHub)" <gi...@apache.org>.
mxm commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1319917846


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");
+
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Double> OPERATOR_RETRY_INTERVAL_MULTIPLIER =
             operatorConfig("retry.interval.multiplier")
                     .doubleType()
-                    .defaultValue(2.0)
+                    .defaultValue(1.5)

Review Comment:
   Why are we changing this value here? It's unrelated to the rate limiting



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");
+
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Double> OPERATOR_RETRY_INTERVAL_MULTIPLIER =
             operatorConfig("retry.interval.multiplier")
                     .doubleType()
-                    .defaultValue(2.0)
+                    .defaultValue(1.5)
                     .withDescription(
                             "Interval multiplier of automatic reconcile retries on recoverable errors.");
 
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Integer> OPERATOR_RETRY_MAX_ATTEMPTS =
             operatorConfig("retry.max.attempts")
                     .intType()
-                    .defaultValue(10)
+                    .defaultValue(15)

Review Comment:
   Same here.



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");
+
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Double> OPERATOR_RETRY_INTERVAL_MULTIPLIER =
             operatorConfig("retry.interval.multiplier")
                     .doubleType()
-                    .defaultValue(2.0)
+                    .defaultValue(1.5)
                     .withDescription(
                             "Interval multiplier of automatic reconcile retries on recoverable errors.");
 
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Integer> OPERATOR_RETRY_MAX_ATTEMPTS =
             operatorConfig("retry.max.attempts")
                     .intType()
-                    .defaultValue(10)
+                    .defaultValue(15)
                     .withDescription(
                             "Max attempts of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RATE_LIMITER_PERIOD =
+            operatorConfig("rate-limiter.refresh-period")
+                    .durationType()
+                    .defaultValue(Duration.ofSeconds(15))
+                    .withDescription("Operator rate limiter refresh period for each resource.");

Review Comment:
   ```suggestion
               operatorConfig("rate-limiter.period")
                       .durationType()
                       .defaultValue(Duration.ofSeconds(15))
                       .withDescription("Operator rate limiter period for each resource.");
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "gyfora (via GitHub)" <gi...@apache.org>.
gyfora commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1320053768


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkOperatorConfiguration.java:
##########
@@ -220,8 +227,14 @@ private static GenericRetry getRetryConfig(Configuration conf) {
                                 .toMillis())
                 .setIntervalMultiplier(
                         conf.getDouble(
-                                KubernetesOperatorConfigOptions
-                                        .OPERATOR_RETRY_INTERVAL_MULTIPLIER));
+                                KubernetesOperatorConfigOptions.OPERATOR_RETRY_INTERVAL_MULTIPLIER))
+                .setMaxInterval(maxInterval > 0 ? maxInterval : Long.MAX_VALUE);

Review Comment:
   I don't really see this MAX_VALUE behaviour that you mention, in any case I changed this to -1 which is the right way to set infinite 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "gyfora (via GitHub)" <gi...@apache.org>.
gyfora commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1321408754


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkOperatorConfiguration.java:
##########
@@ -220,8 +227,14 @@ private static GenericRetry getRetryConfig(Configuration conf) {
                                 .toMillis())
                 .setIntervalMultiplier(
                         conf.getDouble(
-                                KubernetesOperatorConfigOptions
-                                        .OPERATOR_RETRY_INTERVAL_MULTIPLIER));
+                                KubernetesOperatorConfigOptions.OPERATOR_RETRY_INTERVAL_MULTIPLIER))
+                .setMaxInterval(maxInterval > 0 ? maxInterval : Long.MAX_VALUE);

Review Comment:
   Oh I see what you meant, this wouldn't affect us as we are not using the `@GradualRetry` annotation, that is specific to that. In any case I removed this to be less confusing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "gyfora (via GitHub)" <gi...@apache.org>.
gyfora commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1320062795


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");
+
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Double> OPERATOR_RETRY_INTERVAL_MULTIPLIER =
             operatorConfig("retry.interval.multiplier")
                     .doubleType()
-                    .defaultValue(2.0)
+                    .defaultValue(1.5)

Review Comment:
   Based on some testing I felt that the current 2 multiplier for the exponential backoff was a bit too aggressive in some cases. So 1.5 with a little longer max retries felt more appropriate. If you feel we shouldn't change this I remove it from the PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mbalassi commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "mbalassi (via GitHub)" <gi...@apache.org>.
mbalassi commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1319796729


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");
+
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Double> OPERATOR_RETRY_INTERVAL_MULTIPLIER =
             operatorConfig("retry.interval.multiplier")
                     .doubleType()
-                    .defaultValue(2.0)
+                    .defaultValue(1.5)
                     .withDescription(
                             "Interval multiplier of automatic reconcile retries on recoverable errors.");
 
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Integer> OPERATOR_RETRY_MAX_ATTEMPTS =
             operatorConfig("retry.max.attempts")
                     .intType()
-                    .defaultValue(10)
+                    .defaultValue(15)
                     .withDescription(
                             "Max attempts of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RATE_LIMITER_PERIOD =
+            operatorConfig("rate-limiter.refresh-period")
+                    .durationType()
+                    .defaultValue(Duration.ofSeconds(15))
+                    .withDescription("Operator rate limiter refresh period for each resource.");
+
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Integer> OPERATOR_RATE_LIMITER_LIMIT =
+            operatorConfig("rate-limiter.limit")
+                    .intType()
+                    .defaultValue(5)
+                    .withDescription(
+                            "Max number of reconcile loops triggered in within the rate limiter refresh period for each resource. Setting the limit <= 0 disables the limiter.");

Review Comment:
   nit: `triggered within`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "mxm (via GitHub)" <gi...@apache.org>.
mxm commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1321394405


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkOperatorConfiguration.java:
##########
@@ -220,8 +227,14 @@ private static GenericRetry getRetryConfig(Configuration conf) {
                                 .toMillis())
                 .setIntervalMultiplier(
                         conf.getDouble(
-                                KubernetesOperatorConfigOptions
-                                        .OPERATOR_RETRY_INTERVAL_MULTIPLIER));
+                                KubernetesOperatorConfigOptions.OPERATOR_RETRY_INTERVAL_MULTIPLIER))
+                .setMaxInterval(maxInterval > 0 ? maxInterval : Long.MAX_VALUE);

Review Comment:
   See here for default `UNSET_VALUE`: https://github.com/operator-framework/java-operator-sdk/blob/a3fb3ad9ef84ad60e456ac2b3714b070b37e055b/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/retry/GenericRetry.java#L103 
   
   which is set to `Long.MAX_VALUE`: https://github.com/operator-framework/java-operator-sdk/blob/a3fb3ad9ef84ad60e456ac2b3714b070b37e055b/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/retry/GradualRetry.java#L19



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "mxm (via GitHub)" <gi...@apache.org>.
mxm commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1319954233


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkOperatorConfiguration.java:
##########
@@ -220,8 +227,14 @@ private static GenericRetry getRetryConfig(Configuration conf) {
                                 .toMillis())
                 .setIntervalMultiplier(
                         conf.getDouble(
-                                KubernetesOperatorConfigOptions
-                                        .OPERATOR_RETRY_INTERVAL_MULTIPLIER));
+                                KubernetesOperatorConfigOptions.OPERATOR_RETRY_INTERVAL_MULTIPLIER))
+                .setMaxInterval(maxInterval > 0 ? maxInterval : Long.MAX_VALUE);

Review Comment:
   The default value `Long.MAX_VALUE` means, retry as often as you like? Any value than that means wait at least that amount of time in milliseconds?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "gyfora (via GitHub)" <gi...@apache.org>.
gyfora commented on PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#issuecomment-1713216273

   @mxm I have updated the PR based on the comments, also simplified wording for the retry related configs. PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora merged pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "gyfora (via GitHub)" <gi...@apache.org>.
gyfora merged PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "gyfora (via GitHub)" <gi...@apache.org>.
gyfora commented on PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#issuecomment-1711596705

   > Thanks, great. Could you please explain the default behavior in plain English now given the default values you propose? (How many request would we send/resources/sec)
   
   In plain English the default is:
   At most 5 reconciles for a given resource every 15 seconds. This should be more than enough to not have any visible effect on the normal behaviour (upgrades would go through instantaneously as they need only 2/3 reconciles)
   
   But if it gets stuck in an error loop, it will only do 5 loops, then wait for 15 seconds, rinse and repeat


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "mxm (via GitHub)" <gi...@apache.org>.
mxm commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1319954969


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkOperatorConfiguration.java:
##########
@@ -220,8 +227,14 @@ private static GenericRetry getRetryConfig(Configuration conf) {
                                 .toMillis())
                 .setIntervalMultiplier(
                         conf.getDouble(
-                                KubernetesOperatorConfigOptions
-                                        .OPERATOR_RETRY_INTERVAL_MULTIPLIER));
+                                KubernetesOperatorConfigOptions.OPERATOR_RETRY_INTERVAL_MULTIPLIER))
+                .setMaxInterval(maxInterval > 0 ? maxInterval : Long.MAX_VALUE);

Review Comment:
   It's a bit tricky because the JOSDK treats Long.MAX_VALUE as unset...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "mxm (via GitHub)" <gi...@apache.org>.
mxm commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1321395107


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkOperatorConfiguration.java:
##########
@@ -220,8 +227,14 @@ private static GenericRetry getRetryConfig(Configuration conf) {
                                 .toMillis())
                 .setIntervalMultiplier(
                         conf.getDouble(
-                                KubernetesOperatorConfigOptions
-                                        .OPERATOR_RETRY_INTERVAL_MULTIPLIER));
+                                KubernetesOperatorConfigOptions.OPERATOR_RETRY_INTERVAL_MULTIPLIER))
+                .setMaxInterval(maxInterval > 0 ? maxInterval : Long.MAX_VALUE);

Review Comment:
   If set to `Long.MAX_VALUE`, this default will be applied: https://github.com/operator-framework/java-operator-sdk/blob/a3fb3ad9ef84ad60e456ac2b3714b070b37e055b/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/retry/GradualRetry.java#L16



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "mxm (via GitHub)" <gi...@apache.org>.
mxm commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1319970534


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");
+
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Double> OPERATOR_RETRY_INTERVAL_MULTIPLIER =
             operatorConfig("retry.interval.multiplier")
                     .doubleType()
-                    .defaultValue(2.0)
+                    .defaultValue(1.5)
                     .withDescription(
                             "Interval multiplier of automatic reconcile retries on recoverable errors.");
 
     @Documentation.Section(SECTION_SYSTEM)
     public static final ConfigOption<Integer> OPERATOR_RETRY_MAX_ATTEMPTS =
             operatorConfig("retry.max.attempts")
                     .intType()
-                    .defaultValue(10)
+                    .defaultValue(15)
                     .withDescription(
                             "Max attempts of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RATE_LIMITER_PERIOD =
+            operatorConfig("rate-limiter.refresh-period")
+                    .durationType()
+                    .defaultValue(Duration.ofSeconds(15))
+                    .withDescription("Operator rate limiter refresh period for each resource.");

Review Comment:
   I guess it means, that we refresh 15 seconds after hitting the limit. Current name makes sense then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "mxm (via GitHub)" <gi...@apache.org>.
mxm commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1319971456


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");

Review Comment:
   AFAIK the current default wasn't modified by adding this configuration option, correct?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "mxm (via GitHub)" <gi...@apache.org>.
mxm commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1319917059


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");

Review Comment:
   How are recoverable errors defined?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "gyfora (via GitHub)" <gi...@apache.org>.
gyfora commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1320061447


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");

Review Comment:
   As for the "recoverable error", I just lifted the wording from the other config. It basically means any error that is not handled by the operator logic directly but is "thrown" and therefore should be retried :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #666: [FLINK-33067] Expose rate limiter config and enable by default

Posted by "gyfora (via GitHub)" <gi...@apache.org>.
gyfora commented on code in PR #666:
URL: https://github.com/apache/flink-kubernetes-operator/pull/666#discussion_r1320058746


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java:
##########
@@ -365,22 +365,45 @@ public static String operatorConfigKey(String key) {
                     .withDescription(
                             "Initial interval of automatic reconcile retries on recoverable errors.");
 
+    @Documentation.Section(SECTION_SYSTEM)
+    public static final ConfigOption<Duration> OPERATOR_RETRY_MAX_INTERVAL =
+            operatorConfig("retry.max.interval")
+                    .durationType()
+                    .defaultValue(Duration.ZERO)
+                    .withDescription(
+                            "Max interval of automatic reconcile retries on recoverable errors, if set to <=0, no limit is applied.");

Review Comment:
   Previously it was infinite, and a recent change actually introduced a 15sec limit accidentally. This is a correction of that. So with this, no change since the last release :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org