Posted to commits@flink.apache.org by GitBox <gi...@apache.org> on 2022/03/23 16:56:59 UTC

[GitHub] [flink-kubernetes-operator] gyfora opened a new pull request #103: [FLINK-26830] Transition to SUSPEND before redeploying on job upgrades

gyfora opened a new pull request #103:
URL: https://github.com/apache/flink-kubernetes-operator/pull/103


   Break redeployments into two steps to avoid getting into an inconsistent state if job submission fails.
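   A minimal sketch of the two-step idea, under stated assumptions: the class, the Cluster interface, and the version strings below are hypothetical stand-ins and not the operator's real types. The first pass only stops the old job and records SUSPENDED; the second pass submits the new job. If submission fails, the recorded state is still a consistent SUSPENDED, so a later pass can simply retry.

```java
// Hypothetical, simplified model. None of these types exist in the operator.
final class TwoStepUpgradeSketch {
    enum JobState { RUNNING, SUSPENDED }

    /** Stand-in for the cluster interaction; submit may throw an unchecked exception. */
    interface Cluster {
        void cancel();
        void submit(String version);
    }

    // What the last successful reconcile pass recorded.
    private JobState lastState = JobState.RUNNING;
    private String lastVersion;

    TwoStepUpgradeSketch(String runningVersion) {
        this.lastVersion = runningVersion;
    }

    JobState lastState() { return lastState; }

    String lastVersion() { return lastVersion; }

    /**
     * Runs one reconcile pass toward desiredVersion (assumed to be wanted RUNNING).
     * Returns true when a follow-up pass should be scheduled right away.
     */
    boolean reconcile(String desiredVersion, Cluster cluster) {
        if (lastState == JobState.RUNNING) {
            if (desiredVersion.equals(lastVersion)) {
                return false; // already up to date, nothing to do
            }
            // Step 1: stop the old job and record SUSPENDED before redeploying.
            cluster.cancel();
            lastState = JobState.SUSPENDED;
            return true;
        }
        // Step 2: submit the new job. If this throws, lastState stays SUSPENDED,
        // so the recorded status still matches what is actually deployed.
        cluster.submit(desiredVersion);
        lastVersion = desiredVersion;
        lastState = JobState.RUNNING;
        return false;
    }
}
```

   In this sketch a failed submit leaves lastState at SUSPENDED and lastVersion unchanged, and calling reconcile again retries only the second step.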


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on a change in pull request #103: [FLINK-26830] Transition to SUSPEND before redeploying on job upgrades

Posted by GitBox <gi...@apache.org>.
gyfora commented on a change in pull request #103:
URL: https://github.com/apache/flink-kubernetes-operator/pull/103#discussion_r834033043



##########
File path: flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/ReconciliationUtils.java
##########
@@ -93,6 +101,26 @@ public static void updateForReconciliationError(FlinkDeployment flinkApp, String
         } else {
             updateControl = UpdateControl.noUpdate();
         }
-        return updateControl;
+
+        if (!reschedule) {
+            return updateControl;
+        }
+
+        ReconciliationStatus reconciliationStatus = current.getStatus().getReconciliationStatus();
+        if (current.getSpec().getJob() != null
+                && current.getSpec().getJob().getState() == JobState.RUNNING
+                && reconciliationStatus != null
+                && reconciliationStatus.isSuccess()
+                && reconciliationStatus.getLastReconciledSpec().getJob().getState()
+                        == JobState.SUSPENDED) {
+            return updateControl.rescheduleAfter(0);

Review comment:
       Maybe the SDK provides a better way (such as moving the event to the beginning of the queue), but you could actually argue that it's better to pick up new updates before finishing the upgrade; that way you wouldn't have to redeploy twice.







[GitHub] [flink-kubernetes-operator] wangyang0918 commented on a change in pull request #103: [FLINK-26830] Transition to SUSPEND before redeploying on job upgrades

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on a change in pull request #103:
URL: https://github.com/apache/flink-kubernetes-operator/pull/103#discussion_r833878128



##########
File path: flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/ReconciliationUtils.java
##########
@@ -93,6 +101,26 @@ public static void updateForReconciliationError(FlinkDeployment flinkApp, String
         } else {
             updateControl = UpdateControl.noUpdate();
         }
-        return updateControl;
+
+        if (!reschedule) {
+            return updateControl;
+        }
+
+        ReconciliationStatus reconciliationStatus = current.getStatus().getReconciliationStatus();
+        if (current.getSpec().getJob() != null
+                && current.getSpec().getJob().getState() == JobState.RUNNING
+                && reconciliationStatus != null
+                && reconciliationStatus.isSuccess()
+                && reconciliationStatus.getLastReconciledSpec().getJob().getState()
+                        == JobState.SUSPENDED) {
+            return updateControl.rescheduleAfter(0);

Review comment:
       Even though we schedule a zero-delay reconciliation, there is still a chance that other CR changes sneak in and are then applied together with the next reconciliation. Maybe this is not a problem, since we always reconcile toward the final state.







[GitHub] [flink-kubernetes-operator] gyfora merged pull request #103: [FLINK-26830] Transition to SUSPEND before redeploying on job upgrades

Posted by GitBox <gi...@apache.org>.
gyfora merged pull request #103:
URL: https://github.com/apache/flink-kubernetes-operator/pull/103


   





[GitHub] [flink-kubernetes-operator] wangyang0918 commented on a change in pull request #103: [FLINK-26830] Transition to SUSPEND before redeploying on job upgrades

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on a change in pull request #103:
URL: https://github.com/apache/flink-kubernetes-operator/pull/103#discussion_r834046231



##########
File path: flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/ReconciliationUtils.java
##########
@@ -93,6 +101,26 @@ public static void updateForReconciliationError(FlinkDeployment flinkApp, String
         } else {
             updateControl = UpdateControl.noUpdate();
         }
-        return updateControl;
+
+        if (!reschedule) {
+            return updateControl;
+        }
+
+        ReconciliationStatus reconciliationStatus = current.getStatus().getReconciliationStatus();
+        if (current.getSpec().getJob() != null
+                && current.getSpec().getJob().getState() == JobState.RUNNING
+                && reconciliationStatus != null
+                && reconciliationStatus.isSuccess()
+                && reconciliationStatus.getLastReconciledSpec().getJob().getState()
+                        == JobState.SUSPENDED) {
+            return updateControl.rescheduleAfter(0);

Review comment:
       Yes. I think we do not need to move the redeploy event to the beginning of the queue. As a side effect, we save an unnecessary restart.
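
       The point about saving a restart can be illustrated with a small sketch. It is a hypothetical model, not the operator's code: the class name, the version strings, and the assumption that the job is always wanted RUNNING are all made up for illustration. Each pass reads whatever spec is current when it runs, so a change that arrives between the suspend step and the redeploy step is deployed directly, and the intermediate version is never submitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model. None of these types exist in the operator.
final class LatestSpecSketch {
    enum JobState { RUNNING, SUSPENDED }

    // What the user currently wants; may change between reconcile passes.
    private String desiredVersion;
    // What the last successful pass recorded.
    private JobState lastState = JobState.RUNNING;
    private String lastVersion;
    // Versions submitted by reconcile passes, in order.
    private final List<String> submitted = new ArrayList<>();

    LatestSpecSketch(String runningVersion) {
        this.desiredVersion = runningVersion;
        this.lastVersion = runningVersion;
    }

    /** Simulates a user editing the custom resource. */
    void updateSpec(String version) {
        this.desiredVersion = version;
    }

    List<String> submitted() { return submitted; }

    /**
     * One pass over the spec as it is right now.
     * Returns true when a follow-up pass should run immediately,
     * which stands in for a zero-delay reschedule.
     */
    boolean reconcile() {
        if (lastState == JobState.RUNNING) {
            if (desiredVersion.equals(lastVersion)) {
                return false; // nothing changed
            }
            lastState = JobState.SUSPENDED; // step 1: suspend the old job
            return true;
        }
        // Step 2: deploy the version that is desired at this moment,
        // which may be newer than the one that triggered step 1.
        submitted.add(desiredVersion);
        lastVersion = desiredVersion;
        lastState = JobState.RUNNING;
        return false;
    }
}
```

       With this model, updating the spec to "v2", running one pass, updating to "v3", and running a second pass results in a single submission of "v3".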







[GitHub] [flink-kubernetes-operator] gyfora commented on a change in pull request #103: [FLINK-26830] Transition to SUSPEND before redeploying on job upgrades

Posted by GitBox <gi...@apache.org>.
gyfora commented on a change in pull request #103:
URL: https://github.com/apache/flink-kubernetes-operator/pull/103#discussion_r834033520



##########
File path: flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentControllerTest.java
##########
@@ -403,8 +425,18 @@ public void testUpgradeNotReadyCluster(FlinkDeployment appCluster, boolean allow
 
             flinkService.setPortReady(true);
             testController.reconcile(appCluster, context);
-            testController.reconcile(appCluster, context);
-
+            if (appCluster.getSpec().getJob() != null

Review comment:
       good point







[GitHub] [flink-kubernetes-operator] wangyang0918 commented on a change in pull request #103: [FLINK-26830] Transition to SUSPEND before redeploying on job upgrades

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on a change in pull request #103:
URL: https://github.com/apache/flink-kubernetes-operator/pull/103#discussion_r833874394



##########
File path: flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentControllerTest.java
##########
@@ -403,8 +425,18 @@ public void testUpgradeNotReadyCluster(FlinkDeployment appCluster, boolean allow
 
             flinkService.setPortReady(true);
             testController.reconcile(appCluster, context);
-            testController.reconcile(appCluster, context);
-
+            if (appCluster.getSpec().getJob() != null

Review comment:
       All the tests pass without this `if...else...` change. Am I missing something?

##########
File path: flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/ReconciliationUtils.java
##########
@@ -93,6 +101,26 @@ public static void updateForReconciliationError(FlinkDeployment flinkApp, String
         } else {
             updateControl = UpdateControl.noUpdate();
         }
-        return updateControl;
+
+        if (!reschedule) {
+            return updateControl;
+        }
+
+        ReconciliationStatus reconciliationStatus = current.getStatus().getReconciliationStatus();
+        if (current.getSpec().getJob() != null
+                && current.getSpec().getJob().getState() == JobState.RUNNING
+                && reconciliationStatus != null
+                && reconciliationStatus.isSuccess()
+                && reconciliationStatus.getLastReconciledSpec().getJob().getState()
+                        == JobState.SUSPENDED) {
+            return updateControl.rescheduleAfter(0);

Review comment:
       Even though we schedule a zero-delay reconciliation, there is still a chance that other CR changes sneak in and are then applied together with the next reconciliation. Maybe this is not a problem, since we always reconcile toward the final state.



