Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/11/14 22:29:44 UTC

[GitHub] [flink-kubernetes-operator] tweise opened a new pull request, #440: [FLINK-30004] Cleanup deployment after savepoint for Flink versions < 1.15

tweise opened a new pull request, #440:
URL: https://github.com/apache/flink-kubernetes-operator/pull/440

   ## What is the purpose of the change
   
   * Clean up the underlying deployment resources after cancel with savepoint. Flink versions < 1.15 are supposed to do this automatically, but we see instances where stray HA config maps cause resume/restore to fail.
   
   ## Brief change log
   
   * if the Flink version is < 1.15, clean up the deployment after cancel
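
The change log entry above amounts to a version gate. A minimal sketch, under stated assumptions: the `FlinkVersion` enum and the `shouldDeleteClusterAfterSavepoint` helper below are hypothetical stand-ins that only mirror the `isNewerVersionThan` check visible in the test diff of this pull request; the operator's actual classes and control flow may differ.

```java
// Hypothetical stand-ins for illustration; not the operator's real classes.
class VersionGatedCleanup {

    // Declared in ascending order so that compareTo reflects "newer than".
    enum FlinkVersion {
        v1_13, v1_14, v1_15, v1_16;

        boolean isNewerVersionThan(FlinkVersion other) {
            return compareTo(other) > 0;
        }
    }

    // Returns true when the caller should delete the cluster itself after
    // cancel with savepoint, i.e. for versions below 1.15.
    static boolean shouldDeleteClusterAfterSavepoint(FlinkVersion version) {
        return !version.isNewerVersionThan(FlinkVersion.v1_14);
    }

    public static void main(String[] args) {
        for (FlinkVersion v : FlinkVersion.values()) {
            System.out.println(v + " -> cleanup: " + shouldDeleteClusterAfterSavepoint(v));
        }
    }
}
```

Expressing "< 1.15" as "not newer than 1.14" keeps the check aligned with the comparison the modified unit test uses.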
   
   ## Verifying this change
   
   * unit test modified to cover the version-specific behavior
   * manual verification with existing deployment
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: (yes / **no**)
     - Core observer or reconciler logic that is regularly executed: (yes / **no**)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #440: [FLINK-30004] Cleanup deployment after savepoint for Flink versions < 1.15

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #440:
URL: https://github.com/apache/flink-kubernetes-operator/pull/440#discussion_r1022682931


##########
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/service/NativeFlinkServiceTest.java:
##########
@@ -143,13 +146,28 @@ public void testCancelJobWithSavepointUpgradeMode() throws Exception {
         jobStatus.setState(org.apache.flink.api.common.JobStatus.RUNNING.name());
         ReconciliationUtils.updateStatusForDeployedSpec(deployment, new Configuration());
 
+        deployment.getSpec().setFlinkVersion(flinkVersion);
         flinkService.cancelJob(
                 deployment, UpgradeMode.SAVEPOINT, configManager.getObserveConfig(deployment));
         assertTrue(stopWithSavepointFuture.isDone());
         assertEquals(jobID, stopWithSavepointFuture.get().f0);
         assertFalse(stopWithSavepointFuture.get().f1);
         assertEquals(savepointPath, stopWithSavepointFuture.get().f2);
         assertEquals(savepointPath, jobStatus.getSavepointInfo().getLastSavepoint().getLocation());
+
+        if (flinkVersion.isNewerVersionThan(FlinkVersion.v1_14)) {
+            assertEquals(
+                    jobStatus.getState(), org.apache.flink.api.common.JobStatus.FINISHED.name());
+            assertEquals(
+                    deployment.getStatus().getJobManagerDeploymentStatus(),
+                    JobManagerDeploymentStatus.READY);
+        } else {
+            assertEquals(
+                    jobStatus.getState(), org.apache.flink.api.common.JobStatus.FINISHED.name());

Review Comment:
   the state assertion could be moved before the if
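
To illustrate the suggestion, here is a sketch with simplified stand-ins, not the actual test code: `assertEquals` below is a local helper rather than the JUnit method, states are plain strings, and `expectedLegacyJmStatus` is a hypothetical parameter, because the quoted hunk is cut off before the else branch's deployment-status check.

```java
// Simplified stand-in for the test logic; names and types are hypothetical.
class HoistedAssertionSketch {

    // Minimal local replacement for a JUnit-style equality assertion.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    static void verifyAfterCancel(
            boolean newerThan1_14,
            String jobState,
            String jmStatus,
            String expectedLegacyJmStatus) {
        // Identical in both branches, so it is asserted once before the if.
        assertEquals("FINISHED", jobState);

        if (newerThan1_14) {
            assertEquals("READY", jmStatus);
        } else {
            // The expected value for older versions is not visible in the
            // quoted hunk, so it is supplied by the caller here.
            assertEquals(expectedLegacyJmStatus, jmStatus);
        }
    }

    public static void main(String[] args) {
        verifyAfterCancel(true, "FINISHED", "READY", "PLACEHOLDER");
        verifyAfterCancel(false, "FINISHED", "PLACEHOLDER", "PLACEHOLDER");
        System.out.println("both branches verified");
    }
}
```

Only the version-dependent check stays inside the branches, which makes the difference between the two code paths easier to see.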



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java:
##########
@@ -326,6 +326,7 @@ protected void cancelJob(
                                 exception);
                     }
                     if (deleteClusterAfterSavepoint) {
+                        LOG.info("Cleaning up deployment after savepoint");

Review Comment:
   Maybe we should be more specific, as in `stop-with-savepoint` or `savepoint-shutdown`.





[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #440: [FLINK-30004] Cleanup deployment after savepoint for Flink versions < 1.15

Posted by GitBox <gi...@apache.org>.
gyfora commented on PR #440:
URL: https://github.com/apache/flink-kubernetes-operator/pull/440#issuecomment-1315308285

   > @gyfora fyi image build fails due to error connecting to `[maven repo](https://repo.maven.apache.org/maven2)`:
   > 
   > ```
   > #14 134.8 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-clean-plugin:3.1.0:clean (default-clean) on project flink-kubernetes-operator-parent: Execution default-clean of goal org.apache.maven.plugins:maven-clean-plugin:3.1.0:clean failed: Plugin org.apache.maven.plugins:maven-clean-plugin:3.1.0 or one of its dependencies could not be resolved: Could not transfer artifact org.apache.maven:maven-artifact:jar:3.0 from/to central (https://repo.maven.apache.org/maven2): transfer failed for https://repo.maven.apache.org/maven2/org/apache/maven/maven-artifact/3.0/maven-artifact-3.0.jar: Connect to repo.maven.apache.org:443 [repo.maven.apache.org/146.75.28.215] failed: Connection timed out (Connection timed out) -> [Help 1]
   > ```
   
   Yeah, this seems to have broken almost all CI runs over the last 1-2 days




[GitHub] [flink-kubernetes-operator] tweise commented on pull request #440: [FLINK-30004] Cleanup deployment after savepoint for Flink versions < 1.15

Posted by GitBox <gi...@apache.org>.
tweise commented on PR #440:
URL: https://github.com/apache/flink-kubernetes-operator/pull/440#issuecomment-1315292832

   @gyfora fyi image build fails due to error connecting to `[maven repo](https://repo.maven.apache.org/maven2)`:
   ```
   #14 134.8 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-clean-plugin:3.1.0:clean (default-clean) on project flink-kubernetes-operator-parent: Execution default-clean of goal org.apache.maven.plugins:maven-clean-plugin:3.1.0:clean failed: Plugin org.apache.maven.plugins:maven-clean-plugin:3.1.0 or one of its dependencies could not be resolved: Could not transfer artifact org.apache.maven:maven-artifact:jar:3.0 from/to central (https://repo.maven.apache.org/maven2): transfer failed for https://repo.maven.apache.org/maven2/org/apache/maven/maven-artifact/3.0/maven-artifact-3.0.jar: Connect to repo.maven.apache.org:443 [repo.maven.apache.org/146.75.28.215] failed: Connection timed out (Connection timed out) -> [Help 1]
   ```
   




[GitHub] [flink-kubernetes-operator] tweise merged pull request #440: [FLINK-30004] Cleanup deployment after savepoint for Flink versions < 1.15

Posted by GitBox <gi...@apache.org>.
tweise merged PR #440:
URL: https://github.com/apache/flink-kubernetes-operator/pull/440

