Posted to issues@flink.apache.org by "gyfora (via GitHub)" <gi...@apache.org> on 2023/03/25 21:43:28 UTC

[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #540: [FLINK-29566] Reschedule the cleanup logic if cancel job failed

gyfora commented on code in PR #540:
URL: https://github.com/apache/flink-kubernetes-operator/pull/540#discussion_r1148437562


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/sessionjob/SessionJobReconciler.java:
##########
@@ -100,6 +106,26 @@ public DeleteControl cleanupInternal(FlinkResourceContext<FlinkSessionJob> ctx)
             if (jobID != null) {
                 try {
                     cancelJob(ctx, UpgradeMode.STATELESS);
+                } catch (ExecutionException e) {
+                    final var cause = e.getCause();
+
+                    if (cause instanceof FlinkJobNotFoundException) {
+                        LOG.error("Job {} not found in the Flink cluster.", jobID, e);
+                        return DeleteControl.defaultDelete();
+                    }
+
+                    if (cause instanceof FlinkJobTerminatedWithoutCancellationException) {
+                        LOG.error("Job {} already terminated without cancellation.", jobID, e);
+                        return DeleteControl.defaultDelete();
+                    }
+
+                    final var delay = 10_000L;

Review Comment:
   Instead of hard-coding the delay, we could use `KubernetesOperatorConfigOptions.OPERATOR_OBSERVER_PROGRESS_CHECK_INTERVAL`, which we already use for similar purposes. It also defaults to 10s.
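
   A rough sketch of the idea, not the actual operator code: the delay is resolved from configuration and falls back to 10s when nothing is set. The `RescheduleDelay` class, the `Map`-based config, and the key string are stand-ins invented for illustration; in the PR the value would come from the operator's own configuration object via the option named above.

```java
import java.time.Duration;
import java.util.Map;

// Illustrative stand-in only: a plain Map plays the role of the operator configuration.
final class RescheduleDelay {
    // Hypothetical key, used here only to make the example self-contained.
    static final String PROGRESS_CHECK_INTERVAL_KEY = "example.progress-check.interval";

    // Mirrors the 10s default mentioned in the review comment.
    static final Duration DEFAULT_INTERVAL = Duration.ofSeconds(10);

    // Returns the configured interval in milliseconds, or the default if the key is absent.
    static long resolveDelayMillis(Map<String, Duration> conf) {
        return conf.getOrDefault(PROGRESS_CHECK_INTERVAL_KEY, DEFAULT_INTERVAL).toMillis();
    }

    private RescheduleDelay() {}
}
```

   With this shape, the reschedule delay follows whatever interval the user has configured and no separate constant needs to be kept in sync.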