Posted to issues@flink.apache.org by "Gyula Fora (Jira)" <ji...@apache.org> on 2022/05/27 13:12:00 UTC

[jira] [Created] (FLINK-27820) Handle Upgrade/Deployment errors gracefully

Gyula Fora created FLINK-27820:
----------------------------------

             Summary: Handle Upgrade/Deployment errors gracefully
                 Key: FLINK-27820
                 URL: https://issues.apache.org/jira/browse/FLINK-27820
             Project: Flink
          Issue Type: Improvement
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.0.0
            Reporter: Gyula Fora
            Assignee: Gyula Fora
             Fix For: kubernetes-operator-1.1.0


The operator currently cannot gracefully handle failures that occur during job submission, or directly after submission but before the status is updated.

This applies to initial cluster submissions, when a Flink CR is first created, but more importantly to upgrades.

This is slightly related to https://issues.apache.org/jira/browse/FLINK-27804, where mid-upgrade observing was disabled to work around some issues. That logic should also be improved so that it only skips observing last-state info for jobs that have already finished (and were observed before).

During upgrades, the observer should be able to recognize when the job/cluster was in fact already submitted, even if the subsequent status update failed, and move the status into a healthy DEPLOYED state.
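
The recovery behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the operator's actual code: the class UpgradeObserverSketch, the UPGRADING state, the Status fields and the idea of stamping the deployment with the spec generation it was created from are all hypothetical. Only the DEPLOYED state name comes from this issue.

```java
import java.util.OptionalLong;

// Hypothetical sketch; none of these types exist in the operator under these names.
class UpgradeObserverSketch {

    /** Assumed lifecycle states; only DEPLOYED is named in the issue. */
    enum State { UPGRADING, DEPLOYED }

    /** Assumed slice of the CR status that the operator persists. */
    static final class Status {
        final State state;
        final long targetGeneration; // spec generation the operator intended to deploy

        Status(State state, long targetGeneration) {
            this.state = state;
            this.targetGeneration = targetGeneration;
        }
    }

    /**
     * Compares the persisted status with what is actually running.
     *
     * @param deployedGeneration generation assumed to be stamped on the running
     *                           deployment at submission time; empty if no
     *                           deployment was found
     */
    static Status observe(Status status, OptionalLong deployedGeneration) {
        boolean submittedButStatusLost =
                status.state == State.UPGRADING
                        && deployedGeneration.isPresent()
                        && deployedGeneration.getAsLong() == status.targetGeneration;

        // The submission went through even though the status update failed,
        // so move the status forward instead of submitting again.
        if (submittedButStatusLost) {
            return new Status(State.DEPLOYED, status.targetGeneration);
        }
        // Nothing matching is running: leave the status for the reconciler.
        return status;
    }

    public static void main(String[] args) {
        Status stale = new Status(State.UPGRADING, 2L);
        System.out.println(observe(stale, OptionalLong.of(2L)).state);
        System.out.println(observe(stale, OptionalLong.empty()).state);
    }
}
```

The sketch assumes the intended generation is written to the status before submission, so that after a failed status update the observer still has something to compare the running deployment against.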



--
This message was sent by Atlassian Jira
(v8.20.7#820007)