Posted to issues@flink.apache.org by "Gyula Fora (Jira)" <ji...@apache.org> on 2022/05/27 13:12:00 UTC
[jira] [Created] (FLINK-27820) Handle Upgrade/Deployment errors gracefully
Gyula Fora created FLINK-27820:
----------------------------------
Summary: Handle Upgrade/Deployment errors gracefully
Key: FLINK-27820
URL: https://issues.apache.org/jira/browse/FLINK-27820
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Gyula Fora
Assignee: Gyula Fora
Fix For: kubernetes-operator-1.1.0
The operator currently cannot gracefully handle a failure during job submission, or directly after it but before the status is updated.
This applies both to initial cluster submissions, when a Flink CR is created, and, more importantly, to upgrades.
This is slightly related to https://issues.apache.org/jira/browse/FLINK-27804, where mid-upgrade observation was disabled to work around some issues. That logic should also be improved so that it only skips observing last-state info for jobs that have already finished (and were observed before).
During upgrades, the observer should be able to recognize when the job/cluster was in fact already submitted, even if the subsequent status update failed, and move the status into a healthy DEPLOYED state.
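
One possible way to make the observer recognize an already submitted deployment is sketched below. This is only an illustration written in Python, not the operator's actual Java code. It assumes that the spec generation is stamped onto the cluster at submission time, so that the observer can compare it with the generation it is trying to deploy. The names Status, ObservedDeployment and observe_upgrade are hypothetical and do not come from the operator code base.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReconciliationState(Enum):
    UPGRADING = "UPGRADING"
    DEPLOYED = "DEPLOYED"


@dataclass
class Status:
    # Hypothetical, simplified view of the CR status.
    state: ReconciliationState
    deployed_generation: int  # generation of the last spec recorded as deployed


@dataclass
class ObservedDeployment:
    # Assumption: the submitter stamps the spec generation onto the
    # cluster resource, so it survives a failed status update.
    generation: int


def observe_upgrade(
    status: Status,
    target_generation: int,
    deployment: Optional[ObservedDeployment],
) -> Status:
    """Return the status the observer should record for this cycle."""
    if status.state is not ReconciliationState.UPGRADING:
        # Nothing in flight, keep the current status.
        return status
    if deployment is None:
        # Submission never reached the cluster; the reconciler can retry.
        return status
    if deployment.generation == target_generation:
        # The cluster was submitted but the status update was lost:
        # record the upgrade as completed.
        return Status(ReconciliationState.DEPLOYED, target_generation)
    # An older deployment is still running; the upgrade is not done yet.
    return status
```

With this approach the only extra state needed is the generation marker on the cluster itself, which is written as part of the submission and therefore cannot get out of sync with it.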
--
This message was sent by Atlassian Jira
(v8.20.7#820007)