Posted to issues@flink.apache.org by "Yang Wang (Jira)" <ji...@apache.org> on 2022/05/19 13:24:00 UTC

[jira] [Closed] (FLINK-27675) Improve manual savepoint tracking

     [ https://issues.apache.org/jira/browse/FLINK-27675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Wang closed FLINK-27675.
-----------------------------
    Resolution: Fixed

Fixed via:
main: 8c21487104ac4073d127f4c3b1b1591279f2318d
release-1.0: 851b9059055c849bd4fd2d0c4293b6774a911354

> Improve manual savepoint tracking
> ---------------------------------
>
>                 Key: FLINK-27675
>                 URL: https://issues.apache.org/jira/browse/FLINK-27675
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>            Reporter: Gyula Fora
>            Assignee: Matyas Orhidi
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: kubernetes-operator-1.0.0
>
>
> There are 2 problems with the manual savepoint result observing logic that can prevent the reconciler from making progress with the deployment (recoveries, upgrades, etc.).
>  # Whenever the jobmanager deployment is not in READY state or the job itself is not RUNNING, the trigger info must be reset and we should not try to query it anymore. Flink will not retry the savepoint if the job fails or is restarted anyway.
>  # If there is a sensible error when fetching the savepoint status (such as: "There is no savepoint operation with triggerId=xxx for job") we should simply reset the trigger. These errors will never go away on their own and will simply cause the deployment to get stuck in observing/waiting for a savepoint to complete.
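
The two rules above can be sketched as a single decision function. This is only an illustration under stated assumptions, not the operator's actual code (the operator itself is written in Java): the names `SavepointTrigger`, `ObservedState`, `SavepointFetchError`, `observe_trigger` and the `fetch_status` callback are all hypothetical and were chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SavepointTrigger:
    """Hypothetical record of a pending manual savepoint."""
    trigger_id: str


@dataclass
class ObservedState:
    """Hypothetical snapshot of what was observed for the deployment."""
    jm_deployment_ready: bool
    job_running: bool


class SavepointFetchError(Exception):
    """Hypothetical error for a status lookup that will never succeed,
    e.g. the trigger id is no longer known to the cluster."""


def observe_trigger(
    trigger: Optional[SavepointTrigger],
    state: ObservedState,
    fetch_status: Callable[[str], bool],
) -> Optional[SavepointTrigger]:
    """Return the trigger to keep tracking, or None if it should be reset.

    fetch_status(trigger_id) is assumed to return True once the savepoint
    has completed, False while it is in progress, and to raise
    SavepointFetchError when the status cannot be fetched.
    """
    if trigger is None:
        return None

    # Rule 1: deployment not READY or job not RUNNING. Reset the trigger
    # and do not query the status at all.
    if not (state.jm_deployment_ready and state.job_running):
        return None

    # Rule 2: an error while fetching the status will not go away on its
    # own, so reset the trigger instead of waiting on it.
    try:
        completed = fetch_status(trigger.trigger_id)
    except SavepointFetchError:
        return None

    # Keep tracking only while the savepoint is still in progress.
    return None if completed else trigger
```

In this sketch a reset is represented by returning `None`, so the caller is never left waiting on a trigger that cannot complete. What a real reconciler records on completion (for example the savepoint location) is left out.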



--
This message was sent by Atlassian Jira
(v8.20.7#820007)