Posted to issues@flink.apache.org by "Gyula Fora (Jira)" <ji...@apache.org> on 2022/10/31 17:28:00 UTC

[jira] [Assigned] (FLINK-29819) Record an error event when savepoint fails within grace period

     [ https://issues.apache.org/jira/browse/FLINK-29819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gyula Fora reassigned FLINK-29819:
----------------------------------

    Assignee: Clara Xiong

> Record an error event when savepoint fails within grace period
> --------------------------------------------------------------
>
>                 Key: FLINK-29819
>                 URL: https://issues.apache.org/jira/browse/FLINK-29819
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, SavepointObserver retries a savepoint that fails within the grace period until the savepoint either succeeds or fails after the grace period has elapsed. The grace period applies to each retry separately. If the underlying cause of the quick failure is not transient, such as a misconfigured path or a persistent storage failure, the retries continue without recording any error event.
> We should first add logic to record an error event for each failed attempt. We can then consider capping the number of retries if this becomes a pain point for users.
>  
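A minimal, hypothetical Java sketch of the proposed behavior, replaying a sequence of savepoint attempts. It does not use the Flink Kubernetes Operator's real SavepointObserver API. The names SavepointRetrySketch, observe, Attempt, Outcome, maxRetries and the status strings are invented for illustration. The sketch also assumes that "within the grace period" means strictly before it elapses, and that the grace period is measured from each attempt's trigger time.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the Flink Kubernetes Operator.
final class SavepointRetrySketch {

    // One savepoint attempt: time from trigger until it finished, and whether it succeeded.
    record Attempt(Duration elapsed, boolean succeeded) {}

    // Final status plus every error event recorded along the way.
    record Outcome(String status, List<String> events) {}

    private SavepointRetrySketch() {}

    /**
     * Replays attempts in order. A failure strictly inside the grace period is retried;
     * a failure at or after it is final. Every failed attempt records an error event
     * (the proposed change). maxRetries < 0 leaves retries uncapped, as they are today.
     */
    static Outcome observe(List<Attempt> attempts, Duration gracePeriod, int maxRetries) {
        List<String> events = new ArrayList<>();
        int retries = 0;
        for (int i = 0; i < attempts.size(); i++) {
            Attempt attempt = attempts.get(i);
            if (attempt.succeeded()) {
                return new Outcome("SUCCEEDED", events);
            }
            // Proposed: surface each failure instead of retrying silently.
            events.add("Savepoint attempt " + (i + 1) + " failed after "
                    + attempt.elapsed().toSeconds() + "s");
            if (attempt.elapsed().compareTo(gracePeriod) >= 0) {
                // Failed after the grace period: no more retries.
                return new Outcome("FAILED", events);
            }
            if (maxRetries >= 0 && retries >= maxRetries) {
                // Optional cap on quick-failure retries.
                return new Outcome("RETRIES_EXHAUSTED", events);
            }
            retries++;
        }
        // Input ran out while another retry would still be pending.
        return new Outcome("RETRYING", events);
    }
}
```

For example, with a 60-second grace period and maxRetries set to 1, two quick failures end in RETRIES_EXHAUSTED with two recorded error events. With maxRetries set to -1, the same quick failures are retried until a success or a late failure, but each one still leaves an event behind.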



--
This message was sent by Atlassian Jira
(v8.20.10#820010)