Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/10/31 16:02:03 UTC

[GitHub] [flink-kubernetes-operator] clarax opened a new pull request, #421: [FLINK-29819] Record an error event when savepoint fails within grace…

clarax opened a new pull request, #421:
URL: https://github.com/apache/flink-kubernetes-operator/pull/421

   … period
   
   
   ## What is the purpose of the change
   
   Currently, SavepointObserver retries when a savepoint fails within the grace period, and keeps retrying until the savepoint either succeeds or fails after the grace period. The grace period applies to each retry separately. If the underlying cause of a quick failure is not transient, such as a misconfigured path or a persistent storage failure, the retries keep going without recording any error event.
   We should first add logic to record an error event for each failed attempt. We can consider capping the number of retries if they become a pain for users.
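
The behavior described above could be sketched roughly as follows. This is an illustration only, not the operator's actual `SavepointObserver` code: the class `SavepointRetrySketch`, the `EventLog` stand-in for the Kubernetes event recorder, the `Action` enum, and the method `onSavepointFailure` are all hypothetical names, and the sketch assumes that an error event is recorded for every failed attempt, whether or not it falls within the grace period.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the operator's actual SavepointObserver. */
final class SavepointRetrySketch {

    /** What the observer does after seeing a failed savepoint attempt. */
    enum Action { RETRY, GIVE_UP }

    /** In-memory stand-in for an event recorder (assumption for illustration). */
    static final class EventLog {
        final List<String> errors = new ArrayList<>();

        void recordError(String message) {
            errors.add(message);
        }
    }

    /**
     * Handles one failed savepoint attempt. An error event is always recorded,
     * so repeated quick failures are visible to the user. The attempt is retried
     * only if it failed within the grace period, measured from this attempt's
     * own trigger time (the grace period applies per retry).
     */
    static Action onSavepointFailure(
            Instant triggeredAt,
            Instant failedAt,
            Duration gracePeriod,
            String reason,
            EventLog events) {
        events.recordError("Savepoint failed: " + reason);
        Duration elapsed = Duration.between(triggeredAt, failedAt);
        boolean withinGrace = elapsed.compareTo(gracePeriod) <= 0;
        return withinGrace ? Action.RETRY : Action.GIVE_UP;
    }
}
```

With a non-transient cause such as a wrong savepoint path, every attempt fails quickly and returns `RETRY`, so the number of recorded error events grows by one per attempt instead of staying at zero.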
   
   
   ## Brief change log
   
   
     - Record an error event when a savepoint fails within the grace period
   
   
   ## Verifying this change
   
   
   This change added tests and can be verified as follows:
   
   
     - Updated test cases in ApplicationObserverTest to check error event counts.
     - Also verified on Minikube by deploying the new operator and a job with the savepoint directory set to a wrong path.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): No
     - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: No
     - Core observer or reconciler logic that is regularly executed: No
   
   ## Documentation
   
     - Does this pull request introduce a new feature? No
     - If yes, how is the feature documented? Not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora merged pull request #421: [FLINK-29819] Record an error event when savepoint fails within grace…

Posted by GitBox <gi...@apache.org>.
gyfora merged PR #421:
URL: https://github.com/apache/flink-kubernetes-operator/pull/421

