Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/10/24 07:49:59 UTC

[GitHub] [flink-kubernetes-operator] pvary commented on a diff in pull request #407: [FLINK-29713] Kubernetes operator should restart failed jobs

pvary commented on code in PR #407:
URL: https://github.com/apache/flink-kubernetes-operator/pull/407#discussion_r1002975203


##########
docs/content/docs/custom-resource/job-management.md:
##########
@@ -241,6 +241,21 @@ In order this feature to work one must enable [recovery of missing job deploymen
 At the moment deployment is considered unhealthy when Flink's restarts count reaches `kubernetes.operator.cluster.health-check.restarts.threshold` (default: `64`)
 within time window of `kubernetes.operator.cluster.health-check.restarts.window` (default: 2 minutes).
 
+## Restart failed job deployments
+
+The operator can restart a failed Flink cluster deployment. This is useful when the job's main task can
+reconfigure the job to handle the failure.
+
+For example, a job could build its DAG dynamically from a job configuration that may change over time. When a task
+encounters a record that it cannot handle with the current configuration, it should throw a
+`SuppressRestartsException` to fail the job. If `kubernetes.operator.cluster.restart.failed` is set to
+`true` (default: `false`), the operator detects the failed job and restarts it. On restart, the job reads
+the new configuration and builds a new DAG from it. The new deployment can then handle the
+incoming records without manual intervention.

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
