Posted to dev@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2015/12/18 18:31:46 UTC
[jira] [Created] (FLINK-3187) Decouple restart strategy from ExecutionGraph
Till Rohrmann created FLINK-3187:
------------------------------------
Summary: Decouple restart strategy from ExecutionGraph
Key: FLINK-3187
URL: https://issues.apache.org/jira/browse/FLINK-3187
Project: Flink
Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Priority: Minor
Currently, the {{ExecutionGraph}} supports the following restart logic: whenever a failure occurs and the number of restart attempts has not been exhausted, it waits for a fixed amount of time and then tries to restart. This behaviour can be controlled by the configuration parameters {{execution-retries.default}} and {{execution-retries.delay}}.
I propose to decouple the restart logic from the {{ExecutionGraph}} by introducing a strategy pattern. This would allow us not only to define job-specific restart behaviour but also to implement different restart strategies. Conceivable strategies include fixed timeout restart, exponential backoff restart, partial topology restarts, etc.
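To make the proposal a bit more concrete, here is a rough sketch of what such a strategy pattern could look like. All names in it ({{RestartStrategy}}, {{canRestart}}, {{nextDelay}}, and both implementations) are hypothetical and chosen only for illustration; they are not taken from the Flink code base, and the final interface may well look different.

```java
import java.time.Duration;

/** Hypothetical strategy interface (illustration only, not an existing Flink API). */
interface RestartStrategy {
    /** Returns true if another restart may still be attempted. */
    boolean canRestart();

    /** Records one restart attempt and returns how long to wait before it. */
    Duration nextDelay();
}

/** Mirrors the current behaviour: a bounded number of attempts, each after a fixed delay. */
final class FixedDelayRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final Duration delay;
    private int attempts = 0;

    FixedDelayRestartStrategy(int maxAttempts, Duration delay) {
        this.maxAttempts = maxAttempts;
        this.delay = delay;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public Duration nextDelay() {
        if (!canRestart()) {
            throw new IllegalStateException("Restart attempts exhausted");
        }
        attempts++;
        return delay;
    }
}

/** Doubles the delay on every attempt, capped at maxDelay. */
final class ExponentialBackoffRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final Duration initialDelay;
    private final Duration maxDelay;
    private int attempts = 0;

    ExponentialBackoffRestartStrategy(int maxAttempts, Duration initialDelay, Duration maxDelay) {
        this.maxAttempts = maxAttempts;
        this.initialDelay = initialDelay;
        this.maxDelay = maxDelay;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public Duration nextDelay() {
        if (!canRestart()) {
            throw new IllegalStateException("Restart attempts exhausted");
        }
        // Limit the shift so the multiplier itself cannot overflow.
        long factor = 1L << Math.min(attempts, 30);
        attempts++;
        Duration candidate = initialDelay.multipliedBy(factor);
        return candidate.compareTo(maxDelay) > 0 ? maxDelay : candidate;
    }
}
```

With something along these lines, the {{ExecutionGraph}} would only ask its configured strategy whether a restart is allowed and how long to wait, so a job could be given a different strategy without touching the graph itself. Partial topology restarts would presumably need a richer interface than this sketch offers.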
This change is a preliminary step towards a restart strategy that scales down the parallelism of a job when not enough slots are available.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)