Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/02/02 12:29:39 UTC

[jira] [Commented] (FLINK-3187) Decouple restart strategy from ExecutionGraph

    [ https://issues.apache.org/jira/browse/FLINK-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128092#comment-15128092 ] 

ASF GitHub Bot commented on FLINK-3187:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1470#discussion_r51556295
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java ---
    @@ -237,53 +236,26 @@ public ExecutionConfig setParallelism(int parallelism) {
     	}
     
     	/**
    -	 * Gets the number of times the system will try to re-execute failed tasks. A value
    -	 * of {@code -1} indicates that the system default value (as defined in the configuration)
    -	 * should be used.
    +	 * Sets the restart strategy configuration which defines which restart strategy shall be used
    +	 * for the execution graph of the corresponding job.
    --- End diff --
    
    Agreed. Good point. I've simplified the description and added a code example.


> Decouple restart strategy from ExecutionGraph
> ---------------------------------------------
>
>                 Key: FLINK-3187
>                 URL: https://issues.apache.org/jira/browse/FLINK-3187
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> Currently, the {{ExecutionGraph}} supports the following restart logic: Whenever a failure occurs and the number of restart attempts isn't depleted, wait for a fixed amount of time and then try to restart. This behaviour can be controlled by the configuration parameters {{execution-retries.default}} and {{execution-retries.delay}}.
> I propose to decouple the restart logic from the {{ExecutionGraph}} a bit by introducing a strategy pattern. This would allow us not only to define job-specific restart behaviour but also to implement different restart strategies. Conceivable strategies could be: fixed timeout restart, exponential backoff restart, partial topology restarts, etc.
> This change is a preliminary step towards having a restart strategy which will scale the parallelism of a job down in case not enough slots are available.
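
For illustration, the strategy pattern proposed in the issue description could look roughly like the Java sketch below. This is only a minimal sketch under stated assumptions: the interface and class names (RestartStrategy, FixedDelayRestartStrategy, ExponentialBackoffRestartStrategy, RestartDemo) and their methods are hypothetical and are not taken from the pull request or from Flink's actual API. The point is that the code handling a failure only asks the strategy whether a restart is allowed and how long to wait, so new strategies can be added without touching that code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strategy interface; illustrative only, not Flink's actual API. */
interface RestartStrategy {
    /** Returns true if another restart attempt is still allowed. */
    boolean canRestart();

    /** Records one restart attempt and returns the delay (ms) to wait before it. */
    long nextDelayMillis();
}

/** Mirrors the existing behaviour: a bounded number of attempts with a fixed delay. */
final class FixedDelayRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final long delayMillis;
    private int attempts;

    FixedDelayRestartStrategy(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public long nextDelayMillis() {
        attempts++;
        return delayMillis;
    }
}

/** One conceivable alternative: the delay doubles per attempt, up to a cap. */
final class ExponentialBackoffRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final long initialDelayMillis;
    private final long maxDelayMillis;
    private int attempts;

    ExponentialBackoffRestartStrategy(int maxAttempts, long initialDelayMillis, long maxDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.initialDelayMillis = initialDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public long nextDelayMillis() {
        long delay = initialDelayMillis;
        // Double once per previous attempt, stopping early once the cap is reached.
        for (int i = 0; i < attempts && delay < maxDelayMillis; i++) {
            delay *= 2;
        }
        attempts++;
        return Math.min(delay, maxDelayMillis);
    }
}

/** Stand-in for the failure-handling code; it depends only on the interface. */
final class RestartDemo {
    /** Collects the delays a strategy would hand out until its attempts are depleted. */
    static List<Long> delaysUntilExhausted(RestartStrategy strategy) {
        List<Long> delays = new ArrayList<>();
        while (strategy.canRestart()) {
            delays.add(strategy.nextDelayMillis());
        }
        return delays;
    }
}
```

With this shape, a job-specific restart behaviour amounts to handing a different RestartStrategy instance to the failure-handling code, for example RestartDemo.delaysUntilExhausted(new FixedDelayRestartStrategy(3, 100L)).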



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)