Posted to issues@flink.apache.org by "Sihua Zhou (JIRA)" <ji...@apache.org> on 2017/07/13 11:27:00 UTC

[jira] [Commented] (FLINK-5747) Eager Scheduling should deploy all Tasks together

    [ https://issues.apache.org/jira/browse/FLINK-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085555#comment-16085555 ] 

Sihua Zhou commented on FLINK-5747:
-----------------------------------

Hi [~StephanEwen], there are some problems I found with eager scheduling in Flink 1.3.x. I would appreciate it if you have time to review what I've posted in [FLINK-7153|https://issues.apache.org/jira/browse/FLINK-7153]; I will close that issue if I was wrong.
Thanks,
Sihua Zhou

> Eager Scheduling should deploy all Tasks together
> -------------------------------------------------
>
>                 Key: FLINK-5747
>                 URL: https://issues.apache.org/jira/browse/FLINK-5747
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.2.0
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.3.0
>
>
> Currently, eager scheduling immediately triggers the scheduling for all vertices and their subtasks in topological order. 
> This has two problems:
>   - This only works as long as resource acquisition is "synchronous". With dynamic resource acquisition in FLIP-6, the resources are returned as futures which may complete out of order. This results in out-of-order (not topological) scheduling of tasks, which does not work for streaming.
>   - Deploying some tasks that depend on other tasks before it is clear that those other tasks have resources as well leads to situations where many deploy/recovery cycles happen before enough resources are available to get the job fully running.
> For eager scheduling, we should allocate all resources in one chunk and then deploy once we know that all are available.
> As a follow-up, the same should be done per pipelined component in lazy batch scheduling as well. That way we get lazy scheduling across blocking boundaries, and bulk (gang) scheduling in pipelined subgroups.
> This does not apply to efforts for fine-grained recovery, where individual tasks request replacement resources.
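The bulk-allocation idea described above can be sketched in plain Java: request one slot future per subtask, then gate deployment on the completion of *all* futures rather than deploying each task as its slot arrives. This is only an illustrative sketch using `CompletableFuture`; the class and method names are hypothetical and do not reflect Flink's actual scheduler API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative sketch of "gang" scheduling (names are hypothetical, not Flink's API):
// deployment happens only once every slot allocation has completed, so the
// deployment order no longer depends on the order in which futures complete.
public class GangSchedulerSketch {
    static class Slot {
        final int subtaskIndex;
        Slot(int subtaskIndex) { this.subtaskIndex = subtaskIndex; }
    }

    /** Request one slot per subtask; the futures may complete out of order. */
    static List<CompletableFuture<Slot>> allocateAll(int parallelism) {
        return IntStream.range(0, parallelism)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> new Slot(i)))
                .collect(Collectors.toList());
    }

    /** Deploy only after ALL slot futures have completed ("one chunk"). */
    static CompletableFuture<List<Slot>> scheduleEagerly(int parallelism) {
        List<CompletableFuture<Slot>> slotFutures = allocateAll(parallelism);
        return CompletableFuture
                .allOf(slotFutures.toArray(new CompletableFuture[0]))
                // every future is already done here, so join() does not block
                .thenApply(ignored -> slotFutures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Slot> slots = scheduleEagerly(4).join();
        System.out.println("deploying " + slots.size() + " subtasks together");
    }
}
```

If any single allocation future fails, `allOf` completes exceptionally and nothing is deployed, which avoids the partial-deploy/recover cycles mentioned above.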



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)