Posted to issues@flink.apache.org by "BoWang (JIRA)" <ji...@apache.org> on 2019/04/30 09:49:00 UTC
[jira] [Comment Edited] (FLINK-12229) Implement Lazy Scheduling Strategy

    [ https://issues.apache.org/jira/browse/FLINK-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830129#comment-16830129 ] 

BoWang edited comment on FLINK-12229 at 4/30/19 9:48 AM:
---------------------------------------------------------

Hi, [~till.rohrmann] [~gjy] [~tiemsn]
In the original scheduler, a consumer vertex is scheduled when ANY or ALL of its input IntermediateDataSets are consumable (depending on the input constraint), and for the BLOCKING ResultType an IntermediateDataSet is consumable once all of its result partitions are finished. Should the new scheduler be consistent with this logic?
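
To make the rule concrete, here is a minimal sketch of how I read it. The class and method names are hypothetical and are not Flink's actual classes; each data set is modelled simply as a list of "partition finished" flags, and a vertex without inputs is assumed to be schedulable right away.

```java
import java.util.List;

/** Hypothetical sketch of the consumable rule; not Flink's actual classes. */
final class InputConstraintSketch {

    enum Constraint { ANY, ALL }

    /** A BLOCKING data set is consumable only once every one of its partitions has finished. */
    static boolean isConsumable(List<Boolean> partitionFinished) {
        return !partitionFinished.isEmpty()
                && partitionFinished.stream().allMatch(Boolean::booleanValue);
    }

    /** Decides whether a consumer vertex may be scheduled, given its input data sets. */
    static boolean canSchedule(Constraint constraint, List<List<Boolean>> inputDataSets) {
        if (inputDataSets.isEmpty()) {
            return true; // assumption: source vertices have nothing to wait for
        }
        if (constraint == Constraint.ALL) {
            return inputDataSets.stream().allMatch(InputConstraintSketch::isConsumable);
        }
        return inputDataSets.stream().anyMatch(InputConstraintSketch::isConsumable);
    }
}
```

With one fully finished input and one partially finished input, this sketch would schedule the consumer under ANY but not under ALL.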

My other question: while implementing the lazy strategy, I found that on every producer vertex state change or partition-consumable notification, all input partitions of the consumer vertex are checked to decide whether it should be scheduled. With n producer vertices and n consumer vertices, the partitions are checked O(n^2) times, which I think is very inefficient. If we add SchedulingIntermediateDataSet and react to vertex state change notifications, relying on a counter in SchedulingIntermediateDataSet, only O(n) partition checks are needed (this is what I did in [GitHub Pull Request #8309|https://github.com/apache/flink/pull/8309]). Another option is to maintain some member variables in LazyFromSourcesSchedulingStrategy that do the same job as SchedulingIntermediateDataSet.
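
The counter idea can be sketched roughly as follows. This is only an illustration with hypothetical names (DataSet, Consumer, onPartitionFinished); it is not the code from the pull request, and it assumes the ALL input constraint. Each data set counts its unfinished partitions and each consumer counts its non-consumable inputs, so a finished partition costs a single decrement instead of a rescan of all input partitions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the counter idea; names do not match the pull request. */
final class DataSetCounterSketch {

    /** A consumer vertex that waits until all of its input data sets are consumable. */
    static final class Consumer {
        final String name;
        int pendingInputs; // number of input data sets that are not yet consumable

        Consumer(String name) {
            this.name = name;
        }
    }

    /** Tracks how many partitions of one BLOCKING data set are still unfinished. */
    static final class DataSet {
        private int unfinishedPartitions;
        private final List<Consumer> consumers = new ArrayList<>();

        DataSet(int numPartitions) {
            this.unfinishedPartitions = numPartitions;
        }

        void addConsumer(Consumer consumer) {
            consumers.add(consumer);
            consumer.pendingInputs++;
        }

        /**
         * Called once per finished producer partition. Returns the consumers that
         * became schedulable; consumers are only visited when the last partition finishes.
         */
        List<Consumer> onPartitionFinished() {
            if (unfinishedPartitions <= 0) {
                throw new IllegalStateException("all partitions already finished");
            }
            List<Consumer> ready = new ArrayList<>();
            if (--unfinishedPartitions == 0) {
                for (Consumer consumer : consumers) {
                    if (--consumer.pendingInputs == 0) {
                        ready.add(consumer);
                    }
                }
            }
            return ready;
        }
    }
}
```

In this sketch, a data set with three partitions returns an empty list for the first two notifications and hands back its consumer only on the third.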

What do you think?


> Implement Lazy Scheduling Strategy
> ----------------------------------
>
>                 Key: FLINK-12229
>                 URL: https://issues.apache.org/jira/browse/FLINK-12229
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Gary Yao
>            Assignee: BoWang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement a {{SchedulingStrategy}} that covers the functionality of {{ScheduleMode.LAZY_FROM_SOURCES}}, i.e., vertices are scheduled when all the input data are available.
> Acceptance Criteria:
>  * New strategy is tested in isolation using test implementations (i.e., without having to submit a job)


