Posted to dev@flink.apache.org by "yunfan (Jira)" <ji...@apache.org> on 2023/02/14 13:36:00 UTC

[jira] [Created] (FLINK-31065) Support more split assigner strategy for batch job

yunfan created FLINK-31065:
------------------------------

             Summary: Support more split assigner strategy for batch job
                 Key: FLINK-31065
                 URL: https://issues.apache.org/jira/browse/FLINK-31065
             Project: Flink
          Issue Type: Improvement
            Reporter: yunfan


Currently Flink uses LocatableInputSplitAssigner as the default split assigner.

With this assigner, which splits a task will consume is decided dynamically at runtime: each running task requests the next available split on demand.

This is not a good strategy in batch mode.

For example, suppose we have 100 splits and the job has 100 tasks.

When the job starts, there are not enough resources to launch all 100 tasks, so only 10 tasks start at first (a common situation in batch mode).

These 10 tasks will consume all of the splits, and the other 90 tasks will reach the FINISHED state without reading any data.

This is obviously not a good outcome in batch mode.

In the extreme case, 99 tasks have finished and only one task is still running; if that running task fails, rerunning it takes a long time, because it has to consume its 10 splits all over again.

Spark binds splits to its tasks up front, and I think that is the better approach in batch mode.
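
To make the proposal concrete, below is a minimal sketch of a statically bound assigner, assuming Flink's InputSplitAssigner interface (getNextInputSplit / returnInputSplit). The class name StickyInputSplitAssigner and the round-robin binding rule are hypothetical and only for illustration; this is not an actual Flink implementation.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

import org.apache.flink.core.io.InputSplit;
import org.apache.flink.core.io.InputSplitAssigner;

// Hypothetical sketch: split i is permanently bound to task (i % parallelism).
public class StickyInputSplitAssigner implements InputSplitAssigner {

    private final Deque<InputSplit>[] boundSplits;

    @SuppressWarnings("unchecked")
    public StickyInputSplitAssigner(InputSplit[] splits, int parallelism) {
        boundSplits = new Deque[parallelism];
        for (int i = 0; i < parallelism; i++) {
            boundSplits[i] = new ArrayDeque<>();
        }
        // Static round-robin binding, decided before any task starts. A task
        // that is scheduled late still finds its own splits waiting for it.
        for (InputSplit split : splits) {
            boundSplits[split.getSplitNumber() % parallelism].add(split);
        }
    }

    @Override
    public synchronized InputSplit getNextInputSplit(String host, int taskId) {
        // Hand out only this task's splits; early tasks can no longer drain
        // the whole pool, so no task finishes without reading any data.
        return boundSplits[taskId].poll();
    }

    @Override
    public synchronized void returnInputSplit(List<InputSplit> splits, int taskId) {
        // On failover only the failed task's own splits are re-queued, so the
        // restarted attempt re-reads exactly what the failed attempt read.
        splits.forEach(boundSplits[taskId]::addFirst);
    }
}

The trade-off is that a static binding gives up the host locality and load balancing that LocatableInputSplitAssigner provides dynamically, so it probably makes sense as an additional, configurable strategy rather than a new default.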


