Posted to issues@tez.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/09/16 19:28:20 UTC

[jira] [Comment Edited] (TEZ-394) Better scheduling for uneven DAGs

    [ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497160#comment-15497160 ] 

Rohini Palaniswamy edited comment on TEZ-394 at 9/16/16 7:27 PM:
-----------------------------------------------------------------

In case of a DAG - RootV1->IntermediateV1->IntermediateV2->LeafV1, RootV2->LeafV1 (root vertices are those reading input from HDFS or other sources)

The scheduling priority currently is
1) RootV1, RootV2 (no particular order)
2) IntermediateV1
3) IntermediateV2
4) LeafV1

It should be
1) RootV1 
2) IntermediateV1
3) RootV2, IntermediateV2 (a root vertex should be given priority over an intermediate vertex when there is no shuffle dependency. In cases where the root vertex also has shuffle dependencies, the order does not matter)
4) LeafV1

This should also partially help with TEZ-3274 cases where the root input vertex also takes a shuffle input: the root vertex tasks will be scheduled a lot later. TEZ-3274 is still required for slow start of tasks based on shuffle input completion.
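
The two orderings above can be reproduced with a small sketch. This is not Tez code; it only assumes that the current priority follows a vertex's longest distance from any root, and that the proposed priority follows its longest distance to a leaf, with root vertices listed first within a level. The function names and the EDGES table are hypothetical and exist only for illustration.

```python
from functools import lru_cache

# Hypothetical encoding of the example DAG: vertex -> downstream vertices.
EDGES = {
    "RootV1": ["IntermediateV1"],
    "IntermediateV1": ["IntermediateV2"],
    "IntermediateV2": ["LeafV1"],
    "RootV2": ["LeafV1"],
    "LeafV1": [],
}


def upstream(edges):
    """Invert the edge table: vertex -> list of upstream vertices."""
    parents = {v: [] for v in edges}
    for src, dsts in edges.items():
        for dst in dsts:
            parents[dst].append(src)
    return parents


def levels_by_depth(edges):
    """Assumed current behaviour: level = longest distance from any root."""
    parents = upstream(edges)

    @lru_cache(maxsize=None)
    def depth(v):
        return 1 + max((depth(p) for p in parents[v]), default=0)

    return {v: depth(v) for v in edges}


def levels_by_height(edges):
    """Assumed proposed behaviour: vertices far from the leaf go first."""

    @lru_cache(maxsize=None)
    def height(v):
        return 1 + max((height(c) for c in edges[v]), default=0)

    tallest = max(height(v) for v in edges)
    return {v: tallest - height(v) + 1 for v in edges}


def group(levels, edges):
    """Group vertices per level; roots (no upstream vertex) come first."""
    parents = upstream(edges)
    by_level = {}
    for v, lvl in levels.items():
        by_level.setdefault(lvl, []).append(v)
    return [
        sorted(by_level[lvl], key=lambda v: (bool(parents[v]), v))
        for lvl in sorted(by_level)
    ]


current = group(levels_by_depth(EDGES), EDGES)
proposed = group(levels_by_height(EDGES), EDGES)
print(current)   # [['RootV1', 'RootV2'], ['IntermediateV1'], ['IntermediateV2'], ['LeafV1']]
print(proposed)  # [['RootV1'], ['IntermediateV1'], ['RootV2', 'IntermediateV2'], ['LeafV1']]
```

Under these assumptions RootV2 moves from level 1 to level 3, because its only consumer is LeafV1 and its output is not needed any earlier than that of IntermediateV2.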




was (Author: rohini):
In case of a DAG - RootV1->IntermediateV1->IntermediateV2->LeafV1, RootV2->LeafV1 (root vertices are those reading input from HDFS or other sources)

The scheduling priority currently is
1) RootV1, RootV2 (no particular order)
2) IntermediateV1
3) IntermediateV2
4) LeafV1

It should be
1) RootV1, RootV2 
2) IntermediateV1
3) RootV2, IntermediateV2 (a root vertex should be given priority over an intermediate vertex when there is no shuffle dependency. In cases where the root vertex also has shuffle dependencies, the order does not matter)
4) LeafV1

This should also partially help with TEZ-3274 cases where the root input vertex also takes a shuffle input: the root vertex tasks will be scheduled a lot later. TEZ-3274 is still required for slow start of tasks based on shuffle input completion.



> Better scheduling for uneven DAGs
> ---------------------------------
>
>                 Key: TEZ-394
>                 URL: https://issues.apache.org/jira/browse/TEZ-394
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>
>   Consider a series of joins or group-bys on dataset A with a few other datasets that takes 10 hours, followed by a final join with a dataset X. The vertex that loads dataset X will be one of the top vertices and will be initialized early, even though its output is not consumed until the end, 10 hours later.
> 1) We could use delayed-start logic for better resource allocation.
> 2) Otherwise, if those tasks are started upfront, we need to handle failure/recovery cases where the nodes that executed the MapTask might have gone down by the time the final join happens.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)