Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2017/06/16 14:54:00 UTC
[jira] [Commented] (TEZ-394) Better scheduling for uneven DAGs

    [ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051997#comment-16051997 ] 

Jason Lowe commented on TEZ-394:
--------------------------------

My apologies for not being clear.  I definitely could have worded this better.

As Rohini points out, prioritizing by distance from leaf has the unfortunate side effect of penalizing disconnected vertices relative to a subgroup that has a long critical path.  In the latest example, we want V1 and V6 to have approximately the same priority, since both are unburdened root vertices without connections to a "deep" vertex in the DAG.  However, V6 is very close to its leaf, while V1 is quite far from the leaf node in its subgroup, so V6 ends up with a very different priority than V1.
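To make the distance-to-leaf problem concrete, here is a minimal Python sketch on a hypothetical DAG. The vertex names mirror the comment, but the edges are assumed, since the example's actual shape isn't reproduced here: V1 -> V3 -> V4 -> V5 and V2 -> V5 form a deep subgroup, while V6 -> V7 is disconnected. The CHILDREN map and the distance_to_leaf helper are illustrative names, not Tez APIs.

```python
from functools import lru_cache

# Hypothetical DAG (assumed shape, not taken from the TEZ-394 patches):
# each vertex maps to the list of its child vertices.
CHILDREN = {
    "V1": ["V3"],
    "V2": ["V5"],
    "V3": ["V4"],
    "V4": ["V5"],
    "V5": [],
    "V6": ["V7"],
    "V7": [],
}


@lru_cache(maxsize=None)
def distance_to_leaf(vertex: str) -> int:
    """Length of the longest path from `vertex` down to any leaf (a leaf is 0)."""
    children = CHILDREN[vertex]
    if not children:
        return 0
    return 1 + max(distance_to_leaf(child) for child in children)


if __name__ == "__main__":
    for root in ("V1", "V2", "V6"):
        print(root, distance_to_leaf(root))
```

Under these assumptions V1 gets 3 while V6 gets 1, so a priority derived from this number separates two roots that should look alike, and V6 receives the same treatment as V2, the vertex that was actually meant to be deprioritized.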

The new algorithm isn't pure distance-to-root or distance-to-leaf.  Instead, it recalculates the depth of a vertex as the maximum depth of all parent vertices of that vertex's children.  So V1 and V6 end up at the same depth, because they are root vertices and their children have no other parents.  V2, however, is a root vertex whose recalculated depth ends up much lower in the tree, since its child, V5, has another parent that is deeper in the tree than V2.  This lowers the priority of V2, which was the original intent of the JIRA, but avoids penalizing V6 as pure distance-to-leaf would.
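The depth recalculation can be sketched in Python as follows. This is an illustrative reading of the rule as stated, not the code in the TEZ-394 patches. It assumes that depth starts as the longest path from a root (roots at 0), that the "maximum depth of all parents of the vertex's children" rule is applied in a single pass over those original depths, and that leaves keep their original depth. It uses a hypothetical DAG: V1 -> V3 -> V4 -> V5, V2 -> V5, and V6 -> V7. The names CHILDREN, parents_of, depth_from_root and recalculated_depth are all hypothetical.

```python
from collections import deque

# Hypothetical DAG (assumed shape): each vertex maps to its child vertices.
CHILDREN = {
    "V1": ["V3"],
    "V2": ["V5"],
    "V3": ["V4"],
    "V4": ["V5"],
    "V5": [],
    "V6": ["V7"],
    "V7": [],
}


def parents_of(children_map):
    """Invert the child map into a vertex -> list-of-parents map."""
    parents = {v: [] for v in children_map}
    for vertex, kids in children_map.items():
        for child in kids:
            parents[child].append(vertex)
    return parents


def depth_from_root(children_map):
    """Longest-path depth from any root (roots are 0), via a topological walk."""
    parents = parents_of(children_map)
    indegree = {v: len(ps) for v, ps in parents.items()}
    depth = {v: 0 for v in children_map}
    queue = deque(v for v, d in indegree.items() if d == 0)
    while queue:
        vertex = queue.popleft()
        for child in children_map[vertex]:
            depth[child] = max(depth[child], depth[vertex] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return depth


def recalculated_depth(children_map):
    """Depth = max original depth over all parents of this vertex's children."""
    parents = parents_of(children_map)
    depth = depth_from_root(children_map)
    result = {}
    for vertex, kids in children_map.items():
        if not kids:
            # Assumption: leaves have no children, so they keep their depth.
            result[vertex] = depth[vertex]
        else:
            result[vertex] = max(depth[p] for child in kids for p in parents[child])
    return result


if __name__ == "__main__":
    print(recalculated_depth(CHILDREN))
```

With these assumptions V1 and V6 both come out at depth 0, while V2 is pushed to depth 2, level with V4, because its child V5 also has V4 as a parent. A larger depth here stands for a lower scheduling priority, matching the effect the comment describes.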


> Better scheduling for uneven DAGs
> ---------------------------------
>
>                 Key: TEZ-394
>                 URL: https://issues.apache.org/jira/browse/TEZ-394
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Jason Lowe
>         Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch
>
>
>   Consider a series of joins or group-bys on dataset A with a few other datasets that takes 10 hours, followed by a final join with dataset X. The vertex that loads dataset X will be one of the top vertices and will be initialized early, even though its output is not consumed until the end, 10 hours later. Either:
> 1) use delayed-start logic for better resource allocation, or
> 2) if those vertices are started upfront, handle the failure/recovery cases where the nodes that executed the MapTask may have gone down by the time the final join happens.


