Posted to issues@tez.apache.org by "Eric Wohlstadter (JIRA)" <ji...@apache.org> on 2017/12/11 18:59:01 UTC
[jira] [Commented] (TEZ-394) Better scheduling for uneven DAGs
[ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286411#comment-16286411 ]
Eric Wohlstadter commented on TEZ-394:
--------------------------------------
[~jlowe]
I'm jumping into this conversation late, so I might not have all the context.
I'm not sure this policy is what we want:
* Use the max distance to root across all of a vertex's children
It doesn't seem general to me. Do we want the max distance to root for any of a vertex's descendants?
If we extend the example to:
V1->V3->V4->V5->V6->V7
V2->V8->V7
Under "max distance to root across all of a vertex's children", V2 will be scheduled early, which may end up leaving V8 holding on to resources for a long time. Essentially, the scheduling for a vertex needs to be relative to the complexity of the rest of the DAG, not the local context of its children.
I believe "max distance to root for any of a vertex's descendants" is equal to "furthest distance from leaf".
So I'm thinking we might want to use [~bikassaha]'s original suggestion, but then special-case the disconnected sub-graphs.
Does that make sense?
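To make the comparison concrete, here is a rough Python sketch (not Tez code) that computes the metrics discussed above on the example DAG. The function names are hypothetical, "distance" is assumed to mean the longest path length in edges, and a vertex with no children is assumed to fall back to its own distance from root. How a scheduler would map these numbers to priorities is not modeled here.

```python
from functools import lru_cache

# Example DAG from this comment: each vertex maps to its downstream vertices.
EDGES = {
    "V1": ["V3"], "V3": ["V4"], "V4": ["V5"], "V5": ["V6"], "V6": ["V7"],
    "V2": ["V8"], "V8": ["V7"],
    "V7": [],
}

# Reverse adjacency, so that upstream vertices can be looked up directly.
PARENTS = {v: [] for v in EDGES}
for src, dsts in EDGES.items():
    for dst in dsts:
        PARENTS[dst].append(src)


@lru_cache(maxsize=None)
def distance_from_root(v):
    """Longest path (in edges) from any root down to v."""
    if not PARENTS[v]:
        return 0
    return 1 + max(distance_from_root(p) for p in PARENTS[v])


@lru_cache(maxsize=None)
def distance_from_leaf(v):
    """Longest path (in edges) from v down to any leaf."""
    if not EDGES[v]:
        return 0
    return 1 + max(distance_from_leaf(c) for c in EDGES[v])


def max_child_distance_from_root(v):
    """Max distance from root across v's direct children.

    Assumption: a vertex with no children uses its own distance from root.
    """
    if not EDGES[v]:
        return distance_from_root(v)
    return max(distance_from_root(c) for c in EDGES[v])


if __name__ == "__main__":
    for v in sorted(EDGES):
        print(v, max_child_distance_from_root(v), distance_from_leaf(v))
```

In this sketch V1 and V2 both get 1 under the max-child metric, so that metric alone cannot tell the long chain from the short one, while their distances from leaf are 5 and 2 respectively.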
> Better scheduling for uneven DAGs
> ---------------------------------
>
> Key: TEZ-394
> URL: https://issues.apache.org/jira/browse/TEZ-394
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Rohini Palaniswamy
> Assignee: Jason Lowe
> Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch
>
>
> Consider a series of joins or group-bys on dataset A with a few datasets that takes 10 hours, followed by a final join with a dataset X. The vertex that loads dataset X will be one of the top vertices and will be initialized early even though its output is not consumed until the end, after 10 hours.
> 1) Could either use delayed start logic for better resource allocation
> 2) Else if they are started upfront, need to handle failure/recovery cases where the nodes which executed the MapTask might have gone down when the final join happens.