Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2017/07/19 23:32:00 UTC

[jira] [Commented] (TEZ-3770) DAG-aware YARN task scheduler

    [ https://issues.apache.org/jira/browse/TEZ-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093956#comment-16093956 ] 

Siddharth Seth commented on TEZ-3770:
-------------------------------------

bq. It tries to schedule new containers for tasks that match its priority before trying to schedule the highest priority task first. This avoids hanging onto unused, lower priority containers because higher priority requests are pending (see TEZ-3535).
If I'm reading the code right, new containers which cannot be assigned immediately are released? Pending requests are removed as soon as a container is assigned, so YARN will not end up allocating this unused container again (other than the regular timing races in the protocol).
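To make sure we're describing the same behavior, here is a minimal sketch of what I understand the release path to be (all names here are illustrative, not from the patch): once a pending request is satisfied it is removed, so any surplus newly allocated container has no match and is released back to the RM rather than cached.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of "new containers which cannot be assigned
// immediately are released" -- not code from the TEZ-3770 patch.
class SurplusContainerSketch {
  /**
   * Matches new containers against pending requests and returns the
   * surplus containers that should be released to the RM.
   */
  static List<String> assignNewContainers(Queue<String> pendingRequests,
                                          List<String> newContainers) {
    List<String> toRelease = new ArrayList<>();
    for (String container : newContainers) {
      // The request is removed the moment a container is assigned to it,
      // so it will not be re-requested from YARN.
      String request = pendingRequests.poll();
      if (request == null) {
        // No matching pending request: release instead of holding idle.
        toRelease.add(container);
      }
    }
    return toRelease;
  }
}
```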

bq. New task allocation requests are first matched against idle containers before requesting resources from the RM. This cuts down on AM-RM protocol churn.
Not sure if priority is being considered while doing this, i.e. is it possible there's a pending higher-priority request which has not yet been allocated to an idle container (primarily races in timing)? I think this is handled, since an attempt is made to allocate a container the moment the task assigned to it is de-allocated.
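For clarity, the property I'd want to hold is roughly the following (a sketch with hypothetical names, not the patch's code): when a container goes idle, pending requests are scanned in priority order, so a higher-priority request cannot be skipped in favor of a lower-priority one that arrived later.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative sketch of priority-aware idle-container matching; the real
// scheduler also matches on resource/locality, which is omitted here.
class IdleContainerMatchSketch {
  record Request(int priority, String taskId) {}

  // Lower numeric priority value == more important, as in YARN.
  static final PriorityQueue<Request> pending =
      new PriorityQueue<>(Comparator.comparingInt(Request::priority));

  /** Called the moment a container becomes idle after de-allocation. */
  static String assignIdleContainer() {
    Request r = pending.poll(); // highest-priority pending request wins
    return r == null ? null : r.taskId();
  }
}
```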

bq. Task requests for tasks that are DAG-descendants of pending task requests will not be allocated to help reduce priority inversions that could lead to preemption.
This is broken for newly assigned containers?

On the patch itself.
DagAwareYarnTaskScheduler
- TaskRequest oldRequest = requests.put -> Is it possible for oldRequest to be non-null? There should be a single request to allocate a single attempt.
- incrVertexTaskCount - lowerStat.allowedVertices.andNot(d); <- Would be nice to have some more documentation or an example of how this ends up working. Does it rely on the way priorities are assigned, or on the kind of topological sort? When reading this, it seems to block off a large chunk of requests at a lower priority.
- Different code paths for the allocation of a delayed container and when a new task request comes in. Assuming this is a result of attempting to not place a YARN request if a container can be assigned immediately? Not sure if more re-use is possible across the various assign methods.
- RequestPriorityStats - the javadoc on descendants is a little confusing: it mentions a single vertex, but I think this gets set for every vertex at the same priority level. The default out-of-box behaviour will always generate different vertices at different priority levels at the moment. The old behaviour was to generate the same priority if the distance from the root was the same. Is moving back to the old behaviour an option, given descendant information is now known?
- Didn't go into enough detail to figure out whether an attempt is made to run through an entire tree before moving over to an unrelated tree.
- In tryAssignReuseContainer - if a container cannot be assigned immediately, will it be released? Should this decision be based on headroom / pending requests? (Headroom is very often incorrect; preemption is meant to take care of that.) E.g. on a task failure there's a new request; if the container cannot be re-used for this request and capacity is available in YARN, it may make sense to hold on to the container.
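On the incrVertexTaskCount point above: the andNot masking can be made concrete with plain java.util.BitSet (an example I constructed; only the andNot idiom comes from the patch). Once a vertex has pending requests, every DAG descendant of it is cleared from the allowed set at lower priorities, which is why it reads as blocking off a whole subtree of requests at once.

```java
import java.util.BitSet;

// Constructed example of masking descendant vertices out of the set of
// vertices allowed to schedule at a lower priority. Names are illustrative.
class AllowedVerticesSketch {
  /**
   * Returns a copy of allowedVertices with every bit cleared that is set
   * in descendants, i.e. descendants of a pending vertex are blocked.
   */
  static BitSet blockDescendants(BitSet allowedVertices, BitSet descendants) {
    BitSet result = (BitSet) allowedVertices.clone();
    result.andNot(descendants); // clear each descendant's bit
    return result;
  }
}
```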
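To spell out the tryAssignReuseContainer suggestion, something along these lines (my paraphrase with hypothetical names, and with the caveat already noted that headroom is often wrong): release an unassignable container only when YARN reports enough free capacity to re-grant one of that size later, otherwise hold it while requests are pending.

```java
// Hypothetical policy sketch, not code from the patch: decide whether to
// release a container that could not be matched to the current requests.
class ReuseDecisionSketch {
  /**
   * @param headroomMb free capacity reported by the RM (often inaccurate;
   *                   preemption is the real backstop)
   * @param containerMb size of the idle container
   * @param pendingRequests number of outstanding task requests
   */
  static boolean shouldRelease(long headroomMb, long containerMb,
                               int pendingRequests) {
    // With no pending work there is nothing to hold the container for.
    if (pendingRequests == 0) {
      return true;
    }
    // If the cluster can re-allocate a container of this size, releasing
    // is cheap; with no headroom, holding may beat re-requesting.
    return headroomMb >= containerMb;
  }
}
```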

DagInfo - Should getVertexDescendants be exposed as a method, or just the vertices and the relationships between them? Whoever wants to use this can set up their own representation; the bit representation could be a helper. The vertex relationships can likely be used for more than just the list of descendants.
TaskSchedulerContext - Instead of exposing getVertexIndexForTask(Object), I think a better option is to provide an interface for the requesting task itself (TaskRequest instead of Object). That can expose the relevant information, instead of requiring an additional call to get this from TaskSchedulerContext.
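Roughly what I have in mind (every name here is hypothetical, just to illustrate the shape of the suggestion): the scheduler asks the request itself for vertex information rather than round-tripping an opaque Object through TaskSchedulerContext.

```java
// Illustrative interface sketch for the suggestion above; names do not
// exist in Tez, they only show the intended shape of the API.
interface SchedulableTaskRequest {
  int getVertexIndex();  // replaces context.getVertexIndexForTask(Object)
  int getPriority();
  Object getTask();      // the opaque task handle, if still needed
}

// Trivial value-class implementation for demonstration purposes.
record SimpleTaskRequest(Object task, int vertexIndex, int priority)
    implements SchedulableTaskRequest {
  public int getVertexIndex() { return vertexIndex; }
  public int getPriority() { return priority; }
  public Object getTask() { return task; }
}
```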


> DAG-aware YARN task scheduler
> -----------------------------
>
>                 Key: TEZ-3770
>                 URL: https://issues.apache.org/jira/browse/TEZ-3770
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: TEZ-3770.001.patch
>
>
> There are cases where priority alone does not convey the relationship between tasks, and this can cause problems when scheduling or preempting tasks.  If the YARN task scheduler was aware of the relationship between tasks then it could make smarter decisions when trying to assign tasks to containers or preempt running tasks to schedule pending tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)