Posted to yarn-issues@hadoop.apache.org by "Anubhav Dhoot (JIRA)" <ji...@apache.org> on 2014/12/10 20:25:16 UTC

[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

    [ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241570#comment-14241570 ] 

Anubhav Dhoot commented on YARN-2877:
-------------------------------------

+1 for the notion of distributed scheduling. I think it will go a long way toward addressing both the latency and scale goals for YARN.

In my experience with similar distributed scheduling systems, we can run into the following types of issues:
a) The node is currently full of running containers, and estimating when capacity will free up to run queued requests can be hard or simply wrong. A request might stay queued for a long time, hurting the startup latency of the queueable container.
b) Multiple LocalRMs could race to grab available space on an NM, and one might get queued behind other requests, with effects similar to a).
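A toy sketch of issue b), assuming (hypothetically) that each LocalRM decides where to send a request from a free-slot count the NM advertised earlier. `SimpleNodeManager`, `submit`, and the request names are illustrative only, not YARN APIs.

```python
class SimpleNodeManager:
    """Hypothetical NM: starts a request if a slot is free, else queues it."""

    def __init__(self, free_slots):
        self.free_slots = free_slots
        self.queue = []

    def submit(self, request_id):
        if self.free_slots > 0:
            self.free_slots -= 1
            return "started"
        self.queue.append(request_id)  # waits until some container exits
        return "queued"


nm = SimpleNodeManager(free_slots=1)
advertised = nm.free_slots  # the same snapshot both LocalRMs saw

# Each LocalRM independently concludes the NM has room...
senders = [rm for rm in ("localrm-1", "localrm-2") if advertised > 0]
# ...but only one request can take the single free slot.
outcomes = {rm: nm.submit(f"{rm}/req") for rm in senders}
print(outcomes)  # {'localrm-1': 'started', 'localrm-2': 'queued'}
```

The losing request now waits behind whatever is running, which is exactly the latency problem from a).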

For the sake of discussing mechanisms, I would suggest weighing the pros and cons of 1) the ability to schedule queueable containers on multiple nodes and 2) the ability to cancel queued requests.
Giving each request the power of at least two NM choices could remove much of the variability in queueable container startup latency.
One way is to keep the queue of requests in the NM and, when needed, have the NM ultimately confirm with the requesting LocalRM that the queued request is still valid before starting it.
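A minimal sketch of how points 1) and 2) and the NM-side queue could fit together, assuming each LocalRM queues a copy of a request on two randomly sampled NMs, and the first NM to free a slot confirms with the LocalRM, which then cancels the other copy. `NodeManager`, `LocalRM`, `submit`, `confirm`, `cancel`, and `on_slot_freed` are hypothetical names for illustration, not YARN APIs; this is an in-process toy, not the RPC protocol a real implementation would need.

```python
import random
from collections import deque


class NodeManager:
    """Hypothetical NM holding a local FIFO of queueable-container requests."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()  # (request_id, local_rm) pairs

    def enqueue(self, request_id, local_rm):
        self.queue.append((request_id, local_rm))

    def cancel(self, request_id):
        # Drop a queued copy that another NM has already claimed.
        self.queue = deque(e for e in self.queue if e[0] != request_id)

    def on_slot_freed(self):
        """Launch the first queued request its LocalRM still considers valid."""
        while self.queue:
            request_id, local_rm = self.queue.popleft()
            if local_rm.confirm(request_id, self):
                return request_id  # launched on this NM
        return None  # nothing valid left to run


class LocalRM:
    """Hypothetical LocalRM that places each request on two sampled NMs."""

    def __init__(self, rng):
        self.rng = rng
        self.pending = {}  # request_id -> NMs holding a queued copy

    def submit(self, request_id, nodes):
        # Power of two choices: queue a copy on two distinct random NMs.
        chosen = self.rng.sample(nodes, 2)
        for nm in chosen:
            nm.enqueue(request_id, self)
        self.pending[request_id] = chosen
        return chosen

    def confirm(self, request_id, nm):
        # The first NM to ask wins; the other copies are cancelled.
        holders = self.pending.pop(request_id, None)
        if holders is None:
            return False  # already launched elsewhere (or withdrawn)
        for other in holders:
            if other is not nm:
                other.cancel(request_id)
        return True


rng = random.Random(7)  # fixed seed so the demo is repeatable
nodes = [NodeManager(f"nm{i}") for i in range(4)]
local_rm = LocalRM(rng)
first, second = local_rm.submit("req-1", nodes)
launched = first.on_slot_freed()   # "req-1": first NM to free a slot wins
leftover = second.on_slot_freed()  # None: its copy was cancelled
```

The confirmation step is what makes cancellation safe: even if the explicit `cancel` message were lost, `confirm` would still reject the stale copy when the second NM reached it.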

> Extend YARN to support distributed scheduling
> ---------------------------------------------
>
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling.  Briefly, some of the motivations for distributed scheduling are the following:
> 1. Improve cluster utilization by opportunistically executing tasks on otherwise idle resources on individual machines.
> 2. Reduce allocation latency for tasks where scheduling time dominates (i.e., task execution time is much less than the time required to obtain a container from the RM).
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)