Posted to yarn-issues@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2013/05/15 20:31:18 UTC

[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

    [ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658637#comment-13658637 ] 

Alejandro Abdelnur commented on YARN-624:
-----------------------------------------

As pointed out, supporting gang scheduling at the RM/scheduler level would allow deadlocks to be detected or avoided. That would be neither trivial nor efficient if gang scheduling were done at the AM level.

Examples of gang request capabilities could be:

* express a set of containers on any nodes. E.g.: 10 containers on any nodes of the cluster.
* express a set of containers on a specified set of nodes. E.g.: 10 containers in rack1, or 10 containers, one on each of n1...n10.
* express different sets of possible gangs that would satisfy the request. E.g.: 10 containers in rack1 or in rack2; 10 containers on n1...n10 or on n11...n20.
* indicate a timeout/fallback-to-normal for gang requests.
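
The capabilities listed above could be modeled roughly as follows. This is only a sketch for discussion: GangSketch, GangOption, GangRequest, firstSatisfiable and fits are hypothetical names, not part of the YARN API, and containers are assumed to be uniform so that only counts matter. A request carries a list of alternative gangs plus a timeout; each alternative maps a location (a node, a rack, or "*" for anywhere) to a container count, and an alternative fits only if all of it can be placed at once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in the YARN API.
final class GangSketch {
    static final String ANY = "*"; // assumed marker for "any node in the cluster"

    /** One acceptable placement: location (node, rack, or ANY) -> container count. */
    static final class GangOption {
        final Map<String, Integer> demand;
        GangOption(Map<String, Integer> demand) { this.demand = demand; }
    }

    /** Alternative gangs, any one of which satisfies the request, plus a timeout. */
    static final class GangRequest {
        final List<GangOption> alternatives;
        final long timeoutMillis; // after this, fall back to normal requests (not modeled here)
        GangRequest(List<GangOption> alternatives, long timeoutMillis) {
            this.alternatives = alternatives;
            this.timeoutMillis = timeoutMillis;
        }
    }

    /** Index of the first alternative that fits entirely right now, or -1 if none does. */
    static int firstSatisfiable(GangRequest req, Map<String, Integer> freeByNode,
                                Map<String, String> rackOfNode) {
        for (int i = 0; i < req.alternatives.size(); i++) {
            if (fits(req.alternatives.get(i), freeByNode, rackOfNode)) return i;
        }
        return -1;
    }

    /** All-or-nothing check: node demands first, then rack demands, then ANY. */
    static boolean fits(GangOption option, Map<String, Integer> freeByNode,
                        Map<String, String> rackOfNode) {
        Map<String, Integer> free = new HashMap<>(freeByNode); // work on a copy
        // Pass 1: demands that name a specific node.
        for (Map.Entry<String, Integer> e : option.demand.entrySet()) {
            if (free.containsKey(e.getKey())) {
                int left = free.get(e.getKey()) - e.getValue();
                if (left < 0) return false;
                free.put(e.getKey(), left);
            }
        }
        // Pass 2: demands that name a rack (any location that is neither a node nor ANY).
        for (Map.Entry<String, Integer> e : option.demand.entrySet()) {
            String loc = e.getKey();
            if (freeByNode.containsKey(loc) || ANY.equals(loc)) continue;
            int need = e.getValue();
            for (Map.Entry<String, Integer> n : free.entrySet()) {
                if (need == 0) break;
                if (loc.equals(rackOfNode.get(n.getKey()))) {
                    int take = Math.min(need, n.getValue());
                    n.setValue(n.getValue() - take);
                    need -= take;
                }
            }
            if (need > 0) return false;
        }
        // Pass 3: whatever is left anywhere must cover the ANY demand.
        int total = 0;
        for (int v : free.values()) total += v;
        return total >= option.demand.getOrDefault(ANY, 0);
    }
}
```

Under these assumptions, "10 containers in rack1 or in rack2" becomes a request with two alternatives, {rack1: 10} and {rack2: 10}, and "one on each of n1...n10" becomes a single alternative that maps each node to 1.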

We should decide on what gang capabilities we want/need to address in the short term.

                
> Support gang scheduling in the AM RM protocol
> ---------------------------------------------
>
>                 Key: YARN-624
>                 URL: https://issues.apache.org/jira/browse/YARN-624
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, scheduler
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs a set of tasks when they can all be run at the same time, would be a useful feature for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they get all the ones they need.  However, this lends itself to deadlocks when different AMs are waiting on the same containers.
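
The deadlock described in the issue can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not YARN code: HoardingDeadlockSketch, hoarding and gang are hypothetical names, containers are interchangeable, every AM needs at least one container, and an AM releases everything as soon as its gang has run.

```java
// Hypothetical toy model: compares AM-side hoarding with all-or-nothing gang allocation.
final class HoardingDeadlockSketch {

    /**
     * Containers are handed out one at a time, round-robin, and each AM holds
     * what it gets until its whole gang is complete. Returns how many AMs ran.
     * Assumes every entry of need is at least 1.
     */
    static int hoarding(int free, int[] need) {
        int[] held = new int[need.length];
        boolean[] done = new boolean[need.length];
        int finished = 0;
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int am = 0; am < need.length; am++) {
                if (done[am] || free == 0) continue;
                held[am]++;
                free--;
                progress = true;
                if (held[am] == need[am]) { // gang complete: run, then release
                    done[am] = true;
                    finished++;
                    free += held[am];
                    held[am] = 0;
                }
            }
        }
        return finished; // AMs still holding partial gangs are stuck
    }

    /** The scheduler grants a gang only when all of it fits. Returns how many AMs ran. */
    static int gang(int free, int[] need) {
        int finished = 0;
        for (int am = 0; am < need.length; am++) {
            if (need[am] <= free) {
                free -= need[am]; // whole gang allocated at once
                finished++;
                free += need[am]; // gang runs and releases its containers
            }
        }
        return finished;
    }
}
```

With 4 free containers and two AMs that each need 3, hoarding leaves both AMs holding 2 containers and neither can run, while the gang variant runs them one after the other.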
