Posted to yarn-issues@hadoop.apache.org by "Arun Suresh (JIRA)" <ji...@apache.org> on 2015/06/30 23:35:06 UTC

[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues

    [ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609119#comment-14609119 ] 

Arun Suresh commented on YARN-3633:
-----------------------------------

[~ragarwal], I was just wondering about the scenario you mentioned [here|https://issues.apache.org/jira/browse/YARN-3633?focusedCommentId=14542895&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14542895]. Isn't it possible that AM4 can remain unscheduled (starved) until AM1, AM2, or AM3 completes? Basically, containers started by AM1, AM2, and AM3 might start and end, but until one of those applications itself completes, AM4 will not be scheduled, right?

> With Fair Scheduler, cluster can logjam when there are too many queues
> ----------------------------------------------------------------------
>
>                 Key: YARN-3633
>                 URL: https://issues.apache.org/jira/browse/YARN-3633
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Rohit Agarwal
>            Assignee: Rohit Agarwal
>            Priority: Critical
>         Attachments: YARN-3633-1.patch, YARN-3633.patch
>
>
> It's possible to logjam a cluster by submitting many applications at once in different queues.
> For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users submit applications at the same time. The fair share of each queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 2.5GB of memory for AMs. If all the users request AMs of size 3GB, the cluster logjams: nothing gets scheduled even though 20GB of resources are available.
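
For reference, a minimal sketch of the arithmetic behind the scenario above (the class and variable names are illustrative only, not taken from the FairScheduler source):

{code:java}
// Illustrative sketch of the maxAMShare arithmetic in the reported scenario;
// this is not actual FairScheduler code, just the numbers from the description.
public class AmShareLogjamExample {
    public static void main(String[] args) {
        double clusterMemoryGb = 20.0;   // total cluster memory
        int queues = 4;                  // one queue per user
        double maxAMShare = 0.5;         // fraction of a queue's fair share usable by AMs
        double amRequestGb = 3.0;        // memory each user requests for its AM

        double fairSharePerQueueGb = clusterMemoryGb / queues;       // 20 / 4 = 5 GB
        double amLimitPerQueueGb = fairSharePerQueueGb * maxAMShare; // 5 * 0.5 = 2.5 GB

        // Every queue's AM limit (2.5 GB) is below the AM request (3 GB),
        // so no AM can start anywhere even though the whole cluster is idle.
        System.out.printf("AM limit per queue: %.1f GB, AM request: %.1f GB -> logjam: %b%n",
                amLimitPerQueueGb, amRequestGb, amRequestGb > amLimitPerQueueGb);
    }
}
{code}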



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)