Posted to yarn-issues@hadoop.apache.org by "Wangda Tan (JIRA)" <ji...@apache.org> on 2016/10/05 00:36:22 UTC

[jira] [Updated] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler

     [ https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-5139:
-----------------------------
    Attachment: YARN-5139-Concurrent-scheduling-performance-report.pdf

We're glad to share the results of our recent performance tests of concurrent scheduling, which is built on the new global scheduling framework.

We used SLS (Scheduler Load Simulator) to simulate a cluster with 2.4 PB of memory and 20K nodes, with 4K-12K applications running in parallel.

The new concurrent scheduling achieves up to *6.25X* the throughput (avg #containers allocated per second) of the original async scheduling in the Capacity Scheduler.

For more details, please refer to the attached {{YARN-5139-Concurrent-scheduling-performance-report.pdf}}; the code is in {{wip-5.YARN-5139.patch}}.

Thanks to [~vinodkv]/[~gtCarrera9] for lots of valuable offline suggestions.

+ People who might be interested in this: [~jlowe]/[~curino]/[~kasha]/[~asuresh]/[~kkaranasos]/[~subru].

> [Umbrella] Move YARN scheduler towards global scheduler
> -------------------------------------------------------
>
>                 Key: YARN-5139
>                 URL: https://issues.apache.org/jira/browse/YARN-5139
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: Explanantions of Global Scheduling (YARN-5139) Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, wip-3.YARN-5139.patch, wip-4.YARN-5139.patch
>
>
> The existing YARN scheduler is based on node heartbeats. This can lead to sub-optimal decisions because the scheduler can only look at one node at a time when scheduling resources.
> Pseudocode for the existing scheduling logic looks like:
> {code}
> for node in allNodes:
>     go to parentQueue
>         go to leafQueue
>             for application in leafQueue.applications:
>                 for resource-request in application.resource-requests:
>                     try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node constraints (give me "a && b || c") or anti-affinity (do not allocate HBase regionservers and Storm workers on the same host), we may need to consider moving the YARN scheduler towards global scheduling.
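
To illustrate the difference described above, here is a minimal Python sketch contrasting a node-heartbeat-driven loop with a request-driven loop that can see every node. It is not the Capacity Scheduler code or the YARN-5139 patch: the names (Node, Request, heartbeat_schedule, global_schedule), the best-fit rule, and the tag-based anti-affinity check are all assumptions made for illustration, and queues are omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mb: int
    # Tags of containers already placed on this node (hypothetical model).
    tags: set = field(default_factory=set)


@dataclass
class Request:
    app: str
    tag: str
    mem_mb: int
    # Anti-affinity: tags this request must not share a host with.
    avoid_tags: frozenset = frozenset()


def fits(node, req):
    """A node is usable if it has enough memory and hosts no conflicting tag."""
    return node.free_mb >= req.mem_mb and not (node.tags & req.avoid_tags)


def place(node, req):
    node.free_mb -= req.mem_mb
    node.tags.add(req.tag)


def heartbeat_schedule(nodes, requests):
    """Node-driven: each heartbeat offers one node to the pending requests."""
    pending = list(requests)
    placements = {}
    for node in nodes:  # one heartbeat per node, in arrival order
        for req in list(pending):
            if fits(node, req):  # decision made without seeing other nodes
                place(node, req)
                placements[req.app] = node.name
                pending.remove(req)
    return placements


def global_schedule(nodes, requests):
    """Request-driven: each request sees all nodes and takes the tightest fit."""
    placements = {}
    for req in requests:
        candidates = [n for n in nodes if fits(n, req)]
        if candidates:
            best = min(candidates, key=lambda n: n.free_mb - req.mem_mb)
            place(best, req)
            placements[req.app] = best.name
    return placements


def make_nodes():
    # The larger node happens to heartbeat first.
    return [Node("n2", 8192), Node("n1", 4096)]


requests = [
    Request("hbase-app", "hbase", 2048, frozenset({"storm"})),
    Request("storm-app", "storm", 8192, frozenset({"hbase"})),
]

print(heartbeat_schedule(make_nodes(), requests))  # {'hbase-app': 'n2'}
print(global_schedule(make_nodes(), requests))     # {'hbase-app': 'n1', 'storm-app': 'n2'}
```

In this toy setup the heartbeat loop puts the small request on the first node that reports in, which leaves no node able to hold the large request. The request-driven loop places both because it compares all nodes before committing. Best-fit is just one possible policy here; a real global scheduler would also have to respect queue capacities and ordering.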


