Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/06/18 17:12:25 UTC

[jira] [Commented] (YARN-2176) CapacityScheduler loops over all running applications rather than actively requesting apps

    [ https://issues.apache.org/jira/browse/YARN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035799#comment-14035799 ] 

Jason Lowe commented on YARN-2176:
----------------------------------

AppSchedulingInfo already determines when an app is actively requesting resources so that it can update the QueueMetrics.activeApplications metric.  (Confusingly, LeafQueue also has an activeApplications collection, which actually holds all running applications, not just the ones requesting.)

It would be nice to leverage the work AppSchedulingInfo already does: it currently calls the ActiveUsersManager activateApplication and deactivateApplication methods when necessary.  CapacityScheduler could potentially use a derived ActiveUsersManager class that additionally notifies the LeafQueue, so the queue can track requesting and non-requesting apps separately.  To preserve allocation semantics we'd have to track the original order of the applications, so that activating an application inserts it into the list of requesting applications in the same position relative to the other requesting applications, regardless of how many times it has been activated or deactivated.
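
As a rough sketch of the ordering idea only (this is not Hadoop code; RequestingAppTracker and its methods are hypothetical names, and application ids are plain strings for simplicity), each application could be given a sequence number once, when it joins the queue, with the requesting set kept sorted by that number:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: tracks which applications are
// actively requesting while preserving their original order in the queue.
class RequestingAppTracker {
  private long nextSeq = 0;
  // Sequence number assigned once, when the application joins the queue.
  private final Map<String, Long> seqByApp = new HashMap<>();
  // Requesting applications, sorted by their original sequence number.
  private final TreeMap<Long, String> requesting = new TreeMap<>();

  void addApplication(String appId) {
    if (!seqByApp.containsKey(appId)) {
      seqByApp.put(appId, nextSeq++);
    }
  }

  void removeApplication(String appId) {
    Long seq = seqByApp.remove(appId);
    if (seq != null) {
      requesting.remove(seq);
    }
  }

  // Assumed to be driven by an activateApplication-style notification.
  void activate(String appId) {
    Long seq = seqByApp.get(appId);
    if (seq != null) {
      requesting.put(seq, appId);
    }
  }

  // Assumed to be driven by a deactivateApplication-style notification.
  void deactivate(String appId) {
    Long seq = seqByApp.get(appId);
    if (seq != null) {
      requesting.remove(seq);
    }
  }

  // Requesting applications in their original relative order.
  List<String> requestingApps() {
    return new ArrayList<>(requesting.values());
  }
}
```

Because the sequence number never changes after the application is added, repeated activate/deactivate cycles always put it back in the same relative position, and the scheduling loop would only need to walk requestingApps() instead of every running application.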

> CapacityScheduler loops over all running applications rather than actively requesting apps
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-2176
>                 URL: https://issues.apache.org/jira/browse/YARN-2176
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>
> The capacity scheduler performance is primarily dominated by LeafQueue.assignContainers, and that currently loops over all applications that are running in the queue.  It would be more efficient if we looped over just the applications that are actively asking for resources rather than all applications, as there could be thousands of applications running but only a few hundred that are currently asking for resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)