Posted to yarn-issues@hadoop.apache.org by "Zhijie Shen (JIRA)" <ji...@apache.org> on 2013/03/28 23:03:15 UTC

[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

    [ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616708#comment-13616708 ] 

Zhijie Shen commented on YARN-276:
----------------------------------

IMO, the essential problem is that maxActiveApplications is a loose bound. See the formula below.

1. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * maxActiveApplications.

maxActiveApplications is computed by assuming that each application requires only minAllocation. In fact, an AM container may require more. Therefore,

2. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * maxActiveApplications = (minAllocation_1 + minAllocation_2 + ... + minAllocation_k) <= (requestedResource_1 + requestedResource_2 + ... + requestedResource_k), where k = maxActiveApplications.

Hence, when maxActiveApplications applications are activated and each of them requires more than minAllocation, the AMs may use more than maximumApplicationMasterResourcePercent of clusterResource, and their combined requests may even exceed clusterResource.
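
To make the arithmetic concrete, here is a minimal sketch of formulas 1 and 2. It is not the CapacityScheduler code: the class name, the method names, the single memory dimension in MB, and all the numbers are assumptions chosen for illustration.

```java
// Hypothetical illustration of formulas 1 and 2; not actual YARN code.
final class LooseBoundSketch {

    // Formula 1 solved for maxActiveApplications (rounding down is an assumption).
    static int maxActiveApplications(long clusterMb, double amPercent, long minAllocMb) {
        return (int) Math.floor(clusterMb * amPercent / minAllocMb);
    }

    // Memory taken by AMs when every active application requests the same amount.
    static long amUsage(int activeApps, long requestedPerAmMb) {
        return activeApps * requestedPerAmMb;
    }

    public static void main(String[] args) {
        long clusterMb = 102_400;   // assumed cluster size: 100 GB
        double amPercent = 0.5;     // assumed maximumApplicationMasterResourcePercent
        long minAllocMb = 1_024;    // assumed minAllocation
        long requestedMb = 2_048;   // each AM actually asks for twice minAllocation

        int maxActive = maxActiveApplications(clusterMb, amPercent, minAllocMb);
        long budgetMb = (long) (clusterMb * amPercent);
        long usedMb = amUsage(maxActive, requestedMb);

        System.out.println("maxActiveApplications = " + maxActive);
        System.out.println("intended AM budget (MB) = " + budgetMb);
        System.out.println("AM usage at the limit (MB) = " + usedMb);
    }
}
```

With these assumed numbers the count-based limit admits 50 applications, and their AMs alone ask for the whole cluster, which leaves nothing for tasks.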

@nemon's solution looks good; it is effectively a stricter bound on the number of active applications. Whenever an application is about to be activated, the following criterion is checked.

3. clusterResource * maximumApplicationMasterResourcePercent - ApplicationMasterResource >= requestedResource.

The issue here is that whenever this criterion is met, the maxActiveApplications limit is met as well, because this criterion is stricter. So instead of adding the new criterion, how about replacing maxActiveApplications with it?
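
As a rough sketch of what criterion 3 could look like when it is the only gate, consider the class below. The class AmResourceGate and its methods are hypothetical names, resources are modelled as memory in MB only, and this is not the code from the attached patch.

```java
// Hypothetical sketch of criterion 3 as the only activation gate; not the YARN-276 patch.
final class AmResourceGate {
    private final long amBudgetMb;  // clusterResource * maximumApplicationMasterResourcePercent
    private long amUsedMb = 0;      // ApplicationMasterResource already held by active AMs
    private int activeApps = 0;

    AmResourceGate(long clusterMb, double amPercent) {
        this.amBudgetMb = (long) (clusterMb * amPercent);
    }

    // Criterion 3: activate only if the remaining AM budget covers the request.
    boolean tryActivate(long requestedMb) {
        if (amBudgetMb - amUsedMb >= requestedMb) {
            amUsedMb += requestedMb;
            activeApps++;
            return true;
        }
        return false;
    }

    long amUsedMb() { return amUsedMb; }

    int activeApps() { return activeApps; }

    public static void main(String[] args) {
        // Assumed numbers: 100 GB cluster, 50% AM share, each AM requests 2048 MB.
        AmResourceGate gate = new AmResourceGate(102_400, 0.5);
        while (gate.tryActivate(2_048)) {
            // keep activating until the AM budget is exhausted
        }
        System.out.println("active applications = " + gate.activeApps());
        System.out.println("AM usage (MB) = " + gate.amUsedMb());
    }
}
```

Because every request is at least minAllocation, the number of applications this gate admits can never exceed the AM budget divided by minAllocation, so a separate count limit adds nothing in this model.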
                
> Capacity Scheduler can hang when submit many jobs concurrently
> --------------------------------------------------------------
>
>                 Key: YARN-276
>                 URL: https://issues.apache.org/jira/browse/YARN-276
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0, 2.0.1-alpha
>            Reporter: nemon lou
>         Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler can hang with most resources taken up by AMs and not enough resources left for tasks. All applications then hang.
> The cause is that "yarn.scheduler.capacity.maximum-am-resource-percent" is not checked directly. Instead, this property is only used to compute maxActiveApplications, and maxActiveApplications is computed from minimumAllocation (not from what the AMs actually use).
