Posted to yarn-issues@hadoop.apache.org by "Zhijie Shen (JIRA)" <ji...@apache.org> on 2013/02/08 09:37:12 UTC

[jira] [Created] (YARN-389) Infinitely assigning containers when the required resource exceeds the cluster's absolute capacity

Zhijie Shen created YARN-389:
--------------------------------

             Summary: Infinitely assigning containers when the required resource exceeds the cluster's absolute capacity
                 Key: YARN-389
                 URL: https://issues.apache.org/jira/browse/YARN-389
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Zhijie Shen
            Assignee: Zhijie Shen


I ran the wordcount example on branch-2 and trunk with yarn.nodemanager.resource.memory-mb set to 1G and yarn.app.mapreduce.am.resource.mb set to 1.5G. As a result, the ResourceManager tries to assign a 2G container for the AM, but the NodeManager does not have enough memory to host it. The problem is that the assignment is retried indefinitely when it cannot be satisfied. See the following log.

{code}
2013-02-07 19:05:05,947 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 1
2013-02-07 19:05:06,477 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Storing Application with id application_1360292699925_0001
2013-02-07 19:05:06,479 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1360292699925_0001
2013-02-07 19:05:06,479 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 1 submitted by user zshen
2013-02-07 19:05:06,481 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=zshen	IP=127.0.0.1	OPERATION=Submit Application Request	TARGET=ClientRMService	RESULT=SUCCESS	APPID=application_1360292699925_0001
2013-02-07 19:05:06,493 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1360292699925_0001 State change from NEW to SUBMITTED
2013-02-07 19:05:06,494 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1360292699925_0001_000001
2013-02-07 19:05:06,495 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1360292699925_0001_000001 State change from NEW to SUBMITTED
2013-02-07 19:05:06,506 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application application_1360292699925_0001 from user: zshen activated in queue: default
2013-02-07 19:05:06,506 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application added - appId: application_1360292699925_0001 user: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@4965d0e0, leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
2013-02-07 19:05:06,506 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application added - appId: application_1360292699925_0001 user: zshen leaf-queue of parent: root #applications: 1
2013-02-07 19:05:06,506 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1360292699925_0001_000001, user: zshen queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0, currently active: 1
2013-02-07 19:05:06,508 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1360292699925_0001_000001 State change from SUBMITTED to SCHEDULED
2013-02-07 19:05:06,509 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1360292699925_0001 State change from SUBMITTED to ACCEPTED
2013-02-07 19:05:07,163 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:08,164 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:09,167 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:10,168 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:11,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:12,173 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:13,175 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:14,177 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:15,179 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
...
2013-02-07 23:51:02,976 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:03,977 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:04,978 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:05,979 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:06,981 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:07,982 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:08,983 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity: 1.0)
...
{code}
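
The numbers in the repeated LeafQueue line can be reproduced with a little arithmetic. The following is only a sketch of one plausible reading of the log, not the scheduler's actual code: the class and method names are hypothetical, and it assumes a minimum allocation of 1024 MB, which would explain why the 1.5G request shows up as `required <memory:2048, vCores:1>`.

```java
// Sketch only: reproduces the arithmetic behind the repeated LeafQueue log line.
// CapacityCheckSketch and its methods are hypothetical names, not YARN APIs.
final class CapacityCheckSketch {

    // Round a memory request up to the next multiple of the minimum allocation.
    static int roundUp(int requestedMb, int minAllocationMb) {
        return ((requestedMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
    }

    // Fraction of the cluster that would be in use if the request were granted.
    static double potentialNewCapacity(int usedMb, int requiredMb, int clusterMb) {
        return (double) (usedMb + requiredMb) / clusterMb;
    }

    public static void main(String[] args) {
        int clusterMb = 1024;        // yarn.nodemanager.resource.memory-mb on the single node
        int amRequestMb = 1536;      // yarn.app.mapreduce.am.resource.mb
        int minAllocationMb = 1024;  // ASSUMPTION: minimum allocation of 1024 MB

        int requiredMb = roundUp(amRequestMb, minAllocationMb);
        double potential = potentialNewCapacity(0, requiredMb, clusterMb);

        System.out.println("required=" + requiredMb + " potentialNewCapacity=" + potential);
        System.out.println("exceeds max-capacity 1.0: " + (potential > 1.0));
    }
}
```

Under these assumptions the request alone is twice the cluster's memory, so no amount of waiting can make the check pass.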

In my opinion, the attempt to assign containers should be terminated in the following two cases.
1. Required > cluster's absolute capacity: the assignment can never be satisfied, so it should fail immediately.
2. Required + already used > cluster's absolute capacity: the assignment should fail after a certain number of assignment rounds or after a certain duration. The number of rounds or the duration should be configurable.
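
The two cases above could be sketched roughly as follows. This is an illustration of the proposed policy only, not a patch against the scheduler: `AssignmentPolicySketch`, `Decision`, and the `maxAttempts` parameter are hypothetical names, and only the round-count variant of case 2 is shown.

```java
// Sketch only: the proposed termination policy as a pure function.
// All names here are hypothetical; none of them exist in YARN.
final class AssignmentPolicySketch {

    enum Decision { ASSIGN, RETRY, FAIL }

    static Decision decide(long requiredMb, long usedMb, long clusterMb,
                           int attempts, int maxAttempts) {
        // Case 1: the request can never fit, even on an idle cluster.
        if (requiredMb > clusterMb) {
            return Decision.FAIL;
        }
        // Case 2: the request does not fit right now; retry a bounded number of rounds.
        if (requiredMb + usedMb > clusterMb) {
            return attempts >= maxAttempts ? Decision.FAIL : Decision.RETRY;
        }
        // Otherwise there is room for the container.
        return Decision.ASSIGN;
    }

    public static void main(String[] args) {
        // The situation from the log: 2048 MB required on a 1024 MB cluster.
        System.out.println(decide(2048, 0, 1024, 0, 10));
    }
}
```

A duration-based limit would follow the same shape, comparing elapsed time against a configured timeout instead of counting rounds.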


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira