Posted to yarn-issues@hadoop.apache.org by "Prabhu Joseph (Jira)" <ji...@apache.org> on 2020/06/01 12:25:00 UTC

[jira] [Commented] (YARN-10293) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)

    [ https://issues.apache.org/jira/browse/YARN-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120982#comment-17120982 ] 

Prabhu Joseph commented on YARN-10293:
--------------------------------------

Thanks [~wangda] for reviewing.  

The older behavior of allocating a container on a single node skips scheduling on a node when it has a reserved
container or no available resources.
  
{code}
   if (calculator.computeAvailableContainers(Resources
            .add(node.getUnallocatedResource(), node.getTotalKillableResources()),
        minimumAllocation) <= 0) {
{code}
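For illustration, the check above amounts to an integer division of the node's free (plus killable) resource by the minimum allocation. Below is a memory-only sketch with stand-in types, not the real YARN {{ResourceCalculator}}:

```java
// Simplified illustration of the single-node skip check: a node is skipped
// when its unallocated (plus killable) resource cannot fit even one
// minimum-allocation container. Memory-only stand-in, not the actual
// YARN ResourceCalculator implementation.
public class SingleNodeSkipCheck {

    // Mimics ResourceCalculator.computeAvailableContainers for memory only:
    // how many minimum-sized containers fit into the available resource.
    static long computeAvailableContainers(long availableMB, long minimumMB) {
        return availableMB / minimumMB;
    }

    static boolean shouldSkipNode(long unallocatedMB, long killableMB,
                                  long minimumAllocationMB) {
        return computeAvailableContainers(unallocatedMB + killableMB,
            minimumAllocationMB) <= 0;
    }

    public static void main(String[] args) {
        // A full node (0 MB free, nothing killable) is skipped for a 1 GB minimum.
        System.out.println(shouldSkipNode(0, 0, 1024));    // true
        // A node with 1 GB free is not skipped.
        System.out.println(shouldSkipNode(1024, 0, 1024)); // false
    }
}
```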
		
Multi-node placement checks the used partition capacity, which includes the reserved capacity. But there can still be nodes with available resources, and these are ignored (as per the JIRA description).
  
{code}
  if (getRootQueue().getQueueCapacities().getUsedCapacity(
      candidates.getPartition()) >= 1.0f
      && preemptionManager.getKillableResource(
{code}
	  
This condition can be removed; I don't see any impact. [~Tao Yang] Can you confirm the same?

Other approaches: the one in the patch, or adding an extra check, on top of the above checks, for whether any node in the candidate set has available containers.
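A rough sketch of that extra check, with simplified stand-in types for the real Resource and candidate-node classes (names here are hypothetical, not the actual YARN API):

```java
import java.util.List;

// Sketch of the proposed extra check: before skipping new allocations
// because used partition capacity >= 1.0f, also verify that no candidate
// node can still fit the minimum allocation. Resource is a simplified
// stand-in for org.apache.hadoop.yarn.api.records.Resource.
public class CandidateAvailabilityCheck {

    record Resource(long memoryMB, int vcores) {
        boolean fits(Resource other) {
            return memoryMB >= other.memoryMB && vcores >= other.vcores;
        }
    }

    // Returns true if any candidate node still has room for the minimum
    // allocation, in which case allocateOrReserveNewContainers should not
    // be skipped even when used capacity reports >= 1.0f.
    static boolean anyNodeHasSpace(List<Resource> unallocatedPerNode,
                                   Resource minimumAllocation) {
        for (Resource unallocated : unallocatedPerNode) {
            if (unallocated.fits(minimumAllocation)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Resource min = new Resource(1024, 1);
        // Matches the repro below: node1 and node3 full, node2 has 1GB/1vcore free.
        List<Resource> nodes = List.of(
            new Resource(0, 0),
            new Resource(1024, 1),
            new Resource(0, 0));
        System.out.println(anyNodeHasSpace(nodes, min)); // true
    }
}
```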


> Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10293
>                 URL: https://issues.apache.org/jira/browse/YARN-10293
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-10293-001.patch, YARN-10293-002.patch
>
>
> Reserved Containers are not allocated from the available space of other nodes in the CandidateNodeSet in MultiNodePlacement. YARN-10259 has fixed two issues related to this: https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987
> Have found one more bug in the CapacityScheduler.java code which causes the same issue, with a slight difference in the repro.
> *Repro:*
> *Nodes :       Available :         Used*
> Node1 - 8GB, 8vcores - 8GB, 8vcores
> Node2 - 8GB, 8vcores - 8GB, 8vcores
> Node3 - 8GB, 8vcores - 8GB, 8vcores
> Queues -> A and B both 50% capacity, 100% max capacity
> MultiNode enabled + Preemption enabled
> 1. JobA is submitted to queue A and uses the full cluster: 24GB and 24 vcores
> 2. JobB is submitted to queue B with an AM size of 1GB
> {code}
> 2020-05-21 12:12:27,313 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest  IP=172.27.160.139       OPERATION=Submit Application Request    TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1590046667304_0005    CALLERCONTEXT=CLI       QUEUENAME=dummy
> {code}
> 3. Preemption happens and the used capacity drops below 1.0f
> {code}
> 2020-05-21 12:12:48,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics: Non-AM container preempted, current appAttemptId=appattempt_1590046667304_0004_000001, containerId=container_e09_1590046667304_0004_01_000024, resource=<memory:1024, vCores:1>
> {code}
> 4. JobB gets a Reserved Container as part of CapacityScheduler#allocateOrReserveNewContainer
> {code}
> 2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e09_1590046667304_0005_01_000001 Container Transitioned from NEW to RESERVED
> 2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
> {code}
> *Why does RegularContainerAllocator reserve the container when the used capacity is < 1.0f?*
> Even though the container is preempted, the NodeManager still has to stop the container, heartbeat, and update the available and unallocated resources to the ResourceManager. Until that heartbeat arrives, the scheduler sees the node as full.
> 5. Now no new allocation happens and the reserved container stays reserved.
> After the reservation the used capacity becomes 1.0f, the steps below repeat in a loop, and no new allocate or reserve happens. The reserved container cannot be allocated because the reserved node has no space; node2 has space for 1GB, 1vcore, but CapacityScheduler#allocateOrReserveNewContainers is not called, causing the hang.
> *[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> CapacityScheduler#allocateFromReservedContainer -> Re-reserve the container on node*
> {code}
> 2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1590046667304_0005 on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
> 2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignContainers: partition= #applications=1
> 2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
> 2020-05-21 12:13:33,243 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
> {code}
> CapacityScheduler#allocateOrReserveNewContainers won't be called, as the below check in allocateContainersOnMultiNodes fails:
> {code}
>  if (getRootQueue().getQueueCapacities().getUsedCapacity(
>         candidates.getPartition()) >= 1.0f
>         && preemptionManager.getKillableResource(
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
