Posted to yarn-issues@hadoop.apache.org by "Aadithya (Jira)" <ji...@apache.org> on 2020/11/25 12:23:00 UTC

[jira] [Commented] (YARN-9449) Non-exclusive labels can create reservation loop on cluster without unlabeled node

    [ https://issues.apache.org/jira/browse/YARN-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238718#comment-17238718 ] 

Aadithya commented on YARN-9449:
--------------------------------

Hi,

Can anyone provide a solution or workaround for this issue? It occurs frequently on EMR clusters where node labels are enabled.

I appreciate any help provided. 

> Non-exclusive labels can create reservation loop on cluster without unlabeled node
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-9449
>                 URL: https://issues.apache.org/jira/browse/YARN-9449
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.5
>            Reporter: Brandon Scheller
>            Priority: Major
>
> https://issues.apache.org/jira/browse/YARN-5342 added a counter to YARN so that unscheduled resource requests are first attempted on unlabeled nodes.
>  This counter is reset only when a scheduling attempt happens on an unlabeled node.
> On Hadoop clusters with only labeled nodes, the counter can never be reset, and therefore the node is never skipped.
>  Because the node is not skipped, the scheduler enters the loop shown below in the YARN RM logs.
> This can block scheduling of, for example, a Spark executor, causing the Spark application to get stuck.
>  
> {noformat}
> 2019-02-18 23:54:22,591 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000023 Container Transitioned from NEW to RESERVED
> 2019-02-18 23:54:22,591 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:22,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:23,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
> 2019-02-18 23:54:23,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
> 2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000024 Container Transitioned from NEW to RESERVED
> 2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:24,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
> 2019-02-18 23:54:24,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
> 2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000025 Container Transitioned from NEW to RESERVED
> 2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
> 2019-02-18 23:54:25,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
> 2019-02-18 23:54:25,595 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
> {noformat}
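
The failure mode described in the quoted issue can be sketched as a toy simulation. This is a hypothetical, heavily simplified model (not the actual YARN-5342 code; the class and method names are illustrative): the missed-request counter is reset only on an unlabeled-node attempt, so on an all-labeled cluster it grows forever and the reserve/unreserve cycle repeats every heartbeat.

```java
// Hypothetical, simplified model of the counter behavior described in
// YARN-9449 (NOT the actual YARN source). The counter is reset only when a
// scheduling attempt happens on an unlabeled node; on a cluster where every
// node carries a label, that reset path is unreachable, so the labeled node
// is never skipped and each heartbeat reserves and then unreserves a container.
public class NonExclusiveLoopSketch {

    /**
     * Simulates scheduler heartbeats against a single node.
     * Returns {missedCounter, reserveUnreserveCycles}.
     */
    static int[] simulate(int heartbeats, boolean clusterHasUnlabeledNode) {
        int missed = 0; // reset only on an unlabeled-node attempt
        int cycles = 0; // reserve -> unreserve repetitions (the loop in the logs)
        for (int hb = 0; hb < heartbeats; hb++) {
            if (clusterHasUnlabeledNode) {
                missed = 0; // the only reset path
            } else {
                missed++;   // grows without bound on an all-labeled cluster
                cycles++;   // container reserved now, unreserved next heartbeat
            }
        }
        return new int[] {missed, cycles};
    }

    public static void main(String[] args) {
        int[] allLabeled = simulate(5, false);
        System.out.println("all-labeled cluster: missed=" + allLabeled[0]
                + " reserve/unreserve cycles=" + allLabeled[1]);
        int[] mixed = simulate(5, true);
        System.out.println("cluster with an unlabeled node: missed=" + mixed[0]);
    }
}
```

Under this model, the only ways out are the ones people usually suggest for this class of bug: add at least one unlabeled node to the cluster, or make the partition exclusive so the non-exclusive fallback path is never taken (both are operational workarounds, not a code fix).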



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org