Posted to yarn-dev@hadoop.apache.org by "Prabhu Joseph (JIRA)" <ji...@apache.org> on 2017/12/27 10:06:00 UTC

[jira] [Created] (YARN-7685) Preemption does not happen when a node label partition is fully utilized

Prabhu Joseph created YARN-7685:
-----------------------------------

             Summary: Preemption does not happen when a node label partition is fully utilized
                 Key: YARN-7685
                 URL: https://issues.apache.org/jira/browse/YARN-7685
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler
    Affects Versions: 2.7.3
            Reporter: Prabhu Joseph
         Attachments: Screen Shot 2017-12-27 at 3.28.13 PM.png, Screen Shot 2017-12-27 at 3.28.20 PM.png, Screen Shot 2017-12-27 at 3.28.32 PM.png, Screen Shot 2017-12-27 at 3.31.42 PM.png, capacity-scheduler.xml

Have two queues, default and tkgrid, and two node labels: default (exclusivity=true) and tkgrid (exclusivity=false).

default queue: capacity 15%, max capacity 100%, default node label expression tkgrid
tkgrid queue: capacity 85%, max capacity 100%, default node label expression default
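
The queue setup above corresponds roughly to the following capacity-scheduler.xml fragment (the actual file is attached; property values here are an illustrative sketch, not a copy of the attachment):

{code}
<!-- Sketch of the queue configuration described above; illustrative only -->
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>15</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
  <value>tkgrid</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.tkgrid.capacity</name>
  <value>85</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.tkgrid.maximum-capacity</name>
  <value>100</value>
</property>
{code}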

When the default queue has occupied the complete tkgrid node label partition, a new job submitted to the tkgrid queue with node label expression tkgrid waits in ACCEPTED state forever, since there is no space left in the tkgrid partition for its ApplicationMaster. Preemption does not kick in for this scenario.

Attached: capacity-scheduler.xml and screenshots of the RM UI, Nodes, and Node Labels pages.

{code}
Repro Steps:
[yarn@bigdata3 root]$ yarn cluster  --list-node-labels 
Node Labels: <tkgrid:exclusivity=false>

Job 1, submitted to the default queue, utilizes the complete tkgrid node label partition:

yarn jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar  -master_memory 2048 -container_memory 2048 -shell_command sleep -shell_args 2h -timeout 7200000 -jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -queue default  -num_containers 20

Job 2, submitted to the tkgrid queue with AM node label expression tkgrid, stays in ACCEPTED state forever:

yarn jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar  -master_memory 2048 -container_memory 2048 -shell_command sleep -shell_args 2h -timeout 7200000 -jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -queue tkgrid  -node_label_expression tkgrid  -num_containers 20


17/12/27 09:31:48 INFO distributedshell.Client: Got application report from ASM for, appId=5, clientToAMToken=null, appDiagnostics=[Wed Dec 27 09:31:39 +0000 2017] Application is Activated, waiting for resources to be assigned for AM.  Details : AM Partition = tkgrid ; Partition Resource = <memory:35840, vCores:56> ; Queue's Absolute capacity = 85.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; , appMasterHost=N/A, appQueue=tkgrid, appMasterRpcPort=-1, appStartTime=1514367099792, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://bigdata3.openstacklocal:8088/proxy/application_1514366265793_0005/, appUser=yarn

{code}
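
For context, CapacityScheduler preemption is only active when the scheduler monitor is enabled; the standard yarn-site.xml settings are below (property names per the Hadoop CapacityScheduler docs; that this cluster has them set is an assumption, since the attached file covers only capacity-scheduler.xml):

{code}
<!-- Standard settings to enable CapacityScheduler preemption; assumed, not
     confirmed from the attachment -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
{code}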

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org