Posted to yarn-dev@hadoop.apache.org by "Wang, Xinglong (Jira)" <ji...@apache.org> on 2019/11/14 09:17:00 UTC

[jira] [Created] (YARN-9980) App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive partition queue

Wang, Xinglong created YARN-9980:
------------------------------------

             Summary: App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive partition queue
                 Key: YARN-9980
                 URL: https://issues.apache.org/jira/browse/YARN-9980
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Wang, Xinglong
            Assignee: Wang, Xinglong
         Attachments: Screen Shot 2019-11-14 at 5.11.39 PM.png

App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive partition queue.

queue_root
queue_a   ----- default_partition
queue_b   ----- exclusive partition x, default partition is x

When an app is submitted to queue_a with AM_LABEL_EXPRESSION unset, the RM assigns default_partition as the app's AM_LABEL_EXPRESSION; the app then gets am1 and runs. If the app is later moved to queue_b and am1 is preempted, killed, or fails, the RM schedules another attempt, am2, if the AM retry limit allows. This time, however, the resource request for am2 still carries AM_LABEL_EXPRESSION = default_partition. Because queue_b has no resources in default_partition, the app stays in the accepted state forever in the RM UI.

My understanding is that the app was submitted with no AM_LABEL_EXPRESSION, and the code base already allows such an app to run in its current queue's default partition.
So in the move-queue scenario we should also let the app run successfully. That means am2 should get resources from queue_b's default partition x instead of pending forever.
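
The difference between the current and the proposed behavior can be sketched with a small standalone model. This is only an illustration under stated assumptions: the classes AmLabelSketch, Queue and App and their methods are hypothetical and are not the actual YARN scheduler classes. The sketch only shows the idea of resolving the AM label at attempt time from the current queue when the user did not set one.

```java
import java.util.Optional;

/** Hypothetical, simplified model of the scenario; not real YARN code. */
final class AmLabelSketch {

    /** A queue and the partition it uses when an app specifies no label. */
    static final class Queue {
        final String name;
        final String defaultPartition;

        Queue(String name, String defaultPartition) {
            this.name = name;
            this.defaultPartition = defaultPartition;
        }
    }

    /** An app that remembers whether the user explicitly set an AM label. */
    static final class App {
        final Optional<String> submittedAmLabel; // empty = unset by the user
        final String labelResolvedAtSubmission;  // value frozen at submit time
        Queue currentQueue;

        App(Optional<String> submittedAmLabel, Queue submitQueue) {
            this.submittedAmLabel = submittedAmLabel;
            this.currentQueue = submitQueue;
            // Reported behavior: the label is filled in once, at submission.
            this.labelResolvedAtSubmission =
                submittedAmLabel.orElse(submitQueue.defaultPartition);
        }

        void moveTo(Queue target) {
            this.currentQueue = target;
        }

        /** Reported behavior: every AM attempt reuses the frozen label. */
        String amLabelFrozen() {
            return labelResolvedAtSubmission;
        }

        /**
         * Proposed behavior: if the user set no label, resolve it from the
         * queue the app is in when the new AM attempt is created.
         */
        String amLabelForNextAttempt() {
            return submittedAmLabel.orElse(currentQueue.defaultPartition);
        }
    }

    public static void main(String[] args) {
        Queue queueA = new Queue("queue_a", "default_partition");
        Queue queueB = new Queue("queue_b", "x");

        App app = new App(Optional.empty(), queueA);
        app.moveTo(queueB);

        // am2 under the reported behavior asks for a partition queue_b lacks.
        System.out.println("frozen:   " + app.amLabelFrozen());
        // am2 under the proposed behavior asks for queue_b's default partition.
        System.out.println("proposed: " + app.amLabelForNextAttempt());
    }
}
```

In this model an app that did set an explicit AM label keeps it after the move, so only apps that relied on the queue default are affected.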

In our production cluster, we have a landing queue in default_partition, and a routing mechanism that moves apps from this queue to other queues, including queues with an exclusive partition.

 !Screen Shot 2019-11-14 at 5.11.39 PM.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
