Posted to yarn-issues@hadoop.apache.org by "Yufei Gu (JIRA)" <ji...@apache.org> on 2017/05/08 21:38:04 UTC

[jira] [Commented] (YARN-6568) A queue which runs a long time job couldn't acquire any container for long time.

    [ https://issues.apache.org/jira/browse/YARN-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001592#comment-16001592 ] 

Yufei Gu commented on YARN-6568:
--------------------------------

Not sure if I understand your issue. 
{quote}
I simulated this in a test cluster. I submitted a DistributedShell application, which runs many loo applications, to queueA; then I submitted my own YARN application, which requests and releases containers constantly, to queueB. At this point, any applications submitted to queueA stay pending!
{quote}
That sounds legitimate to me. QueueA gets all resources if no other queues are active. Submitting apps to queueB changes queueB from inactive to active, which means queueA only gets its own portion instead of all the resources in the cluster. Increasing the weight of queueA can mitigate this issue.
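
To illustrate the point above with made-up numbers, the sketch below splits a cluster's capacity among the active queues in proportion to their weights. It is a deliberate simplification and not the FairScheduler implementation, which also takes minimum/maximum resources and demand into account. The class name WeightedShareSketch, the method shares, and the 100-unit capacity are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration only: NOT the actual FairScheduler code.
class WeightedShareSketch {

    // Splits `total` capacity among the active queues in proportion to weight.
    // Inactive queues are simply left out of `activeWeights`.
    static Map<String, Double> shares(double total, Map<String, Double> activeWeights) {
        double weightSum = 0.0;
        for (double w : activeWeights.values()) {
            weightSum += w;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : activeWeights.entrySet()) {
            result.put(e.getKey(), total * e.getValue() / weightSum);
        }
        return result;
    }

    public static void main(String[] args) {
        double cluster = 100.0; // hypothetical capacity units

        // Only queueA is active: it gets everything.
        Map<String, Double> onlyA = new LinkedHashMap<>();
        onlyA.put("queueA", 1.0);
        System.out.println(shares(cluster, onlyA));

        // queueB becomes active with the same weight: queueA drops to half.
        Map<String, Double> equal = new LinkedHashMap<>();
        equal.put("queueA", 1.0);
        equal.put("queueB", 1.0);
        System.out.println(shares(cluster, equal));

        // Raising queueA's weight to 3 gives it three quarters of the capacity.
        Map<String, Double> weighted = new LinkedHashMap<>();
        weighted.put("queueA", 3.0);
        weighted.put("queueB", 1.0);
        System.out.println(shares(cluster, weighted));
    }
}
```

With equal weights, queueA's portion falls from 100 to 50 units as soon as queueB becomes active; with a weight of 3 it keeps 75. In the Fair Scheduler allocation file, a queue's weight is set with the weight element of that queue.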

> A queue which runs a long time job couldn't acquire any container for long time.
> --------------------------------------------------------------------------------
>
>                 Key: YARN-6568
>                 URL: https://issues.apache.org/jira/browse/YARN-6568
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>         Environment: CentOS 7.1
>            Reporter: zhengchenyu
>             Fix For: 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, we found that some applications could not acquire any container for a long time. (Note: we use FairScheduler with FairSharePolicy.)
> First, I found an unreasonable configuration: we had set minRes=maxRes. As a result, some applications stayed pending for a long time, and we killed some large applications to work around the problem. After we changed this configuration, the problem eased.
> But the problem is not completely solved. In our cluster, I found that applications in some queues which request few containers still stay pending for a long time.
> I simulated this in a test cluster. I submitted a DistributedShell application, which runs many loo applications, to queueA; then I submitted my own YARN application, which requests and releases containers constantly, to queueB. At this point, any applications submitted to queueA stay pending!
> We know this is a problem with FairSharePolicy: it takes each queue's requests into account. So after the queues are sorted, queues with few requests are always ordered last.
> We know that once the AM container is launched, the requests will increase, but FairSharePolicy can't distinguish which request is an AM request. I think that if the AM container is assigned, the problem is solved.
> My colleagues and I discussed this problem. We recommend setting a timeout for each queue, meaning the length of time a queue has gone without being assigned a container. When the timeout expires, we move this queue to the first place in the queue list.


