You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Eric Payne (JIRA)" <ji...@apache.org> on 2018/07/09 20:08:00 UTC

[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537497#comment-16537497 ] 

Eric Payne commented on YARN-4606:
----------------------------------

{quote}
So, after move app operation and if there is no events (which can trigger user limit computation) for brief amount of time, am seeing incorrect numActiveUsersWithOnlyPendingApps count. Is this acceptable? or Should we trigger user limit computation after move operation like how we are doing it in other places?
{quote}
[~manirajv06@gmail.com], moving the apps to the other queue does trigger a recalculation of user limits since it is adding a new app to the queue and potentially a new user. Also, since it fixes itself after the nodemanager heartbeat of a few seconds (default is 1), I think that is fine.

I noticed that there is another bug related to moving apps to a different queue that could affect the above use case. See YARN-8421

> CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.002.patch, YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending (caused by max-am-percent, etc.), ActiveUsersManager still considers the user is an active user. This could lead to starvation of active applications, for example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new resources. So computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org