Posted to yarn-issues@hadoop.apache.org by "Jonathan Hung (JIRA)" <ji...@apache.org> on 2017/07/14 00:58:00 UTC

[jira] [Updated] (YARN-6818) User limit per partition is not honored in branch-2.7 >=

     [ https://issues.apache.org/jira/browse/YARN-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hung updated YARN-6818:
--------------------------------
    Attachment: YARN-6818-branch-2.7.001.patch

> User limit per partition is not honored in branch-2.7 >=
> --------------------------------------------------------
>
>                 Key: YARN-6818
>                 URL: https://issues.apache.org/jira/browse/YARN-6818
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>         Attachments: YARN-6818-branch-2.7.001.patch
>
>
> We are seeing an issue where the user limit factor does not cap the amount of resources a user can consume in a queue in a partition. Suppose a queue has access to partition X, its used resources in the default partition are 0, and its used resources in partition X are at the partition's user limit. As far as I can tell, this is the problematic code (in LeafQueue.java):
> {noformat}
>     if (Resources
>         .greaterThan(resourceCalculator, clusterResource,
>             user.getUsed(label),
>             limit)) {
>       // if enabled, check to see if could we potentially use this node instead
>       // of a reserved node if the application has reserved containers
>       if (this.reservationsContinueLooking) {
>         if (Resources.lessThanOrEqual(
>             resourceCalculator,
>             clusterResource,
>             Resources.subtract(user.getUsed(), application.getCurrentReservation()),
>             limit)) {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("User " + userName + " in queue " + getQueueName()
>                 + " will exceed limit based on reservations - " + " consumed: "
>                 + user.getUsed() + " reserved: "
>                 + application.getCurrentReservation() + " limit: " + limit);
>           }
>           Resource amountNeededToUnreserve = Resources.subtract(user.getUsed(label), limit);
>           // we can only acquire a new container if we unreserve first since we ignored the
>           // user limit. Choose the max of user limit or what was previously set by max
>           // capacity.
>           currentResoureLimits.setAmountNeededUnreserve(Resources.max(resourceCalculator,
>               clusterResource, currentResoureLimits.getAmountNeededUnreserve(),
>               amountNeededToUnreserve));
>           return true;
>         }
>       }
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("User " + userName + " in queue " + getQueueName()
>             + " will exceed limit - " + " consumed: "
>             + user.getUsed() + " limit: " + limit);
>       }
>       return false;
>     }
> {noformat}
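> First, the check sees that the used resources in partition X are greater than the partition's user limit. The reservation check then also succeeds: it evaluates {{user.getUsed() - application.getCurrentReservation() <= limit}}, where {{user.getUsed()}} reads the usage in the default partition (0 here) rather than in partition X, so the method returns true.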
> One fix is to replace {{Resources.subtract(user.getUsed(), application.getCurrentReservation())}} with {{Resources.subtract(user.getUsed(label), application.getCurrentReservation())}}.
> This doesn't seem to be a problem in branch-2.8 and higher, since YARN-3356 introduces this check:
> {noformat}
>       if (this.reservationsContinueLooking && checkReservations
>           && label.equals(CommonNodeLabelsManager.NO_LABEL)) {
> {noformat}
> so in this case getting the used resources in the default partition seems to be correct.
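
The behaviour described above can be reproduced with a minimal, self-contained Java sketch. This is not the actual LeafQueue code: it assumes a hypothetical `User` class that tracks used memory per partition as plain `long` values in MB, and it uses made-up numbers (a limit of 8192 MB, 10240 MB used in partition X, nothing used in the default partition). The names `UserLimitSketch`, `canAssignBranch27`, `canAssignProposed`, and `canAssignWithDefaultPartitionGuard` are illustrative only, and the `checkReservations` flag from the branch-2.8 condition is assumed to be true.

```java
import java.util.HashMap;
import java.util.Map;

public class UserLimitSketch {
    // Assumption: the default partition is represented by the empty label.
    static final String NO_LABEL = "";

    /** Hypothetical stand-in for the per-user usage tracker; memory in MB. */
    static class User {
        private final Map<String, Long> usedByLabel = new HashMap<>();

        void setUsed(String label, long mb) { usedByLabel.put(label, mb); }

        long getUsed(String label) { return usedByLabel.getOrDefault(label, 0L); }

        // The no-argument form reads the default partition only.
        long getUsed() { return getUsed(NO_LABEL); }
    }

    /** Shape of the branch-2.7 check: the reservation test ignores the label. */
    static boolean canAssignBranch27(User user, String label, long limit,
                                     long reserved, boolean continueLooking) {
        if (user.getUsed(label) > limit) {
            if (continueLooking && user.getUsed() - reserved <= limit) {
                return true; // passes whenever default-partition usage is low
            }
            return false;
        }
        return true;
    }

    /** Proposed fix: compare the usage of the same partition in both tests. */
    static boolean canAssignProposed(User user, String label, long limit,
                                     long reserved, boolean continueLooking) {
        if (user.getUsed(label) > limit) {
            if (continueLooking && user.getUsed(label) - reserved <= limit) {
                return true;
            }
            return false;
        }
        return true;
    }

    /** Shape of the later guard: only look at reservations for NO_LABEL. */
    static boolean canAssignWithDefaultPartitionGuard(User user, String label,
            long limit, long reserved, boolean continueLooking) {
        if (user.getUsed(label) > limit) {
            if (continueLooking && label.equals(NO_LABEL)
                    && user.getUsed() - reserved <= limit) {
                return true;
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        User user = new User();
        user.setUsed("X", 10240L); // over the 8192 MB limit in partition X
        long limit = 8192L;
        long reserved = 0L;

        System.out.println(canAssignBranch27(user, "X", limit, reserved, true));  // true
        System.out.println(canAssignProposed(user, "X", limit, reserved, true));  // false
        System.out.println(
            canAssignWithDefaultPartitionGuard(user, "X", limit, reserved, true)); // false
    }
}
```

In this sketch the branch-2.7 shape lets the over-limit user through because `0 - 0 <= 8192`, while both the label-aware subtraction and the default-partition guard reject the request.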


