You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Haibo Chen (JIRA)" <ji...@apache.org> on 2017/10/05 23:20:00 UTC
[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations

    [ https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193897#comment-16193897 ] 

Haibo Chen commented on YARN-4511:
----------------------------------

Thanks for the reviews, [~leftnoteasy]!

The scope of the variable allocationInThisHeartbeat is supposed to be a round of allocation on a given node. In the case of fair scheduler, it is every nodeUpdate() method call. Can you point out where else do we also do allocation that is decoupled from heartbeat? Here is what this variable is used for in the case of fair scheduler: every time there is a node update, we'd do allocation of guaranteed containers first followed by opportunistic containers. W need to consider the just-allocated-yet-to-launch guaranteed containers to project how much resource we have left to allocate opportunistic containers. Any guidance on how to handle allocation decouple from heartbeat is appreciated.  

bq. In what case the oldNode and newNode will be null and should we throw exception when this happens?
This is more to avoid NPE defensively. I guess you are right, we should probably throw exception since it indicates errors and will lead to problems if we update one but not the other.

bq. This part of logic looks confusing, since old containers will be finished inside pullNewlyUpdatedContainers, do we really need this method? 
This arises from the fact that RMContainer events trigger update of schedulerNode, whereas the current implementation of container incr/decr/promotion/demotion only swaps the underly Container instance for RMContainers, which does not trigger schedulerNode updates, so the resulting accounting on that node is incorrect. [~asuresh] any suggestion to simply this?

bq. Why containerLaunched is added? Should we just increase allocated opportunistic/guaranteed resource?
I only try to preserve the containerLaunched flag. Can you be more specific about what you're referring to in the patch?

bq. I think one of the ResourceThresholds and OverAllocationInfo should be removed, they're kind of duplicated. We should try to reduce unnecessary #PB-records.
Good point. The initial intension was to include more than resourceThresholds in OverAllocationInfo, such as per node max allocation in terms of a percentage. We may be able to get rid of OverAllocationInfo down the road if there is no such need any more.

There is a jira open to consolidate with Resource Profiles (YARN-6690). Is that a good place to do the work to accommodate other resources?


> Common scheduler changes supporting scheduler-specific implementations
> ----------------------------------------------------------------------
>
>                 Key: YARN-4511
>                 URL: https://issues.apache.org/jira/browse/YARN-4511
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Haibo Chen
>         Attachments: YARN-4511-YARN-1011.00.patch, YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org