Posted to yarn-issues@hadoop.apache.org by "Arun Suresh (JIRA)" <ji...@apache.org> on 2017/07/17 21:36:01 UTC

[jira] [Commented] (YARN-6831) Miscellaneous refactoring changes of ContainerScheduler

    [ https://issues.apache.org/jira/browse/YARN-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090609#comment-16090609 ] 

Arun Suresh commented on YARN-6831:
-----------------------------------

Thanks for raising this [~haibochen] / [~kasha]. Some thoughts:

bq. Why do we need maxOppQueueLength given queuingLimit?
So, maxOppQueueLength is more of an *active* limit: the CS (ContainerScheduler) will not admit more queued containers than that value. The queuingLimit, on the other hand, is *reactive*; it is dynamically calculated by the RM and passed down to the NM in a heartbeat response. The RM continuously computes the mean/median of the queue lengths across all nodes and tells the NM to shed containers if its queue is too long. I agree that *maxOppQueueLength* can probably be removed, but given your observation in YARN-6706 that test cases depend on it, my opinion is that we keep it, set a very high default value, and mark it as VisibleForTesting only.
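
To illustrate the distinction, here is a rough sketch (the class and method names below are made up for illustration, not the actual NM code):

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

class OppQueueSketch {
  // Static admission cap: the maxOppQueueLength idea.
  private final int maxOppQueueLength;
  private final Deque<String> queuedOpp = new ArrayDeque<>();

  OppQueueSketch(int maxOppQueueLength) {
    this.maxOppQueueLength = maxOppQueueLength;
  }

  // Active limit, enforced up front: refuse admission once the cap is hit.
  boolean tryAdmit(String containerId) {
    if (queuedOpp.size() >= maxOppQueueLength) {
      return false;
    }
    queuedOpp.add(containerId);
    return true;
  }

  // Reactive limit, applied after the fact: shed queued containers down to
  // the queuingLimit the RM sends in the heartbeat response.
  void shedQueuedContainers(int queuingLimit) {
    while (queuedOpp.size() > queuingLimit) {
      String shed = queuedOpp.removeLast();
      System.out.println("shedding queued container " + shed);
    }
  }
}
{code}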

bq. Is there value in splitting runningContainers into runningGuaranteed and runningOpportunistic ?
Hmm… I was actually thinking of removing the *runningContainers* map itself. It was introduced to keep track of all running containers (containers whose state is running) AND those that have been scheduled but not yet started. I think it may be better to capture that as a first-class container state, something like *SCHEDULED_TO_RUN*, reached via a proper transition.
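
Something along these lines (purely illustrative; the real NM container state machine has more states and transitions):

{code:java}
// Hypothetical sketch: model "scheduled but not yet running" as an explicit
// state instead of tracking membership in a side map.
enum ContainerStateSketch {
  NEW, QUEUED, SCHEDULED_TO_RUN, RUNNING, DONE
}

class ContainerSketch {
  private ContainerStateSketch state = ContainerStateSketch.NEW;

  // A proper transition replaces "add to runningContainers".
  void markScheduled() {
    if (state != ContainerStateSketch.NEW
        && state != ContainerStateSketch.QUEUED) {
      throw new IllegalStateException("cannot schedule from " + state);
    }
    state = ContainerStateSketch.SCHEDULED_TO_RUN;
  }

  void markRunning() {
    if (state != ContainerStateSketch.SCHEDULED_TO_RUN) {
      throw new IllegalStateException("cannot run from " + state);
    }
    state = ContainerStateSketch.RUNNING;
  }
}
{code}
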
Adding more data structures might be problematic later on: we can hit subtle race conditions when transferring containers from runningGuaranteed to runningOpportunistic (during promotion) and vice versa (during demotion) if we are not careful about synchronization. Also, given that an NM will not run more than, say, a couple hundred containers, it might be better to just iterate over all the containers when the scheduler needs to make a decision.
Another problem with keeping separate maps is NM recovery: we would have to populate them explicitly. We don't do that for running containers today either, but I was thinking that if we removed the *runningContainers* map, we won't have to (we already have a *QUEUED* state in the NMStateStore that can be used to set the correct state on the recovered container).
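
For the iteration point above, I mean something like this (illustrative types, not the actual ContainerScheduler fields):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SchedulerSketch {
  enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

  // One map of all containers; no separate runningGuaranteed /
  // runningOpportunistic structures to keep in sync.
  private final Map<String, ExecutionType> containers =
      new ConcurrentHashMap<>();

  // An O(n) scan is cheap when an NM hosts at most a few hundred containers.
  long runningOpportunisticCount() {
    return containers.values().stream()
        .filter(t -> t == ExecutionType.OPPORTUNISTIC)
        .count();
  }
}
{code}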

bq. getOpportunisticContainersStatus method implementation feels awkward..
Kind of agree with you there. I don't recall exactly why we did it like that; I think it was to avoid creating a new instance of the status at every heartbeat.
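
The idea was roughly the following (sketch only; the real status object is a protobuf-backed record and these names are invented):

{code:java}
// Reuse one mutable status instance across heartbeats instead of allocating
// a fresh object each time.
class OpportunisticStatusSketch {
  int queuedCount;
  int runningCount;
}

class StatusHolderSketch {
  private final OpportunisticStatusSketch status =
      new OpportunisticStatusSketch();

  OpportunisticStatusSketch getOpportunisticContainersStatus(
      int queued, int running) {
    status.queuedCount = queued;    // refresh fields in place
    status.runningCount = running;  // no per-heartbeat allocation
    return status;
  }
}
{code}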

bq. Have we considered folding ContainerQueuingLimit class into this
My first instinct is to keep it separate. I don't think we should mix the queuing aspect of the ContainerScheduler with the ExecutionType aspect. Also, one is part of the NM heartbeat request while the other comes back in the response.
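
Roughly (placeholder classes, not the actual protocol records):

{code:java}
// The two pieces of information travel in opposite directions, which argues
// for keeping them in separate classes.
class HeartbeatRequestSketch {
  int queuedOpportunisticContainers;  // ExecutionType-side status, NM -> RM
}

class HeartbeatResponseSketch {
  int containerQueuingLimit;          // queue-shedding target, RM -> NM
}
{code}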


> Miscellaneous refactoring changes of ContainerScheduler
> -------------------------------------------------------
>
>                 Key: YARN-6831
>                 URL: https://issues.apache.org/jira/browse/YARN-6831
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>
> While reviewing YARN-6706, Karthik pointed out a few issues for improvement in ContainerScheduler:
> * Make ResourceUtilizationTracker pluggable. That way, we could use a different tracker when oversubscription is enabled.
> * ContainerScheduler
>   ## Why do we need maxOppQueueLength given queuingLimit?
>   ## Is there value in splitting runningContainers into runningGuaranteed and runningOpportunistic?
>   ## getOpportunisticContainersStatus method implementation feels awkward. How about capturing the state in the field here, and having metrics etc. pull from it?
>   ## startContainersFromQueue: the local variable resourcesAvailable is unnecessary.
> * OpportunisticContainersStatus
>   ## Let us clearly differentiate between allocated, used, and utilized. Maybe we should rename the current Used methods to Allocated?
>   ## I prefer either the full name Opportunistic (in method names) or Opp (the shortest name that makes sense). Opport is neither short nor fully descriptive.
>   ## Have we considered folding the ContainerQueuingLimit class into this?
> We decided to move these issues into this follow-up JIRA to keep YARN-6706 moving forward and unblock the oversubscription work.


