Posted to yarn-issues@hadoop.apache.org by "Arun Suresh (JIRA)" <ji...@apache.org> on 2016/10/12 13:56:21 UTC
[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

    [ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568792#comment-15568792 ] 

Arun Suresh edited comment on YARN-4597 at 10/12/16 1:56 PM:
-------------------------------------------------------------

Thanks for taking a look [~jianhe],

bq. Wondering why KillWhileExitingTransition is added..
I had put it in there while debugging something and left it in since I thought it was harmless, but yes, it looks like it does override the exit code. Will remove it. Good catch.

* w.r.t. {{ContainerState#SCHEDULED}}: Actually, I think we should expose this. We currently lump NEW, LOCALIZING, LOCALIZED, etc. into RUNNING, which is misleading because the container is not actually running. SCHEDULED implies that some of the container's dependencies (resources for localization + some internal queuing/scheduling policy) have not yet been met.
Prior to this, YARN-2877 had introduced the QUEUED return state, which would be visible to applications if queuing was enabled. This patch technically just renames QUEUED to SCHEDULED. Also, all containers will go through the SCHEDULED state, not just the opportunistic ones (although for guaranteed containers this will just be a pass-through state).
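
To make the state-reporting point concrete, here is a rough sketch of how pre-launch states could be reported as SCHEDULED instead of RUNNING. It is one possible reading of the proposal, not the code in the patch: the class name, both enums, and the {{toReportedState}} helper are hypothetical, and the exact set of states that map to SCHEDULED is an assumption.

```java
public class ContainerStateMappingSketch {

    /** Hypothetical subset of NM-internal container states (names taken from the comment). */
    enum InternalState { NEW, LOCALIZING, LOCALIZED, SCHEDULED, RUNNING, DONE }

    /** Hypothetical states reported back to applications. */
    enum ReportedState { SCHEDULED, RUNNING, COMPLETE }

    /**
     * Maps an internal state to the state an application would see.
     * Assumption: every state before launch is reported as SCHEDULED,
     * so RUNNING is only reported once the container has really started.
     */
    static ReportedState toReportedState(InternalState state) {
        switch (state) {
            case NEW:
            case LOCALIZING:
            case LOCALIZED:
            case SCHEDULED:
                return ReportedState.SCHEDULED;
            case RUNNING:
                return ReportedState.RUNNING;
            default:
                // DONE is the only remaining state in this sketch.
                return ReportedState.COMPLETE;
        }
    }
}
```

With a mapping like this, a guaranteed container would still pass through SCHEDULED, only very briefly.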

Another thing I was hoping to get some input on: currently, the {{ContainerScheduler}} runs in the same thread as the AsyncDispatcher started by the ContainerManager. Also, the scheduler is triggered only by events. I was wondering if there is any merit in pushing these events into a blocking queue as they arrive and having a separate thread take care of them. This would preserve the serial nature of operation (and thereby keep the code simple by not needing synchronized collections) and would not hold up the dispatcher from delivering other events while the scheduler is scheduling.
A minor disadvantage is that the NM will probably consume a thread that, for the most part, will be blocked on the queue. This thread could otherwise be used by one of the containers.
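
The queue-plus-worker idea above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the actual {{ContainerScheduler}}: the class, the {{SchedulerEvent}} type, and the method names are all hypothetical, and the "scheduling" work is reduced to recording the container id. It uses only {{java.util.concurrent.LinkedBlockingQueue}} and a plain thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SchedulerQueueSketch {

    /** Hypothetical event type; real NM events carry much more information. */
    static final class SchedulerEvent {
        final String containerId;
        final boolean stop; // marker event that tells the worker to exit

        SchedulerEvent(String containerId, boolean stop) {
            this.containerId = containerId;
            this.stop = stop;
        }
    }

    private final BlockingQueue<SchedulerEvent> queue = new LinkedBlockingQueue<>();

    // Touched only by the worker thread while it runs, so no synchronized collection is needed.
    private final List<String> handled = new ArrayList<>();

    private final Thread worker = new Thread(this::drain, "container-scheduler-sketch");

    void start() {
        worker.start();
    }

    /** Called on the dispatcher thread: enqueue and return immediately. */
    void handle(String containerId) {
        queue.add(new SchedulerEvent(containerId, false));
    }

    /** Asks the worker to stop after the already queued events, then returns what it processed. */
    List<String> stopAndGetHandled() {
        queue.add(new SchedulerEvent(null, true));
        try {
            worker.join(); // join() makes the worker's writes to 'handled' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return handled;
    }

    /** Worker loop: events are processed one at a time, in arrival order. */
    private void drain() {
        try {
            while (true) {
                SchedulerEvent event = queue.take(); // blocks while the queue is empty
                if (event.stop) {
                    return;
                }
                handled.add(event.containerId); // placeholder for the real scheduling work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The dispatcher-side call ({{handle}}) never blocks on scheduling work, and the worker spends its idle time parked in {{take()}}, which is the thread cost mentioned above.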



> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch
>
>
> Currently, the NM immediately launches containers after resource localization. Several features could be more cleanly implemented if the NM included a separate stage for reserving resources.


