Posted to yarn-issues@hadoop.apache.org by "Varun Saxena (JIRA)" <ji...@apache.org> on 2016/06/01 08:16:59 UTC

[jira] [Comment Edited] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state

    [ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308259#comment-15308259 ] 

Varun Saxena edited comment on YARN-5156 at 6/1/16 8:16 AM:
------------------------------------------------------------

I looked at the code. NMTimelinePublisher publishes the YARN_CONTAINER_FINISHED event when it receives an ApplicationContainerFinishedEvent, and that event is posted from ContainerImpl.

The issue seems to be that we clone the container status and post an ApplicationContainerFinishedEvent before the transition has completed and the container state has been set to DONE. As a result, the container state is reported as RUNNING. ContainerImpl#sendFinishedEvents, which posts the ApplicationContainerFinishedEvent, is called from all the transitions that lead to the state being changed to DONE. So in NMTimelinePublisher#publishContainerFinishedEvent we can simply set STATE_EVENT_INFO to DONE (or pass the desired state, currently DONE, to sendFinishedEvents and carry it in the ApplicationContainerFinishedEvent).
Or, since we know that a container finished event always leads to the DONE state, there is no need to send STATE_EVENT_INFO at all. Thoughts?

{code:title=ContainerImpl.java|borderStyle=solid}
  @SuppressWarnings("unchecked")
  private void sendFinishedEvents() {
    // Inform the application
    @SuppressWarnings("rawtypes")
    EventHandler eventHandler = dispatcher.getEventHandler();

    ContainerStatus containerStatus = cloneAndGetContainerStatus();
    eventHandler.handle(new ApplicationContainerFinishedEvent(containerStatus));

    // Remove the container from the resource-monitor
    eventHandler.handle(new ContainerStopMonitoringEvent(containerId));
    // Tell the logService too
    eventHandler.handle(new LogHandlerContainerFinishedEvent(
      containerId, exitCode));
  }
{code}
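
To illustrate the ordering problem outside the NodeManager, here is a minimal standalone sketch. It is not Hadoop code: ToyContainer, StatusSnapshot and finish are hypothetical names, and the only assumption is that the new state is installed after the transition hook returns, as described above. It shows a snapshot taken inside the hook reporting RUNNING, and the "pass the desired state" option reporting DONE.

```java
import java.util.ArrayList;
import java.util.List;

public class FinishedEventSketch {
    enum State { RUNNING, DONE }

    /** Immutable snapshot, standing in for a cloned container status (hypothetical type). */
    static final class StatusSnapshot {
        final State state;
        StatusSnapshot(State state) { this.state = state; }
    }

    /** Toy container whose state is updated only after the transition hook returns. */
    static final class ToyContainer {
        State state = State.RUNNING;
        final List<StatusSnapshot> published = new ArrayList<>();

        // Mirrors the reported problem: the snapshot reads the pre-transition state.
        void sendFinishedEventsFromCurrentState() {
            published.add(new StatusSnapshot(state));
        }

        // One proposed fix: the caller passes the state the transition will end in.
        void sendFinishedEvents(State finalState) {
            published.add(new StatusSnapshot(finalState));
        }

        void finish(boolean passFinalState) {
            // The transition hook runs first ...
            if (passFinalState) {
                sendFinishedEvents(State.DONE);
            } else {
                sendFinishedEventsFromCurrentState();
            }
            // ... and only afterwards is the new state installed.
            state = State.DONE;
        }
    }

    public static void main(String[] args) {
        ToyContainer buggy = new ToyContainer();
        buggy.finish(false);
        // prints "snapshot=RUNNING actual=DONE"
        System.out.println("snapshot=" + buggy.published.get(0).state + " actual=" + buggy.state);

        ToyContainer fixed = new ToyContainer();
        fixed.finish(true);
        // prints "snapshot=DONE actual=DONE"
        System.out.println("snapshot=" + fixed.published.get(0).state + " actual=" + fixed.state);
    }
}
```

Passing the final state explicitly keeps the event self-describing if another terminal state is ever added. Hard-coding DONE in the publisher would be the smaller change.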

Naga, will you be handling this?


> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -------------------------------------------------------------------------
>
>                 Key: YARN-5156
>                 URL: https://issues.apache.org/jira/browse/YARN-5156
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>
> When a container finishes, we report "YARN_CONTAINER_STATE: "RUNNING"". Is this by design, or is it a bug?
> {code}
> {
>   metrics: [ ],
>   events: [
>     {
>       id: "YARN_CONTAINER_FINISHED",
>       timestamp: 1464213765890,
>       info: {
>         YARN_CONTAINER_EXIT_STATUS: 0,
>         YARN_CONTAINER_STATE: "RUNNING",
>         YARN_CONTAINER_DIAGNOSTICS_INFO: ""
>       }
>     },
>     {
>       id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
>       timestamp: 1464213761133,
>       info: { }
>     },
>     {
>       id: "YARN_CONTAINER_CREATED",
>       timestamp: 1464213761132,
>       info: { }
>     },
>     {
>       id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
>       timestamp: 1464213761132,
>       info: { }
>     }
>   ],
>   id: "container_e15_1464213707405_0001_01_000018",
>   type: "YARN_CONTAINER",
>   createdtime: 1464213761132,
>   info: {
>     YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
>     YARN_CONTAINER_ALLOCATED_VCORE: 1,
>     YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
>     UID: "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_000018",
>     YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
>     YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
>     SYSTEM_INFO_PARENT_ENTITY: {
>       type: "YARN_APPLICATION_ATTEMPT",
>       id: "appattempt_1464213707405_0001_000001"
>     },
>     YARN_CONTAINER_ALLOCATED_PORT: 64694
>   },
>   configs: { },
>   isrelatedto: { },
>   relatesto: { }
> }
> {code}


