Posted to yarn-issues@hadoop.apache.org by "Varun Saxena (JIRA)" <ji...@apache.org> on 2016/06/01 08:16:59 UTC
[jira] [Comment Edited] (YARN-5156) YARN_CONTAINER_FINISHED of
YARN_CONTAINERs will always have running state
[ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308259#comment-15308259 ]
Varun Saxena edited comment on YARN-5156 at 6/1/16 8:16 AM:
------------------------------------------------------------
I looked at the code. NMTimelinePublisher publishes the YARN_CONTAINER_FINISHED event when it receives an ApplicationContainerFinishedEvent, and this event is posted from ContainerImpl.
The issue seems to be that we clone the container status and post the ApplicationContainerFinishedEvent before the transition has completed and the container state has been set to DONE. As a result, the container state is reported as RUNNING.
ContainerImpl#sendFinishedEvents, which posts the ApplicationContainerFinishedEvent, is called from all the transitions that lead to the state being changed to DONE. So in NMTimelinePublisher#publishContainerFinishedEvent we can simply set STATE_EVENT_INFO to DONE (or pass the desired state, currently DONE, into sendFinishedEvents and carry it in the ApplicationContainerFinishedEvent).
Alternatively, since we know that a container finished event always leads to a state of DONE, there is no need to send STATE_EVENT_INFO at all. Thoughts?
{code:title=ContainerImpl.java|borderStyle=solid}
@SuppressWarnings("unchecked")
private void sendFinishedEvents() {
  // Inform the application
  @SuppressWarnings("rawtypes")
  EventHandler eventHandler = dispatcher.getEventHandler();
  ContainerStatus containerStatus = cloneAndGetContainerStatus();
  eventHandler.handle(new ApplicationContainerFinishedEvent(containerStatus));
  // Remove the container from the resource-monitor
  eventHandler.handle(new ContainerStopMonitoringEvent(containerId));
  // Tell the logService too
  eventHandler.handle(new LogHandlerContainerFinishedEvent(
      containerId, exitCode));
}
{code}
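To make the ordering problem concrete, here is a minimal standalone sketch. It is not the NodeManager code: the class FinishedEventSketch, the Container stand-in and its finish/sendFinishedEvent methods are all hypothetical, and the sketch only assumes that the state field is updated after the transition body has run. It compares reporting the state read mid-transition with passing the target state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; none of these names exist in Hadoop.
class FinishedEventSketch {
  enum State { RUNNING, DONE }

  /** Stand-in for a container whose state changes only after the transition body returns. */
  static class Container {
    State state = State.RUNNING;
    final List<String> published = new ArrayList<>();

    // Mimics a state machine: run the transition body first, then apply the new state.
    void finish(boolean passTargetState) {
      if (passTargetState) {
        // Proposed option: the caller supplies the post-transition state.
        sendFinishedEvent(State.DONE);
      } else {
        // Behaviour described in the comment: the state is read mid-transition.
        sendFinishedEvent(state);
      }
      state = State.DONE; // the state is updated only at this point
    }

    void sendFinishedEvent(State reported) {
      published.add("YARN_CONTAINER_FINISHED:" + reported);
    }
  }

  /** Runs one finish transition and returns the single event it published. */
  static String publish(boolean passTargetState) {
    Container c = new Container();
    c.finish(passTargetState);
    return c.published.get(0);
  }

  public static void main(String[] args) {
    System.out.println(publish(false)); // YARN_CONTAINER_FINISHED:RUNNING
    System.out.println(publish(true));  // YARN_CONTAINER_FINISHED:DONE
  }
}
```

The second option from the comment (not sending the state at all) would amount to dropping the reported argument altogether, since the event id already implies that the container is done.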
Naga, will you be handling this?
> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -------------------------------------------------------------------------
>
> Key: YARN-5156
> URL: https://issues.apache.org/jira/browse/YARN-5156
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
>
> When a container finishes, we're reporting YARN_CONTAINER_STATE: "RUNNING". Is this by design, or is it a bug?
> {code}
> {
>   metrics: [ ],
>   events: [
>     {
>       id: "YARN_CONTAINER_FINISHED",
>       timestamp: 1464213765890,
>       info: {
>         YARN_CONTAINER_EXIT_STATUS: 0,
>         YARN_CONTAINER_STATE: "RUNNING",
>         YARN_CONTAINER_DIAGNOSTICS_INFO: ""
>       }
>     },
>     {
>       id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
>       timestamp: 1464213761133,
>       info: { }
>     },
>     {
>       id: "YARN_CONTAINER_CREATED",
>       timestamp: 1464213761132,
>       info: { }
>     },
>     {
>       id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
>       timestamp: 1464213761132,
>       info: { }
>     }
>   ],
>   id: "container_e15_1464213707405_0001_01_000018",
>   type: "YARN_CONTAINER",
>   createdtime: 1464213761132,
>   info: {
>     YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
>     YARN_CONTAINER_ALLOCATED_VCORE: 1,
>     YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
>     UID: "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_000018",
>     YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
>     YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
>     SYSTEM_INFO_PARENT_ENTITY: {
>       type: "YARN_APPLICATION_ATTEMPT",
>       id: "appattempt_1464213707405_0001_000001"
>     },
>     YARN_CONTAINER_ALLOCATED_PORT: 64694
>   },
>   configs: { },
>   isrelatedto: { },
>   relatesto: { }
> }
> {code}