Posted to yarn-issues@hadoop.apache.org by "Rohith Sharma K S (JIRA)" <ji...@apache.org> on 2018/05/10 17:04:00 UTC

[jira] [Comment Edited] (YARN-8130) Race condition when container events are published for KILLED applications

    [ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470741#comment-16470741 ] 

Rohith Sharma K S edited comment on YARN-8130 at 5/10/18 5:03 PM:
------------------------------------------------------------------

Events are dispatched in FIFO order, but NMTimelinePublisher has its own internal dispatcher for processing timeline events. This internal dispatcher also follows FIFO order, so an event can be delayed if the queue still has outstanding events ahead of it.
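
The ordering problem described above can be sketched with a small single-threaded model. This is only an illustration under stated assumptions: the class FifoPublishRace and all of its methods are hypothetical and are not taken from NMTimelinePublisher or any other NodeManager class. It assumes that the outer dispatcher merely enqueues a timeline event, that the application's client may be removed before the internal queue is drained, and that an event whose client is gone is dropped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical model only: these names are not NodeManager classes.
class FifoPublishRace {
    // Stands in for the publisher's internal FIFO dispatcher queue.
    private final Queue<String[]> queue = new ArrayDeque<>();
    // Applications that still have a timeline client registered.
    private final Set<String> clients = new HashSet<>();
    final List<String> published = new ArrayList<>();
    final List<String> dropped = new ArrayList<>();

    void registerClient(String appId) { clients.add(appId); }

    void removeClient(String appId) { clients.remove(appId); }

    // The outer dispatcher only enqueues; publishing happens later.
    void enqueue(String appId, String event) {
        queue.add(new String[] {appId, event});
    }

    // Process at most `max` queued events in FIFO order.
    void drain(int max) {
        for (int i = 0; i < max && !queue.isEmpty(); i++) {
            String[] e = queue.poll();
            if (clients.contains(e[0])) {
                published.add(e[1]);
            } else {
                // Mirrors the "client has been removed" error in the log.
                dropped.add(e[1]);
            }
        }
    }

    static FifoPublishRace simulate() {
        FifoPublishRace p = new FifoPublishRace();
        p.registerClient("app_0003");
        p.enqueue("app_0003", "container_01 FINISHED");
        p.drain(1);                  // queue keeps up: event is published
        p.enqueue("app_0003", "container_02 FINISHED");
        p.removeClient("app_0003");  // application finishes before the drain
        p.drain(Integer.MAX_VALUE);  // delayed event finds no client
        return p;
    }

    public static void main(String[] args) {
        FifoPublishRace p = simulate();
        System.out.println("published: " + p.published);
        System.out.println("dropped:   " + p.dropped);
    }
}
```

In this model the first container's finished event is published because the queue is drained before the client goes away, while the second one is still waiting in the queue when the client is removed and is therefore dropped.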


was (Author: rohithsharma):
Events are dispatched in FIFO but NMTimelinePublisher has internal dispatcher for processing timeline events. This internal dispatcher might also follow FIFO order which could be delayed. 

> Race condition when container events are published for KILLED applications
> --------------------------------------------------------------------------
>
>                 Key: YARN-8130
>                 URL: https://issues.apache.org/jira/browse/YARN-8130
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>            Reporter: Charan Hebri
>            Assignee: Rohith Sharma K S
>            Priority: Major
>         Attachments: YARN-8130.01.patch, YARN-8130.02.patch
>
>
> There seems to be a race condition when an application is KILLED while the corresponding container event information is being published. For completed containers, a YARN_CONTAINER_FINISHED event is generated, but for some containers in a KILLED application this information is missing. Below is a NodeManager log snippet:
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application application_1523259757659_0003 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been removed before the entity could be published for TimelineEntity[type='YARN_CONTAINER', id='container_1523259757659_0003_01_000002']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for container container_1523259757659_0003_01_000001. Current good log dirs are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for container container_1523259757659_0003_01_000002. Current good log dirs are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(192)) - The collector service for application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1572)) - couldn't find application application_1523259757659_0003 while processing FINISH_APPS event. The ResourceManager allocated resources for this application to the NodeManager but no active containers were found to process{code}
> The container ID specified in the log, *container_1523259757659_0003_01_000002*, is the one whose finished event is missing.


