Posted to yarn-dev@hadoop.apache.org by "Charan Hebri (JIRA)" <ji...@apache.org> on 2018/04/09 10:11:00 UTC

[jira] [Created] (YARN-8130) Race condition when container events are published for KILLED applications

Charan Hebri created YARN-8130:
----------------------------------

             Summary: Race condition when container events are published for KILLED applications
                 Key: YARN-8130
                 URL: https://issues.apache.org/jira/browse/YARN-8130
             Project: Hadoop YARN
          Issue Type: Bug
          Components: ATSv2
            Reporter: Charan Hebri


There appears to be a race condition between an application being KILLED and the publishing of its container events. A YARN_CONTAINER_FINISHED event is generated for each completed container, but for some containers in a KILLED application this event is missing. Below is a NodeManager log snippet:
{code:java}
2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application application_1523259757659_0003 removed, cleanupLocalDirs = false

2018-04-09 08:44:54,478 INFO  application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application application_1523259757659_0003 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED

2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been removed before the entity could be published for TimelineEntity[type='YARN_CONTAINER', id='container_1523259757659_0003_01_000002']

2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just finished : application_1523259757659_0003

2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for container container_1523259757659_0003_01_000001. Current good log dirs are /grid/0/hadoop/yarn/log

2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for container container_1523259757659_0003_01_000002. Current good log dirs are /grid/0/hadoop/yarn/log

2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(192)) - The collector service for application_1523259757659_0003 was removed

2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1572)) - couldn't find application application_1523259757659_0003 while processing FINISH_APPS event. The ResourceManager allocated resources for this application to the NodeManager but no active containers were found to process{code}
The container ID in the ERROR line of the log, *container_1523259757659_0003_01_000002*, is the one whose finished event is missing.
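
One possible reading of the log is that the per-application timeline client is dropped when the application transitions to FINISHED, while a container event for the same application is still waiting to be published. The sketch below models only that ordering problem. It is not the NMTimelinePublisher code: the class PublisherRaceSketch, its methods, and the in-memory list standing in for the timeline backend are all hypothetical, and the two orderings are replayed on a single thread so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of the suspected race; not the actual NodeManager code. */
public class PublisherRaceSketch {

    // Assumed layout: one "client" per application id. The list stands in
    // for whatever the real client would write to the timeline backend.
    private final Map<String, List<String>> clients = new ConcurrentHashMap<>();

    /** Registers a client when the application starts on this node. */
    public void appStarted(String appId) {
        clients.put(appId, new ArrayList<>());
    }

    /** Drops the client, as assumed to happen once the app is FINISHED. */
    public void appFinished(String appId) {
        clients.remove(appId);
    }

    /**
     * Tries to publish a finished event for a container.
     * Returns false when the client is already gone, i.e. the event is lost.
     */
    public boolean publishContainerFinished(String appId, String containerId) {
        List<String> client = clients.get(appId);
        if (client == null) {
            return false; // corresponds to the ERROR line in the log
        }
        client.add("YARN_CONTAINER_FINISHED:" + containerId);
        return true;
    }

    public static void main(String[] args) {
        String app = "application_1523259757659_0003";
        PublisherRaceSketch publisher = new PublisherRaceSketch();
        publisher.appStarted(app);

        // Ordering A: the event is published before the client is removed.
        boolean first = publisher.publishContainerFinished(
                app, "container_1523259757659_0003_01_000001");

        // Ordering B: the client is removed first, so the event is dropped.
        publisher.appFinished(app);
        boolean second = publisher.publishContainerFinished(
                app, "container_1523259757659_0003_01_000002");

        System.out.println("published before removal: " + first);
        System.out.println("published after removal:  " + second);
    }
}
```

If this reading is right, the event is lost only when the removal wins the race, which would match the report that only some containers of a KILLED application are affected.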



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org