Posted to yarn-issues@hadoop.apache.org by "Jian He (JIRA)" <ji...@apache.org> on 2015/01/02 22:08:35 UTC

[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished

    [ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263236#comment-14263236 ] 

Jian He commented on YARN-2997:
-------------------------------

[~chengbing.liu], thanks for the explanation! The patch looks good overall; a few comments:
- For simplicity, the following for loop can be replaced with a single addAll call.
{code}
for (ContainerStatus containerStatus : pendingCompletedContainers.values()) {
  containerStatuses.add(containerStatus);
}
{code}
- pendingCompletedContainers: maybe use a set instead of a map?
- pendingCompletedContainers.remove(containerId); this line may not be needed, given that pendingCompletedContainers.clear() is invoked earlier.
- I found that pendingContainersToRemove potentially leaks entries. We should probably add the following to the while loop of removeOrTrackCompletedContainersFromContext; would you mind fixing this too?
{code}
      if (nmContainer == null) {
        iter.remove();
      }
{code}
- Could you add code comments to the modified test cases, so that people can reason about them more easily? Thanks.
{code}
if (heartBeatID == 2) {
  Assert.assertEquals(statuses.size(), 4);
} else {
  Assert.assertEquals(statuses.size(), 2);
}
{code}
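
To make the first and fourth comments above concrete, here is a minimal standalone sketch, not the actual NodeManager code. It assumes String stand-ins for ContainerId and ContainerStatus, and the class name CompletedContainerSketch, the method names collectStatuses and pruneUnknown, and the liveContainers map are all hypothetical, introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CompletedContainerSketch {

  // Hypothetical stand-ins: String keys and values replace ContainerId
  // and ContainerStatus so that the sketch compiles on its own.
  static List<String> collectStatuses(
      Map<String, String> pendingCompletedContainers) {
    List<String> containerStatuses = new ArrayList<>();
    // A single addAll call replaces the element-by-element for loop.
    containerStatuses.addAll(pendingCompletedContainers.values());
    return containerStatuses;
  }

  // Drops tracked IDs whose container is no longer known. Without the
  // null check, such IDs would stay in the set indefinitely.
  static void pruneUnknown(Set<String> pendingContainersToRemove,
      Map<String, String> liveContainers) {
    Iterator<String> iter = pendingContainersToRemove.iterator();
    while (iter.hasNext()) {
      String containerId = iter.next();
      String nmContainer = liveContainers.get(containerId);
      if (nmContainer == null) {
        // Iterator.remove() is the safe way to delete while iterating.
        iter.remove();
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> pending = new LinkedHashMap<>();
    pending.put("container_1", "COMPLETE");
    pending.put("container_2", "COMPLETE");
    System.out.println(collectStatuses(pending));

    Set<String> toRemove =
        new HashSet<>(Arrays.asList("container_1", "container_9"));
    pruneUnknown(toRemove, pending);
    System.out.println(toRemove);
  }
}
```

In this sketch, container_9 has no entry in liveContainers, so it is removed from the tracking set, while container_1 stays. The real while loop presumably does more than this; only the null check is taken from the comment above.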

> NM keeps sending finished containers to RM until app is finished
> ----------------------------------------------------------------
>
>                 Key: YARN-2997
>                 URL: https://issues.apache.org/jira/browse/YARN-2997
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Chengbing Liu
>         Attachments: YARN-2997.2.patch, YARN-2997.patch
>
>
> We have seen a lot of the following in the RM log:
> {quote}
> INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> {quote}
> This is caused by the NM sending completed containers repeatedly until the app is finished. On the RM side, the container has already been released, hence {{getRMContainer}} returns null.


