Posted to yarn-issues@hadoop.apache.org by "Rohith Sharma K S (JIRA)" <ji...@apache.org> on 2016/10/29 11:29:58 UTC

[jira] [Updated] (YARN-4862) Handle duplicate completed containers in RMNodeImpl

     [ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-4862:
------------------------------------
    Attachment: YARN-4862-004.patch

Updating the patch to handle a completed container leak. The scenario is that whenever the RM does not track a container, its containerId still gets added to the completedContainer list in RMNodeImpl. Since this container is not tracked by the RM, the RM just ignores it, so the entry is never removed. This causes a leak in completedContainer.

I have updated the patch to fix the leak by triggering an event to RMNodeImpl. This is basically the same issue as YARN-5279, but I would prefer to fix it in this JIRA itself rather than committing it separately.

As part of the latest patch attached, I have also combined the patch from YARN-5279. Regarding the review comments on YARN-5279, I have not created a different event class and name as suggested there. I have reused the same event type FINISHED_CONTAINERS_PULLED_BY_AM and its class RMNodeFinishedContainersPulledByAMEvent, because both events are the same to RMNodeImpl. Maybe I could rename the existing event type FINISHED_CONTAINERS_PULLED_BY_AM to CONTAINERS_TO_BE_REMOVED_FROM_NM. Thoughts?
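
To illustrate the leak and the fix described above, here is a minimal sketch. It is not the code from the patch: NodeSketch and SchedulerSketch are hypothetical stand-ins for RMNodeImpl and the RM-side tracking, container IDs are plain strings, and removeCompleted() only models what handling a FINISHED_CONTAINERS_PULLED_BY_AM event would do on the node side.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the node-side bookkeeping; not the real RMNodeImpl. */
final class NodeSketch {
    private final Set<String> completedContainers = new HashSet<>();

    /** The node records every completed container it hears about. */
    void containerCompleted(String containerId) {
        completedContainers.add(containerId);
    }

    /** Models the handling of a FINISHED_CONTAINERS_PULLED_BY_AM style event. */
    void removeCompleted(List<String> containerIds) {
        completedContainers.removeAll(containerIds);
    }

    int pendingCompleted() {
        return completedContainers.size();
    }
}

/** Hypothetical stand-in for the RM side that knows which containers are tracked. */
final class SchedulerSketch {
    private final Set<String> trackedContainers = new HashSet<>();
    private final NodeSketch node;

    SchedulerSketch(NodeSketch node) {
        this.node = node;
    }

    void track(String containerId) {
        trackedContainers.add(containerId);
    }

    void onContainerCompleted(String containerId) {
        node.containerCompleted(containerId);
        if (!trackedContainers.remove(containerId)) {
            // Untracked container: no AM will ever pull it, so without this
            // call the ID would stay in the node's completed list forever.
            node.removeCompleted(Collections.singletonList(containerId));
        }
        // Tracked container: the ID stays on the node until the AM pulls it
        // and the usual removal event is sent.
    }
}
```

In this sketch, completing a tracked container leaves one pending entry on the node until removeCompleted() is called for it, while completing an untracked container leaves none.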

> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch, 0003-YARN-4862.patch, YARN-4862-004.patch
>
>
> As per [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689] from [~sharadag], there should be a safeguard against duplicated container statuses in RMNodeImpl before creating UpdatedContainerInfo.
> Otherwise, in a heavily loaded cluster where event processing gradually slows down, if any duplicated containers are sent to the RM (possibly due to a bug in the NM), there is a significant impact: RMNodeImpl always creates UpdatedContainerInfo for the duplicated containers. This results in increased heap memory usage and causes problems like YARN-4852.
> This is an optimization for issues of the kind seen in YARN-4852.
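
The safeguard in the issue description can be sketched roughly as follows. This is an illustration under assumptions, not the RMNodeImpl code: DuplicateFilterSketch and newlyCompleted() are hypothetical names, container IDs are plain strings, and the returned list stands for the statuses that would go into an UpdatedContainerInfo.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical duplicate filter; not the real RMNodeImpl logic. */
final class DuplicateFilterSketch {
    // IDs of completed containers that have already been reported once.
    private final Set<String> completedContainers = new HashSet<>();

    /**
     * Returns only the IDs seen for the first time, in report order.
     * Set.add returns false for an ID that is already present, so
     * duplicates are dropped before any per-container object is created.
     */
    List<String> newlyCompleted(List<String> reportedCompletedIds) {
        List<String> fresh = new ArrayList<>();
        for (String id : reportedCompletedIds) {
            if (completedContainers.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```

With this shape, a container reported again in a later heartbeat, or twice within the same heartbeat, contributes nothing further to the heap. The set itself still needs entries removed once they are no longer required, which is the leak discussed in the comment above.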



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
