Posted to yarn-issues@hadoop.apache.org by "Wilfred Spiegelenburg (JIRA)" <ji...@apache.org> on 2016/03/09 01:39:40 UTC

[jira] [Assigned] (YARN-4698) Negative value in RM UI counters due to double container release

     [ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg reassigned YARN-4698:
-------------------------------------------

    Assignee: Wilfred Spiegelenburg

> Negative value in RM UI counters due to double container release
> ----------------------------------------------------------------
>
>                 Key: YARN-4698
>                 URL: https://issues.apache.org/jira/browse/YARN-4698
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 2.5.1
>            Reporter: Dmytro Kabakchei
>            Assignee: Wilfred Spiegelenburg
>            Priority: Minor
>         Attachments: Example.log-cut, mitigating2.5.1.diff
>
>
> We noticed negative values in the RM UI counters on our cluster:
> - Containers Running: -19
> - Memory Used: -38GB
> - Vcores Used: -19
> After checking the RM logs, we found that the following events had occurred:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some related log records can be found in the "Example.log-cut" attachment.
> After some investigation, we concluded that there is a race condition affecting a container that was scheduled to be killed but completed successfully before the kill took effect, so the same container ends up being released twice.
> There is also a patch that may mitigate the effects of the issue but does not fix the underlying problem (see mitigating2.5.1.diff).
> Unfortunately, the cluster and all other logs have been lost, because the report was written about a year ago but was never submitted properly. We also do not know whether the issue exists in other versions.
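
The numbers in the report are consistent with a double release: 67019 assignments and 67019 normal releases cancel out, and each of the 19 invalid releases subtracts one more container, leaving -19 containers, -19 vcores and -38 GB (on average 2 GB and 1 vcore per double-released container). The Java sketch below models that accounting under stated assumptions: `ContainerCounters` is a hypothetical class, not YARN's real metrics API; every container is assumed to be 2048 MB / 1 vcore; and the "guarded" mode shows one generic mitigation (ignore a release for a container that is no longer tracked as live), not necessarily what mitigating2.5.1.diff does.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, simplified model of the aggregate counters shown in the RM UI.
 * This is NOT the real YARN QueueMetrics / FairScheduler API.
 */
public class ContainerCounters {
    private long runningContainers;
    private long memoryUsedMb;
    private long vcoresUsed;
    // Ids of containers currently counted as running.
    private final Set<Long> live = new HashSet<>();
    // When true, a release for a container that is not live is ignored.
    private final boolean guarded;

    public ContainerCounters(boolean guarded) {
        this.guarded = guarded;
    }

    /** Container assigned: count it and add its resources. */
    public void allocate(long containerId, long memoryMb, long vcores) {
        if (live.add(containerId)) {
            runningContainers++;
            memoryUsedMb += memoryMb;
            vcoresUsed += vcores;
        }
    }

    /**
     * Container released (normal completion or kill). Returns false if the
     * container was no longer live, i.e. an "invalid" duplicate release.
     */
    public boolean release(long containerId, long memoryMb, long vcores) {
        boolean wasLive = live.remove(containerId);
        if (!wasLive && guarded) {
            return false; // mitigation: a duplicate release changes nothing
        }
        // Naive path: subtract unconditionally, even for a duplicate release.
        runningContainers--;
        memoryUsedMb -= memoryMb;
        vcoresUsed -= vcores;
        return wasLive;
    }

    public long getRunningContainers() { return runningContainers; }
    public long getMemoryUsedMb() { return memoryUsedMb; }
    public long getVcoresUsed() { return vcoresUsed; }

    public static void main(String[] args) {
        final int assigned = 67019; // "Assigned container" count from the report
        final int lateKills = 19;   // "Invalid container released" count
        for (boolean guarded : new boolean[] {false, true}) {
            ContainerCounters c = new ContainerCounters(guarded);
            // Assumed container size: 2048 MB and 1 vcore.
            for (long id = 0; id < assigned; id++) {
                c.allocate(id, 2048, 1);
            }
            // Every container completes normally and is released once.
            for (long id = 0; id < assigned; id++) {
                c.release(id, 2048, 1);
            }
            // A kill scheduled before completion is processed afterwards,
            // releasing the same container a second time.
            int invalid = 0;
            for (long id = 0; id < lateKills; id++) {
                if (!c.release(id, 2048, 1)) {
                    invalid++;
                }
            }
            System.out.printf("guarded=%b running=%d memoryMb=%d vcores=%d invalid=%d%n",
                guarded, c.getRunningContainers(), c.getMemoryUsedMb(),
                c.getVcoresUsed(), invalid);
        }
    }
}
```

Running `main` prints `guarded=false running=-19 memoryMb=-38912 vcores=-19 invalid=19` followed by `guarded=true running=0 memoryMb=0 vcores=0 invalid=19`: the naive counters reproduce the reported -19 containers, -38 GB and -19 vcores, while the guarded version still detects the 19 invalid releases but keeps the counters at zero. Making the release idempotent only hides the symptom; the race between completion and kill would still need to be fixed separately.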



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)