Posted to yarn-issues@hadoop.apache.org by "Kuhu Shukla (JIRA)" <ji...@apache.org> on 2015/11/03 18:22:27 UTC

[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

     [ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated YARN-4311:
------------------------------
    Attachment: YARN-4311-v1.patch

This is a preliminary proposal patch. With this change, every node is checked by the isInvalidAndAbsent() method to decide whether it should be removed from all lists. The outcome, however, depends on the initial state of the node.

If the node is running (that is, it is part of the include list) and is then taken out of that list, followed by -refreshNodes (and it is, of course, not in the exclude list), the node is shut down and the shutdown node count is incremented. The node is not considered a decommissioned node.

If the node is in both the exclude and include lists and -refreshNodes has been run (meaning it is a decommissioned node), then removing it from both lists takes it out completely: it no longer shows up in the shutdown, decommissioned, unhealthy, or active lists.

There is one case where the shutdown counters are misleading and which this patch has not addressed. If the node was running and was taken out of the include list, and it comes back up after being added back to the include list, the shutdown counter still sticks to the same value. This needs to be changed, since the current state transitions don't account for it.
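To make the two outcomes above easier to discuss, here is a minimal Java sketch of the decision. It is not the code in YARN-4311-v1.patch: the class NodeListSketch, the enums State and Outcome, the method onRefresh(), and the signature given to isInvalidAndAbsent() are all hypothetical. It also assumes that an empty include list means include lists are not in use, so no node is dropped in that case.

```java
import java.util.Set;

// Illustrative model only; none of these types exist in the YARN code base.
final class NodeListSketch {
    enum State { RUNNING, DECOMMISSIONED }
    enum Outcome { KEEP, SHUTDOWN, FORGET }

    // Hypothetical stand-in for the check named in the patch description:
    // true when the host appears in neither list while an include list is in use.
    static boolean isInvalidAndAbsent(String host, Set<String> include, Set<String> exclude) {
        // Assumption: an empty include list means include lists are not used,
        // so nodes must not fall off just because they are not listed.
        boolean includeInUse = !include.isEmpty();
        return includeInUse && !include.contains(host) && !exclude.contains(host);
    }

    // Outcome of -refreshNodes for one host, following the cases described above.
    static Outcome onRefresh(String host, State state, Set<String> include, Set<String> exclude) {
        if (!isInvalidAndAbsent(host, include, exclude)) {
            return Outcome.KEEP;
        }
        // A running node is shut down (and counted as shutdown);
        // a decommissioned node is removed from every list.
        return state == State.RUNNING ? Outcome.SHUTDOWN : Outcome.FORGET;
    }
}
```

With an include list containing only some other host and an empty exclude list, a RUNNING node maps to SHUTDOWN and a DECOMMISSIONED node maps to FORGET. With both lists empty, either node is kept.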

I need some input from the community on the semantics of such a fix. It may be a good idea to have such nodes counted (but not listed) as shutdown nodes in the first case, since otherwise any mistake in configuring the include list will lose all information about the nodes (counters and node lists), which may be undesirable.

I would appreciate any comments, suggestions, or caveats. I have not fixed the associated test failures in this patch.

> Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4311
>                 URL: https://issues.apache.org/jira/browse/YARN-4311
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-4311-v1.patch
>
>
> In order to fully forget about a node, removing the node from the include and exclude lists is not sufficient: the RM still lists it under Decomm-ed nodes. The tricky part that [~jlowe] pointed out is the case when include lists are not used; in that case we don't want the nodes to fall off if they are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)