Posted to yarn-issues@hadoop.apache.org by "Juan Rodríguez Hortalá (JIRA)" <ji...@apache.org> on 2017/05/04 17:09:04 UTC

[jira] [Updated] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

     [ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juan Rodríguez Hortalá updated YARN-6483:
-----------------------------------------
    Attachment: YARN-6483-v1.patch

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6483
>                 URL: https://issues.apache.org/jira/browse/YARN-6483
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Juan Rodríguez Hortalá
>         Attachments: YARN-6483-v1.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful decommissioning mechanism to give tasks time to complete on a node that is scheduled for decommission, and to give reducer tasks time to read the shuffle blocks on that node. YARN also effectively blacklists nodes in the DECOMMISSIONING state by assigning them a capacity of 0, to prevent additional containers from being launched on those nodes, so no more shuffle blocks are written to them. This blacklisting is not effective for applications like Spark, because a Spark executor running in a YARN container will keep receiving more tasks after the corresponding node has been blacklisted at the YARN level. We would like to propose a modification of the YARN heartbeat mechanism so that nodes transitioning to DECOMMISSIONING are added to the list of updated nodes returned by the Resource Manager in its response to the Application Master heartbeat. This way a Spark application master would be able to blacklist a DECOMMISSIONING node at the Spark level.
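> As an illustration only (a minimal sketch, not part of the attached patch): assuming the RM starts reporting DECOMMISSIONING nodes in the updated-nodes list as proposed, an application master could consume them roughly as below. The handler class and the application-level reaction are illustrative; updateBlacklist is the existing AMRMClient call.
>
>   import java.util.ArrayList;
>   import java.util.List;
>   import org.apache.hadoop.yarn.api.records.NodeReport;
>   import org.apache.hadoop.yarn.api.records.NodeState;
>   import org.apache.hadoop.yarn.client.api.AMRMClient;
>
>   public class DecommissioningAwareHandler {
>     private final AMRMClient<AMRMClient.ContainerRequest> amrmClient;
>
>     public DecommissioningAwareHandler(AMRMClient<AMRMClient.ContainerRequest> amrmClient) {
>       this.amrmClient = amrmClient;
>     }
>
>     // Invoked with AllocateResponse#getUpdatedNodes() from each AM heartbeat
>     // (or from AMRMClientAsync.CallbackHandler#onNodesUpdated).
>     public void onNodesUpdated(List<NodeReport> updatedNodes) {
>       List<String> toBlacklist = new ArrayList<>();
>       for (NodeReport report : updatedNodes) {
>         if (report.getNodeState() == NodeState.DECOMMISSIONING) {
>           // Stop requesting containers on this node at the YARN level...
>           toBlacklist.add(report.getNodeId().getHost());
>           // ...and, for a Spark-like AM, also stop scheduling new tasks on
>           // executors already running there (application-specific logic).
>         }
>       }
>       if (!toBlacklist.isEmpty()) {
>         amrmClient.updateBlacklist(toBlacklist, new ArrayList<String>());
>       }
>     }
>   }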



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org