Posted to yarn-issues@hadoop.apache.org by "Daniel Zhi (JIRA)" <ji...@apache.org> on 2016/04/14 01:24:25 UTC

[jira] [Updated] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

     [ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Zhi updated YARN-4676:
-----------------------------
    Attachment: YARN-4676.010.patch

Patch 010 addresses review comments 2 through 7. The main change is for comment 5, merging refreshNodes(long timeout) with YARN-4676:
  1. Pass the timeout to NodesListManager through RefreshNodesRequest;
  2. NodesListManager uses that timeout for any node with no per-node timeout specified;
  3. The final FORCEFUL decommission is still there, but with an extra 20-second delay after the timeout. I have verified that the RM tracks and handles the timeout as expected, so normally RMAdminCLI won't need to issue the forceful decommission. I also verified that even if it does, DecommissioningNodesWatcher handles it fine. So overall, we can simply preserve the FORCEFUL decommission near the end of refreshNodes(long timeout).
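
As a rough illustration of points 2 and 3, the timeout fallback and the delayed client-side forceful step could be sketched as below. This is not code from the patch: the class name, method names, and the constant are hypothetical, and only the 20-second figure comes from the comment above.

```java
public class DecommissionTimeoutSketch {
    // Extra delay, in seconds, past the timeout before the client forces
    // decommission. The 20-second value is from the comment above; the
    // constant name is hypothetical.
    static final int FORCEFUL_GRACE_SECONDS = 20;

    // A per-node timeout wins when present (non-null); otherwise fall back
    // to the timeout carried by the refresh request.
    static int effectiveTimeout(Integer perNodeTimeout, int requestTimeout) {
        return perNodeTimeout != null ? perNodeTimeout : requestTimeout;
    }

    // Seconds after the request at which the client-side forceful
    // decommission would be issued as a last resort.
    static int forcefulDeadline(int requestTimeout) {
        return requestTimeout + FORCEFUL_GRACE_SECONDS;
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimeout(null, 300)); // no per-node value
        System.out.println(effectiveTimeout(60, 300));   // per-node value set
        System.out.println(forcefulDeadline(300));
    }
}
```

The point of the extra delay is that the RM normally finishes handling the timeout first, so the forceful step should rarely do anything.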

For comment 8, I need help on how to add the "docs".
For comment 1, I will consider submitting a pseudo patch containing only a one-line comment to see whether QA still complains about unit test errors.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch
>
>
> DecommissioningNodesWatcher inside ResourceTrackerService tracks the status of DECOMMISSIONING nodes automatically and asynchronously after a client/admin makes a graceful decommission request. It tracks each DECOMMISSIONING node's status to decide when, after all running containers on the node have completed, the node will be transitioned into the DECOMMISSIONED state. NodesListManager detects and handles include and exclude list changes to kick off decommission or recommission as necessary.
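
The per-node decision described in the issue summary can be pictured with a minimal sketch like the one below. It is only an assumption-laden illustration, not the watcher's actual code: the class, enum, and method names are hypothetical, and treating a negative timeout as "no timeout" is an assumption made here.

```java
public class DecommissionStatusSketch {
    // Hypothetical states for a node that is currently DECOMMISSIONING.
    enum Status { WAIT_CONTAINER, READY, TIMEOUT }

    // Decide what to do with a DECOMMISSIONING node on a periodic check.
    // A negative timeoutSeconds is treated as "no timeout" (an assumption).
    static Status check(int runningContainers, long elapsedSeconds, long timeoutSeconds) {
        if (runningContainers == 0) {
            return Status.READY;          // safe to move to DECOMMISSIONED
        }
        if (timeoutSeconds >= 0 && elapsedSeconds >= timeoutSeconds) {
            return Status.TIMEOUT;        // stop waiting for containers
        }
        return Status.WAIT_CONTAINER;     // keep tracking the node
    }

    public static void main(String[] args) {
        System.out.println(check(0, 10, 60));
        System.out.println(check(3, 10, 60));
        System.out.println(check(3, 90, 60));
    }
}
```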



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)