Posted to yarn-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2017/03/08 22:41:38 UTC

[jira] [Comment Edited] (YARN-5311) Document graceful decommission CLI and usage

    [ https://issues.apache.org/jira/browse/YARN-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902101#comment-15902101 ] 

Junping Du edited comment on YARN-5311 at 3/8/17 10:41 PM:
-----------------------------------------------------------

Sorry for coming late on this; reviewing a document is never easy work.
Thanks [~elek] for the patch, some comments so far:
1. In the overview, we should explain some high-level use cases - like elasticity for YARN nodes on public cloud infrastructure, etc. Also, we should mention timeout tracking on the client side and the server side, and their differences from the perspective of IT operations.

2. As far as I remember, we initially don't support specifying a timeout value in the exclude file for client-side timeout tracking. It seems YARN-4676 only supports that for server-side tracking. We should mention that explicitly.

3. Also, for the exclude file, we should mention that we currently only support plain text (no timeout value) and XML. However, there is a plan to support a JSON format in the future - please refer to YARN-5536 for more details.

4. We should mention the behavior when the RM gets restarted or failed over: the decommissioning node will be decommissioned immediately after the RM comes back, since no timeout value is preserved so far. We should enhance this later, once YARN-5464 is fixed. For now we can just document the current behavior in a NOTE and update it once we have a better solution.
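
For points 2 and 3, the doc could show concrete examples of both tracking modes and the exclude-file formats. For client-side vs. server-side tracking it could show the `yarn rmadmin -refreshNodes -g [timeout] -client|server` form, if that matches the final CLI. Below is a minimal sketch of the XML exclude-file format with per-host timeouts; the `<hosts>/<host>/<name>/<timeout>` structure here is my recollection of the YARN-4676 format, so please verify it against the code before putting it in the doc:

```python
import xml.etree.ElementTree as ET

# Assumed XML exclude-file shape: a host with no <timeout> falls back to
# the timeout given on the command line (or waits indefinitely).
EXCLUDE_XML = """<?xml version="1.0"?>
<hosts>
  <host><name>host1</name></host>
  <host><name>host2</name><timeout>123</timeout></host>
</hosts>
"""

def parse_exclude(xml_text):
    """Return {hostname: timeout_seconds_or_None} from an XML exclude file."""
    result = {}
    for host in ET.fromstring(xml_text).findall("host"):
        timeout = host.findtext("timeout")
        result[host.findtext("name")] = int(timeout) if timeout is not None else None
    return result

print(parse_exclude(EXCLUDE_XML))  # {'host1': None, 'host2': 123}
```

Showing the format this way in the doc would also make the plain-text-vs-XML limitation from point 3 obvious at a glance.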

Some NITs:

bq. (Note: It isn't needed to restart resourcemanager in case of changing the exclude-path as it's reread at every `refresNodes` command)
We should make it more readable, something like: "It is unnecessary to restart the RM when changing the exclude-path, as this config is read again on every 'refreshNodes' command"

bq. +* WAIT_CONTAINER --- wait for running containers to complete.
Capitalize the "w" in "wait", as in the other items.

bq. +* WAIT_APP --- wait for running application to complete (after all containers complete)
Same comment as above.



> Document graceful decommission CLI and usage
> --------------------------------------------
>
>                 Key: YARN-5311
>                 URL: https://issues.apache.org/jira/browse/YARN-5311
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: documentation
>    Affects Versions: 2.9.0
>            Reporter: Junping Du
>            Assignee: Elek, Marton
>         Attachments: YARN-5311.001.patch, YARN-5311.002.patch, YARN-5311.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org