Posted to yarn-issues@hadoop.apache.org by "Botong Huang (JIRA)" <ji...@apache.org> on 2017/05/10 18:33:04 UTC

[jira] [Comment Edited] (YARN-6484) [Documentation] Documenting the YARN Federation feature

    [ https://issues.apache.org/jira/browse/YARN-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005146#comment-16005146 ] 

Botong Huang edited comment on YARN-6484 at 5/10/17 6:32 PM:
-------------------------------------------------------------

Thanks [~curino] for the patch. The description looks great to me. 

Here are some minor things I found; please kindly fix them:
* number of active applications -> number of active applications, *number of active containers*?
* (10-100k) nodes -> (10-100k nodes)
* Add one more line break before "This design is structurally"
* Federation is being design as -> (add one more line break before it) Federation is *designed* as
* on any nodes cluster -> on any nodes in the large cluster
* sub-clusters (this -> sub-cluster (this 
* Furthermore separate ...: this sentence is repeated in the *Note* that follows; consider deleting it?
* sub-cluster a job -> sub-clusters a job
* At any one time, a job ...: consider moving this paragraph into the AMRMProxy section?
* all cluster operations, : remove the extra line break after it
* policies run -> policies that run
* also modified -> also modified by the NM when launching the AM
* both clusters -> relevant sub-clusters
* allocations is done -> allocations are done
* Whether failover across subclsuters -> Whether should retry considering RM failover within each subcluster
* yarn.resourcemanager.cluster-id: this is also required in all NMs
* If this optional address is: fix the formatting (extra spaces) after this
* Recent analysis of failure modes suggest that we should also maintain an explicit mapping between the notion of an “external App id” and the “internal App id”. This would allow us to hide some class of local failures (e.g., one RM is not reachable and we need to resubmit with a new app id) ----------- We didn't implement this part; in this case, we will likely use a new attempt number with the same app id.


was (Author: botong):
Thanks [~curino] for the patch. The description looks great to me. 

Here are some minor things I found; please kindly fix them:
* number of active applications -> number of active applications, *number of active containers*?
* (10-100k) nodes -> (10-100k nodes)
* Add one more line break before "This design is structurally"
* Federation is being design as -> (add one more line break before it) Federation is *designed* as
* on any nodes cluster -> on any nodes in the large cluster
* sub-clusters (this -> sub-cluster (this 
* Furthermore separate ...: this sentence is repeated in the *Note* that follows; consider deleting it?
* sub-cluster a job -> sub-clusters a job
* At any one time, a job ...: consider moving this paragraph into the AMRMProxy section?
* all cluster operations, : remove the extra line break after it
* policies run -> policies that run
* also modified -> also modified by the NM when launching the AM
* both clusters -> relevant sub-clusters
* allocations is done -> allocations are done
* Whether failover across subclsuters -> Whether failover within each subcluster
* yarn.resourcemanager.cluster-id: this is also required in all NMs
* If this optional address is: fix the formatting (extra spaces) after this
* Recent analysis of failure modes suggest that we should also maintain an explicit mapping between the notion of an “external App id” and the “internal App id”. This would allow us to hide some class of local failures (e.g., one RM is not reachable and we need to resubmit with a new app id) ----------- We didn't implement this part; in this case, we will likely use a new attempt number with the same app id.

> [Documentation] Documenting the YARN Federation feature
> -------------------------------------------------------
>
>                 Key: YARN-6484
>                 URL: https://issues.apache.org/jira/browse/YARN-6484
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>    Affects Versions: YARN-2915
>            Reporter: Subru Krishnan
>            Assignee: Carlo Curino
>         Attachments: YARN-6484-YARN-2915.v0.patch, YARN-6484-YARN-2915.v1.patch
>
>
> We should document the high level design and configuration to enable YARN Federation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
