You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "genericqa (JIRA)" <ji...@apache.org> on 2018/06/30 00:54:00 UTC

[jira] [Commented] (YARN-7899) [AMRMProxy] Stateful FederationInterceptor for pending requests

    [ https://issues.apache.org/jira/browse/YARN-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528450#comment-16528450 ] 

genericqa commented on YARN-7899:
---------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 12s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 13s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 57s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}110m 45s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7899 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929784/YARN-7899.v3.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 01ef86f5091c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cdb0844 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21154/testReport/ |
| Max. process+thread count | 291 (vs. ulimit of 10000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21154/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [AMRMProxy] Stateful FederationInterceptor for pending requests
> ---------------------------------------------------------------
>
>                 Key: YARN-7899
>                 URL: https://issues.apache.org/jira/browse/YARN-7899
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Major
>         Attachments: YARN-7899.v1.patch, YARN-7899.v3.patch, YARN-7899.v3.patch
>
>
> Today FederationInterceptor (in AMRMProxy for YARN Federation) is stateless in terms of pending (outstanding) requests. Whenever AM issues new requests, FI simply splits and sends them to sub-cluster YarnRMs and forget about them. This JIRA attempts to make FI stateful so that it remembers the pending requests in all relevant sub-clusters. This has two major benefits: 
> 1. It is a prerequisite for FI to be able to cancel pending request in one sub-cluster and re-send it to other sub-clusters. This is needed for load balancing and to fully comply with the relax locality fallback to ANY semantic. When we send a request to one sub-cluster, we have effectively restrained the allocation for this request to be within this sub-cluster rather than everywhere. If the cluster capacity in this sub-cluster for this app is full or this YarnRM is overloaded and slow, the request will be stuck there for a long time even if there is free capacity in other sub-clusters. We need FI to remember and adjust the pending requests on the fly. 
> 2. This makes pending request recovery easier when YarnRM fails over. Today whenever one sub-cluster RM fails over, in order to recover lost pending requests for this sub-cluster, 
> we have to propagate the ApplicationMasterNotRegisteredException from the YarnRM back to AM, triggering a full pending resend from AM. This contains pending for not only the failing-over sub-cluster, but everyone. Since our split-merge (AMRMProxyPolicy) does not guarantee idempotency, the same request we sent to sub-cluster-1 earlier might be resent to sub-cluster-2. If both these YarnRMs have not failed over, they will both allocate for this request, leading to over-allocation. Also, these full pending resends also puts unnecessary load on every YarnRM in the cluster everytime one YarnRM fails over. With stateful FederationInterceptor, since we remember pending requests we have sent out earlier, we can shield the ApplicationMasterNotRegisteredException for AM and resend the pending only to the failed over YarnRM. This eliminates over-allocation and minimizes the recovery overhead. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org