You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "slfan1989 (via GitHub)" <gi...@apache.org> on 2023/05/18 15:15:41 UTC

[GitHub] [hadoop] slfan1989 opened a new pull request, #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

slfan1989 opened a new pull request, #5672:
URL: https://github.com/apache/hadoop/pull/5672

   <!--
     Thanks for sending a pull request!
       1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
       2. Make sure your PR title starts with JIRA issue id, e.g., 'HADOOP-17799. Your PR title ...'.
   -->
   
   ### Description of PR
   JIRA: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.
   
   In Federation, multiple attempts of an application share the same UAM in each secondary sub-cluster. When first attempt fails, we reply on the fact that secondary RM won't kill the existing UAM before the AM heartbeat timeout (default at 10 min). When second attempt comes up in the home sub-cluster, it will pick up the UAM token from Yarn Registry and resume the UAM heartbeat to secondary RMs.
   
   The default heartbeat timeout for NM and AM are both 10 mins. The problem is that when the first attempt node goes down or out of connection, only after 10 mins will the home RM mark the first attempt as failed, and then schedule the 2nd attempt in some other node. By then the UAMs in secondaries are already timing out, and they might not survive until the second attempt comes up.
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] slfan1989 commented on a diff in pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "slfan1989 (via GitHub)" <gi...@apache.org>.
slfan1989 commented on code in PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#discussion_r1199137369


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java:
##########
@@ -703,6 +703,24 @@ protected static void validateConfigs(Configuration conf) {
           + ", " + YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS + "="
           + heartbeatIntvl);
     }
+
+    if (HAUtil.isFederationEnabled(conf)) {
+      /*
+       * In Yarn Federation, we need UAMs in secondary sub-clusters to stay
+       * alive when the next attempt AM in home sub-cluster gets launched. If
+       * the previous AM died because the node is lost after NM timeout. It will
+       * already be too late if AM timeout is even shorter.
+       */
+      int amExpireIntvl = conf.getInt(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS,

Review Comment:
   @goiri  We can first check if the user's parameter consists entirely of numbers. If it does, we will use getInt for parsing. If it does not consist entirely of digits, we will use getTimeDuration for parsing. Can we rewrite the code like this?
   
   ```
   String rmAmExpiryIntervalMS = conf.get(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS);
   long amExpireIntvl;
   if (NumberUtils.isDigits(rmAmExpiryIntervalMS)) {
     amExpireIntvl = conf.getLong(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS,
         YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS);
   } else {
     amExpireIntvl = conf.getTimeDuration(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS,
         YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS, TimeUnit.MILLISECONDS);
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#issuecomment-1563443554

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  24m 50s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m 10s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m 32s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   6m 51s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   1m 48s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 26s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 23s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   3m  4s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   7m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m  9s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 24s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 55s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   6m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 39s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   6m 39s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 39s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 11s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m  1s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 51s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   7m 41s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 26s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  4s |  |  hadoop-yarn-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   5m 21s |  |  hadoop-yarn-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 101m 32s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  unit  |  23m 36s |  |  hadoop-yarn-server-nodemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 301m 20s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5672 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux f97a0dc1cddf 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2b07c36bcc555cff069cccfaec6d8bed2176f595 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/5/testReport/ |
   | Max. process+thread count | 858 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/5/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] goiri merged pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "goiri (via GitHub)" <gi...@apache.org>.
goiri merged PR #5672:
URL: https://github.com/apache/hadoop/pull/5672


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#issuecomment-1556951749

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  19m 13s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 46s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 44s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m 40s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   6m 51s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   1m 48s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 26s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 21s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   3m  3s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   7m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m 14s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 25s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m 22s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   9m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 50s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   6m 50s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 19s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m  6s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 55s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   8m  4s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 20s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  3s |  |  hadoop-yarn-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   5m 21s |  |  hadoop-yarn-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 101m 26s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  unit  |  23m 37s |  |  hadoop-yarn-server-nodemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 313m 18s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5672 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux de1355214f79 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e48660b8c4a03dcb4c9f2b68d16efc2bf02a876f |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/4/testReport/ |
   | Max. process+thread count | 939 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] slfan1989 commented on pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "slfan1989 (via GitHub)" <gi...@apache.org>.
slfan1989 commented on PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#issuecomment-1568669298

   @goiri Thank you very much for your help in reviewing the code!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#issuecomment-1567291934

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  4s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  2s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  2s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  2s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 42s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m  4s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m 40s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   6m 56s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   1m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 24s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   3m  2s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   7m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 57s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 24s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   7m  5s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   7m  5s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 46s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   6m 46s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 39s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 11s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m  3s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 49s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   7m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 38s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  2s |  |  hadoop-yarn-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   5m 22s |  |  hadoop-yarn-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 103m  7s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  unit  |  23m 39s |  |  hadoop-yarn-server-nodemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 295m 20s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/6/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5672 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 0c65b57654b7 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e945c0b317ee7fd51fab7966fcfa39cf5a7caf8f |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/6/testReport/ |
   | Max. process+thread count | 918 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/6/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] goiri commented on a diff in pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "goiri (via GitHub)" <gi...@apache.org>.
goiri commented on code in PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#discussion_r1198012980


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java:
##########
@@ -703,6 +703,24 @@ protected static void validateConfigs(Configuration conf) {
           + ", " + YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS + "="
           + heartbeatIntvl);
     }
+
+    if (HAUtil.isFederationEnabled(conf)) {
+      /*
+       * In Yarn Federation, we need UAMs in secondary sub-clusters to stay
+       * alive when the next attempt AM in home sub-cluster gets launched. If
+       * the previous AM died because the node is lost after NM timeout. It will
+       * already be too late if AM timeout is even shorter.
+       */
+      int amExpireIntvl = conf.getInt(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS,

Review Comment:
   Can we switch to getTimeDuration() or would that break backwards compatibility.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#issuecomment-1553566824

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 57s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  28m 36s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 53s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m 35s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   6m 51s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   1m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 37s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 21s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   5m 59s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m 49s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 53s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   7m  4s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   7m  4s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 47s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   6m 47s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 41s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 28s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 22s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 13s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   6m  8s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m  7s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  3s |  |  hadoop-yarn-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   5m 21s |  |  hadoop-yarn-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 101m 34s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 274m 14s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5672 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 3f1b37534478 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 1171876e239e82a97665881fda3800e694d0bcdb |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/1/testReport/ |
   | Max. process+thread count | 939 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#issuecomment-1555201867

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 46s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  24m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m 32s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   6m 44s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   1m 45s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 36s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 22s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   5m 55s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m  8s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 52s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 56s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   6m 56s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 41s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   6m 41s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 39s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 28s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 20s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   6m  5s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 33s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  3s |  |  hadoop-yarn-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   5m 20s |  |  hadoop-yarn-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 101m 28s |  |  hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 48s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 262m 22s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5672 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 2fef0dde8e6f 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 830d7af6ef27524f201f1581cc17e5508146acc7 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/3/testReport/ |
   | Max. process+thread count | 899 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] slfan1989 commented on pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "slfan1989 (via GitHub)" <gi...@apache.org>.
slfan1989 commented on PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#issuecomment-1553286160

   @goiri Can you help review this pr? Thank you very much! The description information in JIRA is clear, and we can extend the timeout of RMAM appropriately. 
   
   We can refer to the following:
   
   ```
   In Federation, multiple attempts of an application share the same UAM in each secondary sub-cluster. When first attempt fails, we reply on the fact that secondary RM won't kill the existing UAM before the AM heartbeat timeout (default at 10 min). When second attempt comes up in the home sub-cluster, it will pick up the UAM token from Yarn Registry and resume the UAM heartbeat to secondary RMs.
   
   The default heartbeat timeout for NM and AM are both 10 mins. The problem is that when the first attempt node goes down or out of connection, only after 10 mins will the home RM mark the first attempt as failed, and then schedule the 2nd attempt in some other node. By then the UAMs in secondaries are already timing out, and they might not survive until the second attempt comes up.
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] slfan1989 commented on a diff in pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "slfan1989 (via GitHub)" <gi...@apache.org>.
slfan1989 commented on code in PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#discussion_r1198545649


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java:
##########
@@ -703,6 +703,24 @@ protected static void validateConfigs(Configuration conf) {
           + ", " + YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS + "="
           + heartbeatIntvl);
     }
+
+    if (HAUtil.isFederationEnabled(conf)) {
+      /*
+       * In Yarn Federation, we need UAMs in secondary sub-clusters to stay
+       * alive when the next attempt AM in home sub-cluster gets launched. If
+       * the previous AM died because the node is lost after NM timeout. It will
+       * already be too late if AM timeout is even shorter.
+       */
+      int amExpireIntvl = conf.getInt(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS,

Review Comment:
   Thank you very much for helping to review the code, I will modify the code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] hadoop-yetus commented on pull request #5672: YARN-7720. Race condition between second app attempt and UAM timeout when first attempt node is down.

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5672:
URL: https://github.com/apache/hadoop/pull/5672#issuecomment-1555055872

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  9s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 56s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 45s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m 40s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   6m 53s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   1m 49s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 40s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 21s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   5m 59s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 57s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 25s |  |  Maven dependency ordering for patch  |
   | -1 :x: |  mvninstall  |   0m 29s | [/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) |  hadoop-yarn-server-resourcemanager in the patch failed.  |
   | -1 :x: |  compile  |   2m 34s | [/patch-compile-hadoop-yarn-project_hadoop-yarn-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-yarn in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.  |
   | -1 :x: |  javac  |   2m 34s | [/patch-compile-hadoop-yarn-project_hadoop-yarn-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-yarn in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.  |
   | -1 :x: |  compile  |   2m 16s | [/patch-compile-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) |  hadoop-yarn in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.  |
   | -1 :x: |  javac  |   2m 16s | [/patch-compile-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) |  hadoop-yarn in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 27s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   0m 32s | [/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) |  hadoop-yarn-server-resourcemanager in the patch failed.  |
   | +1 :green_heart: |  javadoc  |   1m 55s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 48s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | -1 :x: |  spotbugs  |   0m 34s | [/patch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/artifact/out/patch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) |  hadoop-yarn-server-resourcemanager in the patch failed.  |
   | -1 :x: |  shadedclient  |  10m 12s |  |  patch has errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 57s |  |  hadoop-yarn-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   5m 46s |  |  hadoop-yarn-common in the patch passed.  |
   | -1 :x: |  unit  |   0m 40s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) |  hadoop-yarn-server-resourcemanager in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 135m 14s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5672 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 23c90b395949 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 1021ce97384b6fb982705c49e606bb1728adcd6e |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/testReport/ |
   | Max. process+thread count | 531 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5672/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org