You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Hadoop QA (Jira)" <ji...@apache.org> on 2020/07/17 16:11:00 UTC

[jira] [Commented] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement

    [ https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160030#comment-17160030 ] 

Hadoop QA commented on YARN-10352:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 32s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 47s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 32s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 122 unchanged - 0 fixed = 124 total (was 122) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  5s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 15s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 20s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
|  |  Write to static field org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.printVerboseLogging from instance method org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodesHeartbeated(String)  At ClusterNodeTracker.java:from instance method org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodesHeartbeated(String)  At ClusterNodeTracker.java:[line 519] |
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerMultiNodesWithPreemption |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-YARN-Build/26281/artifact/out/Dockerfile |
| JIRA Issue | YARN-10352 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13007867/YARN-10352-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux af2b3155bc30 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 2ba44a73bf2 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/26281/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/26281/artifact/out/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/26281/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/26281/testReport/ |
| Max. process+thread count | 820 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/26281/console |
| versions | git=2.17.1 maven=3.6.0 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Skip schedule on not heartbeated nodes in Multi Node Placement
> --------------------------------------------------------------
>
>                 Key: YARN-10352
>                 URL: https://issues.apache.org/jira/browse/YARN-10352
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0, 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>              Labels: capacityscheduler, multi-node-placement
>         Attachments: YARN-10352-001.patch
>
>
> When Node Recovery is Enabled, Stopping a NM won't unregister to RM. So RM Active Nodes will be still having those stopped nodes until NM Liveliness Monitor Expires after configured timeout (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During this 10mins, Multi Node Placement assigns the containers on those nodes. They need to exclude the nodes which has not heartbeated for configured heartbeat interval (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms) similar to Asynchronous Capacity Scheduler Threads. (CapacityScheduler#shouldSkipNodeSchedule)
> *Repro:*
> 1. Enable Multi Node Placement (yarn.scheduler.capacity.multi-node-placement-enabled) + Node Recovery Enabled  (yarn.node.recovery.enabled)
> 2. Have only one NM running say worker0
> 3. Stop worker0 and start any other NM say worker1
> 4. Submit a sleep job. The containers will timeout as assigned to stopped NM worker0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org