You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Hadoop QA (Jira)" <ji...@apache.org> on 2020/11/18 16:35:00 UTC

[jira] [Commented] (YARN-8737) Race condition in ParentQueue when reinitializing and sorting child queues in the meanwhile

    [ https://issues.apache.org/jira/browse/YARN-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234790#comment-17234790 ] 

Hadoop QA commented on YARN-8737:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 40s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} codespell {color} | {color:blue}  0m  2s{color} |  | {color:blue} codespell was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |  | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} |  | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 48s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 51s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 38s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 56s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 50s{color} |  | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 42s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m  6s{color} |  | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  3s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 59s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  0s{color} |  | {color:green} the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  0s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 51s{color} |  | {color:green} the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 51s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 35s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 53s{color} |  | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 19m 18s{color} | [/patch-shadedclient.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/312/artifact/out/patch-shadedclient.txt] | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 26s{color} | [/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/312/artifact/out/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt] | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 24s{color} | [/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/312/artifact/out/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt] | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 25s{color} | [/patch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/312/artifact/out/patch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt] | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 26s{color} | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/312/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt] | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue}  0m 26s{color} |  | {color:blue} ASF License check generated no output? {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m  3s{color} |  | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/312/artifact/out/Dockerfile |
| JIRA Issue | YARN-8737 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12937848/YARN-8737.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle codespell |
| uname | Linux 235127d0304a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / e3c08f285a6ac02f51ddf0007db76fc3da7ec372 |
| Default Java | Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
|  Test Results | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/312/testReport/ |
| Max. process+thread count | 536 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/312/console |
| versions | git=2.17.1 maven=3.6.0 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Race condition in ParentQueue when reinitializing and sorting child queues in the meanwhile
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-8737
>                 URL: https://issues.apache.org/jira/browse/YARN-8737
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.3.0, 2.9.3, 3.2.2, 3.1.4
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-8737.001.patch
>
>
> Administrator raised a update for queues through REST API, in RM parent queue is refreshing child queues through calling ParentQueue#reinitialize, meanwhile, async-schedule threads is sorting child queues when calling ParentQueue#sortAndGetChildrenAllocationIterator. Race condition may happen and throw exception as follow because TimSort does not handle the concurrent modification of objects it is sorting:
> {noformat}
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>         at java.util.TimSort.mergeHi(TimSort.java:899)
>         at java.util.TimSort.mergeAt(TimSort.java:516)
>         at java.util.TimSort.mergeCollapse(TimSort.java:441)
>         at java.util.TimSort.sort(TimSort.java:245)
>         at java.util.Arrays.sort(Arrays.java:1512)
>         at java.util.ArrayList.sort(ArrayList.java:1454)
>         at java.util.Collections.sort(Collections.java:175)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:291)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:804)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:817)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:636)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:2494)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:2431)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersOnMultiNodes(CapacityScheduler.java:2588)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:2676)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.scheduleBasedOnNodeLabels(CapacityScheduler.java:927)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:962)
> {noformat}
> I think we can add read-lock for ParentQueue#sortAndGetChildrenAllocationIterator to solve this problem, the write-lock will be hold when updating child queues in ParentQueue#reinitialize.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org