Posted to yarn-issues@hadoop.apache.org by "Hadoop QA (Jira)" <ji...@apache.org> on 2021/03/05 00:52:00 UTC

[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky

    [ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295668#comment-17295668 ] 

Hadoop QA commented on YARN-10672:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 27s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 41s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  7s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 50s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m  6s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 51s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 50s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/724/artifact/out/whitespace-eol.txt{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 59s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 36s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 21s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}100m  9s{color} | {color:green}{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 27s{color} | {color:green}{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 26s{color} | {color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/724/artifact/out/Dockerfile |
| JIRA Issue | YARN-10672 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13021618/YARN-10672.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ae00b750594d 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 6699198b54b |
| Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
|  Test Results | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/724/testReport/ |
| Max. process+thread count | 840 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/724/console |
| versions | git=2.25.1 maven=3.6.3 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> All testcases in TestReservations are flaky
> -------------------------------------------
>
>                 Key: YARN-10672
>                 URL: https://issues.apache.org/jira/browse/YARN-10672
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: Screenshot 2021-03-04 at 21.34.18.png, Screenshot 2021-03-04 at 22.06.20.png, Screenshot-mockitostubbing1-2021-03-04 at 22.34.01.png, Screenshot-mockitostubbing2-2021-03-04 at 22.34.12.png, YARN-10672-debuglogs.patch, YARN-10672.001.patch
>
>
> All test cases in TestReservations are flaky.
> Running any single test in TestReservations 100 times never produces 100 passes.
>  For example, running testReservationNoContinueLook 100 times gave me 39 failures and 61 passes.
>  In other runs, only 1 out of 100 executions fails.
>  A screenshot is attached.
> Stacktrace:
> {code:java}
> java.lang.AssertionError: 
> Expected :2048
> Actual   :0
> <Click to see difference>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:642)
> {code}
> The test fails here:
> {code:java}
>  // Start testing...
> // Only AM
> TestUtils.applyResourceCommitRequest(clusterResource,
>     a.assignContainers(clusterResource, node_0,
>         new ResourceLimits(clusterResource),
>         SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
> assertEquals(2 * GB, a.getUsedResources().getMemorySize());
> {code}
> With some debugging (patch attached), I realized that sometimes there are no registered nodes, so the AM can't be allocated and the test fails:
> {code:java}
> 2021-03-04 21:58:25,434 DEBUG [main] allocator.RegularContainerAllocator (RegularContainerAllocator.java:canAssign(312)) - ******Can't assign container, no nodes... rmContext: 2a8dd942, scheduler: 2322e56f
> {code}
> In these cases, this is also printed from org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#getNumClusterNodes:
> {code:java}
> 2021-03-04 21:58:25,379 DEBUG [main] capacity.CapacityScheduler (CapacityScheduler.java:getNumClusterNodes(290)) - ***Called real getNumClusterNodes
> {code}
> h2. Let's break this down:
>  1. The mocking happens in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations#setup(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration, boolean):
> {code:java}
> cs.setRMContext(spyRMContext);
> cs.init(csConf);
> cs.start();
> when(cs.getNumClusterNodes()).thenReturn(3);
> {code}
> Under no circumstances should this return any value other than 3.
>  However, as mentioned above, sometimes the real 'getNumClusterNodes' method is called on CapacityScheduler.
> 2. Sometimes, this gets printed to the console:
> {code:java}
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
> Integer cannot be returned by isMultiNodePlacementEnabled()
> isMultiNodePlacementEnabled() should return boolean
> ***
> If you're unsure why you're getting above error read on.
> Due to the nature of the syntax above problem might occur because:
> 1. This exception *might* occur in wrongly written multi-threaded tests.
>    Please refer to Mockito FAQ on limitations of concurrency testing.
> 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies - 
>    - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method.
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.setup(TestReservations.java:166)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.setup(TestReservations.java:114)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:566)
> 	at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> 	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
> 	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:40)
> 	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
> 	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
> {code}
> TestReservations.java:166 is exactly our suspicious stubbing call:
> {code:java}
> when(cs.getNumClusterNodes()).thenReturn(3);
> {code}
> But what the heck is the relation between this and the isMultiNodePlacementEnabled method?
>  Let's dive into this more.
> 3. The only caller of isMultiNodePlacementEnabled is org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager#dynamicallyUpdateAppActivitiesMaxQueueLengthIfNeeded, which is called from the anonymous Thread instance (assigned to the field 'cleanupThread') in class [ActivitiesManager|https://github.com/apache/hadoop/blob/6699198b54bf6360c164a6ce7552c8b91a318c59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/ActivitiesManager.java#L301-L357].
>  ActivitiesManager.serviceStart is invoked when the CapacityScheduler is started.
> *Theory*: When CS is started [here|https://github.com/apache/hadoop/blob/6699198b54bf6360c164a6ce7552c8b91a318c59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java#L156-L160], the ActivitiesManager starts running and eventually cs.isMultiNodePlacementEnabled() will be called by the thread [here|https://github.com/apache/hadoop/blob/6699198b54bf6360c164a6ce7552c8b91a318c59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/ActivitiesManager.java#L266].
>  If the main thread (JUnit) has already begun stubbing 'getNumClusterNodes' at that point, the stubbing is still in progress when isMultiNodePlacementEnabled is called, so the pending Answer holding the value 3 is applied to it. An Integer is not a valid return value for a boolean method, so the stubbing fails.
>  This is a race condition; the two attached screenshots illustrate it.
>  Because the stubbing failed, subsequent calls to getNumClusterNodes return 0, which is what the original implementation returns in these tests. This is the root cause of the test failures.
> 4. Some references to this mockito "issue":
>  - [https://github.com/mockito/mockito/issues/1151]
>  - [https://github.com/mockito/mockito/issues/1151#issuecomment-336718714]
>  - [https://github.com/mockito/mockito/pull/1200#issuecomment-336717814]
>  The last link points to the Mockito FAQ entry that describes Mockito's thread safety: [https://github.com/mockito/mockito/wiki/FAQ#is-mockito-thread-safe]
> *CONCLUSION & FIX*
>  As all tests in this class are affected by the setup method and this thread-safety issue, we need to fix it consistently.
>  Actually, the fix is quite simple: all stubbing calls on the CS instance should happen before the CS.init / CS.start calls, which trigger the ActivitiesManager thread.
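The ordering problem described in the issue can be illustrated with a small JDK-only model. This is a simplified sketch, not Mockito's actual implementation and not the real CapacityScheduler: `ToySpy`, `thenReturn`, and `StubbingOrderDemo` are hypothetical names, and the two-step stubbing is modeled as "invoke the method, then attach a value to whichever method was invoked last". A latch forces the unlucky interleaving so the outcome is deterministic rather than flaky.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Toy model only: ToySpy is a hypothetical stand-in, not Mockito and not CapacityScheduler.
class ToySpy {
    private final Map<String, Object> stubs = new HashMap<>();
    private String lastInvoked; // shared by every thread that calls the spy

    private synchronized Object invoke(String method, Object realValue) {
        lastInvoked = method;
        return stubs.getOrDefault(method, realValue);
    }

    int getNumClusterNodes() {
        return (Integer) invoke("getNumClusterNodes", 0); // the "real" method returns 0
    }

    boolean isMultiNodePlacementEnabled() {
        return (Boolean) invoke("isMultiNodePlacementEnabled", false);
    }

    // Second half of a two-step stubbing: applies to whichever method was invoked last.
    synchronized void thenReturn(Object value) {
        boolean wantsBoolean = "isMultiNodePlacementEnabled".equals(lastInvoked);
        if (wantsBoolean != (value instanceof Boolean)) {
            throw new IllegalStateException(value.getClass().getSimpleName()
                    + " cannot be returned by " + lastInvoked + "()");
        }
        stubs.put(lastInvoked, value);
    }
}

class StubbingOrderDemo {
    // Starts a thread that calls the boolean method once the gate opens.
    private static Thread startBackgroundCaller(ToySpy spy, CountDownLatch gate) {
        Thread t = new Thread(() -> {
            try {
                gate.await();
                spy.isMultiNodePlacementEnabled();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Problematic order: the background thread exists before the stubbing finishes.
    static int stubAfterStart() {
        ToySpy spy = new ToySpy();
        CountDownLatch gate = new CountDownLatch(1);
        Thread t = startBackgroundCaller(spy, gate); // models cs.start()
        spy.getNumClusterNodes();                    // first half of the stubbing
        gate.countDown();
        join(t);                                     // force the unlucky interleaving
        try {
            spy.thenReturn(3);                       // now aimed at the boolean method
        } catch (IllegalStateException e) {
            System.out.println("stubbing failed: " + e.getMessage());
        }
        return spy.getNumClusterNodes();             // falls back to the real value
    }

    // Fixed order: finish the stubbing first, then start the thread.
    static int stubBeforeStart() {
        ToySpy spy = new ToySpy();
        spy.getNumClusterNodes();
        spy.thenReturn(3);
        CountDownLatch gate = new CountDownLatch(1);
        Thread t = startBackgroundCaller(spy, gate);
        gate.countDown();
        join(t);
        return spy.getNumClusterNodes();
    }

    public static void main(String[] args) {
        System.out.println("stub after start:  " + stubAfterStart());
        System.out.println("stub before start: " + stubBeforeStart());
    }
}
```

In the real test, the fixed order corresponds to moving the stubbing of `getNumClusterNodes` above the `cs.init(csConf)` and `cs.start()` calls in `setup`, as the conclusion of the issue proposes. The Mockito error text quoted in the issue additionally recommends the `doReturn` family of methods for stubbing spies.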



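The 100-run experiment from the issue description can also be scripted instead of run through an IDE. The sketch below is a generic, JDK-only repeat loop; `RepeatRunner` and `countFailures` are hypothetical names, and the lambda in `main` is an artificial stand-in for a flaky test, not the real `testReservationNoContinueLook`.

```java
// Minimal sketch of a repeat-and-count harness; all names here are illustrative.
class RepeatRunner {
    /** Runs the check the given number of times and returns how many runs threw. */
    static int countFailures(int runs, Runnable check) {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            try {
                check.run();
            } catch (AssertionError | RuntimeException e) {
                failures++; // a failed assertion or an unexpected exception counts as a failure
            }
        }
        return failures;
    }
}

class RepeatRunnerDemo {
    public static void main(String[] args) {
        int[] run = {0};
        // Artificial flaky check: fails on every 10th invocation.
        int failures = RepeatRunner.countFailures(100, () -> {
            if (++run[0] % 10 == 0) {
                throw new AssertionError("Expected :2048, Actual :0");
            }
        });
        System.out.println(failures + " failed, " + (100 - failures) + " passed");
    }
}
```

A failure count of zero over many repetitions does not prove that a race is gone, but a non-zero count after a change is a quick signal that the change did not remove it.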
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
