You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/11 06:01:12 UTC

[GitHub] [spark] mridulm opened a new pull request, #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

mridulm opened a new pull request, #38617:
URL: https://github.com/apache/spark/pull/38617

   ### What changes were proposed in this pull request?
   Fix flakey test failure
   
   ### Why are the changes needed?
   MT-safety issue in test
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Local tests.
   Need to validate on CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #38617:
URL: https://github.com/apache/spark/pull/38617#issuecomment-1311311848

   Can you merge this if the tests pass @HyukjinKwon ? I might not be online tomorrow and it is getting late tonight for me :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #38617:
URL: https://github.com/apache/spark/pull/38617#issuecomment-1311267937

   +CC @HyukjinKwon, @LuciferYang, @wankunde 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38617:
URL: https://github.com/apache/spark/pull/38617#issuecomment-1311547677

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #38617:
URL: https://github.com/apache/spark/pull/38617#issuecomment-1311717433

   Thanks @HyukjinKwon , @LuciferYang  !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38617:
URL: https://github.com/apache/spark/pull/38617#discussion_r1019860399


##########
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##########
@@ -4559,8 +4563,8 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti
       sendRequestsLatch.await()
       verify(blockStoreClient, times(2))
         .finalizeShuffleMerge(any(), any(), any(), any(), any())
-      assert(sentHosts.nonEmpty)
-      assert(sentHosts.head === "hostB" && sentHosts.length == 1)
+      assert(!sentHosts.isEmpty)
+      assert(1 == sentHosts.size() && sentHosts.iterator().next() === "hostB")

Review Comment:
   We are updating and querying in two different threads - make `sendHosts` thread safe



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38617:
URL: https://github.com/apache/spark/pull/38617#discussion_r1019866602


##########
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##########
@@ -4559,8 +4564,8 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti
       sendRequestsLatch.await()
       verify(blockStoreClient, times(2))
         .finalizeShuffleMerge(any(), any(), any(), any(), any())
-      assert(sentHosts.nonEmpty)
-      assert(sentHosts.head === "hostB" && sentHosts.length == 1)
+      assert(1 == sentHosts.size())
+      assert(sentHosts.asScala.toSeq === Seq("hostB"))

Review Comment:
   In the test failures seen earlier, `verify(blockStoreClient, times(2))` was satisfied. The reason why `sentHosts` was empty is due to the possible race in update between `sendRequestsLatch` and `sentHosts`, and also because `sentHosts` was not thread safe (so `add` to it need not to show up in this thread)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38617:
URL: https://github.com/apache/spark/pull/38617#discussion_r1019859895


##########
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##########
@@ -4533,16 +4533,20 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti
       blockStoreClientField.setAccessible(true)
       blockStoreClientField.set(sc.env.blockManager, blockStoreClient)
 
-      val sentHosts = ArrayBuffer[String]()
+      val sentHosts = JCollections.synchronizedList(new JArrayList[String]())
       var hostAInterrupted = false
       doAnswer { (invoke: InvocationOnMock) =>
         val host = invoke.getArgument[String](0)
-        sendRequestsLatch.countDown()
         try {
           if (host == "hostA") {
-            canSendRequestLatch.await(timeoutSecs * 2, TimeUnit.SECONDS)
+            sendRequestsLatch.countDown()
+            canSendRequestLatch.await(timeoutSecs * 5, TimeUnit.SECONDS)
+            // should not reach here, will get interrupted by DAGScheduler
+            sentHosts.add(host)
+          } else {
+            sentHosts.add(host)
+            sendRequestsLatch.countDown()

Review Comment:
   Do countdown after adding to sentHosts - to prevent race



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38617:
URL: https://github.com/apache/spark/pull/38617#discussion_r1019860144


##########
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##########
@@ -4533,16 +4533,20 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti
       blockStoreClientField.setAccessible(true)
       blockStoreClientField.set(sc.env.blockManager, blockStoreClient)
 
-      val sentHosts = ArrayBuffer[String]()
+      val sentHosts = JCollections.synchronizedList(new JArrayList[String]())
       var hostAInterrupted = false
       doAnswer { (invoke: InvocationOnMock) =>
         val host = invoke.getArgument[String](0)
-        sendRequestsLatch.countDown()
         try {
           if (host == "hostA") {
-            canSendRequestLatch.await(timeoutSecs * 2, TimeUnit.SECONDS)
+            sendRequestsLatch.countDown()
+            canSendRequestLatch.await(timeoutSecs * 5, TimeUnit.SECONDS)

Review Comment:
   bumped up the timeout - in case CI machines are slower



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case
URL: https://github.com/apache/spark/pull/38617


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
mridulm commented on code in PR #38617:
URL: https://github.com/apache/spark/pull/38617#discussion_r1019866602


##########
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##########
@@ -4559,8 +4564,8 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti
       sendRequestsLatch.await()
       verify(blockStoreClient, times(2))
         .finalizeShuffleMerge(any(), any(), any(), any(), any())
-      assert(sentHosts.nonEmpty)
-      assert(sentHosts.head === "hostB" && sentHosts.length == 1)
+      assert(1 == sentHosts.size())
+      assert(sentHosts.asScala.toSeq === Seq("hostB"))

Review Comment:
   In the test failures seen earlier, `verify(blockStoreClient, times(2))` was satisfied. The reason why `sentHosts` was empty is due to the race in update between `sendRequestsLatch` and `sentHosts`, and also because `sentHosts` was not thread safe (so `add` to it need to show in this thread)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #38617: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

Posted by GitBox <gi...@apache.org>.
mridulm commented on PR #38617:
URL: https://github.com/apache/spark/pull/38617#issuecomment-1311274093

   I am still not able to reproduce this locally - but logically, this looks like the right fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org