Posted to commits@spark.apache.org by do...@apache.org on 2020/06/17 17:02:36 UTC

[spark] branch branch-2.4 updated: [SPARK-32000][2.4][CORE][TESTS] Fix the flaky test for partially launched task in barrier-mode

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 23ff9e6  [SPARK-32000][2.4][CORE][TESTS] Fix the flaky test for partially launched task in barrier-mode
23ff9e6 is described below

commit 23ff9e6a13d7f2e14f2b154a9dabfc7cdca430a4
Author: yi.wu <yi...@databricks.com>
AuthorDate: Wed Jun 17 09:59:14 2020 -0700

    [SPARK-32000][2.4][CORE][TESTS] Fix the flaky test for partially launched task in barrier-mode
    
    ### What changes were proposed in this pull request?
    
    This PR changes the test to fetch an active executor ID at runtime and use it as the preferred location, instead of hard-coding a fixed preferred location.
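    
    Concretely, the test now waits for the executors to come up and builds the preferred location from a live executor ID rather than the hard-coded `executor_h_0`. A minimal sketch of the change (excerpted from the diff below; `sc` is the suite's `SparkContext` for the local-cluster):
    
    ```scala
    import org.apache.spark.TestUtils

    // Wait up to 6 seconds for both executors of the local-cluster to register...
    TestUtils.waitUntilExecutorsUp(sc, 2, 6000)
    // ...then pick an executor ID that actually exists in this application.
    val id = sc.getExecutorIds().head
    // Spark encodes executor-cached task locations as "executor_<host>_<execId>";
    // host "h" is the placeholder host the original test already used.
    val preferredLocs = Seq(Seq(s"executor_h_$id"), Seq(s"executor_h_$id"))
    ```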
    
    ### Why are the changes needed?
    
    The test is flaky. After checking the [log](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124086/artifact/core/), I found the root cause:
    
    Two test cases from different test suites were submitted at the same time because of concurrent execution. In this particular case, the two test cases (from DistributedSuite and BarrierTaskContextSuite) both launch under local-cluster mode. The two applications were submitted at the SAME time, so they got the same application ID (app-20200615210132-0000). Thus, when the cluster of BarrierTaskContextSuite was launching executors, it failed to create the directory for executor 0, because [...]
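    
    For context, a standalone master derives the application ID from the submission timestamp (second resolution) plus a per-master counter, which is why two masters started in the same second can hand out identical IDs. A simplified sketch of that naming scheme (the real `Master` internals are reduced to the ID format, which matches the `app-20200615210132-0000` seen in the log):
    
    ```scala
    import java.text.SimpleDateFormat
    import java.util.{Date, Locale}

    // Simplified form of how a standalone master names applications:
    // "app-" + submit time (to the second) + "-" + 4-digit counter.
    def newAppId(submitDate: Date, appNumber: Int): String = {
      val fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)
      "app-%s-%04d".format(fmt.format(submitDate), appNumber)
    }

    // Each local-cluster spins up its own master whose counter starts at 0,
    // so two apps submitted within the same second collide on the same ID:
    val now = new Date()
    assert(newAppId(now, 0) == newAppId(now, 0))
    ```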
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    The test cannot be reproduced locally, so we can only confirm the fix once the test is no longer flaky on Jenkins.
    
    Closes #28851 from Ngone51/fix-spark-32000-24.
    
    Authored-by: yi.wu <yi...@databricks.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 .../scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala  | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 469cc4a..92a97d1 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -159,11 +159,13 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext {
       .setAppName("test-cluster")
       .set("spark.test.noStageRetry", "true")
     sc = new SparkContext(conf)
+    TestUtils.waitUntilExecutorsUp(sc, 2, 6000)
+    val id = sc.getExecutorIds().head
     val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
     val dep = new OneToOneDependency[Int](rdd0)
-    // set up a barrier stage with 2 tasks and both tasks prefer executor 0 (only 1 core) for
+    // set up a barrier stage with 2 tasks and both tasks prefer the same executor (only 1 core) for
     // scheduling. So, one of tasks won't be scheduled in one round of resource offer.
-    val rdd = new MyRDD(sc, 2, List(dep), Seq(Seq("executor_h_0"), Seq("executor_h_0")))
+    val rdd = new MyRDD(sc, 2, List(dep), Seq(Seq(s"executor_h_$id"), Seq(s"executor_h_$id")))
     val errorMsg = intercept[SparkException] {
       rdd.barrier().mapPartitions { iter =>
         BarrierTaskContext.get().barrier()


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org