You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by andrewor14 <gi...@git.apache.org> on 2014/04/24 01:08:23 UTC

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/516

    [Hot Fix #469] Fix flaky test in SparkListenerSuite

    The two modified tests may fail if the race condition does not bid in our favor...

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark stage-info-test-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #516
    
----
commit b4b6100b7992f69528003e7fd2a892d3fef77b43
Author: Andrew Or <an...@gmail.com>
Date:   2014-04-23T23:04:49Z

    Add/replace missing waitUntilEmpty() calls to listener bus

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/516#discussion_r11936481
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
    @@ -177,6 +177,7 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with ShouldMatc
         listener.stageInfos.clear()
     
         rdd3.count()
    +    assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
    --- End diff --
    
    Hm, looks like `LinkedBlockingQueue`'s take is special in that it waits for the next item to be ready. It seems there isn't an equivalent for peek... we may have to synchronize some other way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/516#discussion_r11936322
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
    @@ -177,6 +177,7 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with ShouldMatc
         listener.stageInfos.clear()
     
         rdd3.count()
    +    assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
    --- End diff --
    
    Good catch. A solution is to have `eventQueue` peek instead of take in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L43, such that we remove the event from the queue only after all listeners have finished processing it.
    
    Many other places have long been relying on `waitUntilEmpty` even before this PR. It makes me wonder how likely this race condition actually causes a test failure...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/516#discussion_r11979334
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
    @@ -177,6 +177,7 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with ShouldMatc
         listener.stageInfos.clear()
     
         rdd3.count()
    +    assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
    --- End diff --
    
    @zsxwing we have to keep `waitUntilEmpty` in LiveListenerBus.scala (unfortunately) because DAGSchedulerSuite also uses it. I started a similar solution at #544. Maybe we can move the discussion there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/516#discussion_r11936165
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
    @@ -177,6 +177,7 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with ShouldMatc
         listener.stageInfos.clear()
     
         rdd3.count()
    +    assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
    --- End diff --
    
    Just realize that `waitUntilEmpty` is not enough. In `waitUntilEmpty`, it only checks `eventQueue.isEmpty`. But when `eventQueue.isEmpty` is true, there is still a chance that listeners do not finish their jobs or the memory is not synchronized.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/516#issuecomment-41228187
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/516


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/516#issuecomment-41230250
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/516#discussion_r11936152
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
    @@ -50,9 +50,9 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with ShouldMatc
         (1 to 5).foreach { _ => bus.post(SparkListenerJobEnd(0, JobSucceeded)) }
         assert(counter.count === 0)
     
    -    // Starting listener bus should flush all buffered events (asynchronously, hence the sleep)
    +    // Starting listener bus should flush all buffered events
         bus.start()
    -    Thread.sleep(1000)
    +    assert(bus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
    --- End diff --
    
    Just realize that `waitUntilEmpty` is not enough. In `waitUntilEmpty`, it only checks `eventQueue.isEmpty`. But when `eventQueue.isEmpty` is true, there is still a chance that listeners do not finish their jobs or the memory is not synchronized.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/516#discussion_r11936630
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
    @@ -177,6 +177,7 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with ShouldMatc
         listener.stageInfos.clear()
     
         rdd3.count()
    +    assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
    --- End diff --
    
    Since `waitUntilEmpty` is only for test, is it possible that moving such `wait` logic to the SparkListener instances used in tests? E.g.,
    
    ```Scala
      import java.util.concurrent.{CountDownLatch, TimeUnit}
    
      class SaveStageAndTaskInfo extends SparkListener {
        val stageInfos = mutable.Map[StageInfo, Seq[(TaskInfo, TaskMetrics)]]()
        var taskInfoMetrics = mutable.Buffer[(TaskInfo, TaskMetrics)]()
        val latch = new CountDownLatch(1)
    
        override def onTaskEnd(task: SparkListenerTaskEnd) {
          val info = task.taskInfo
          val metrics = task.taskMetrics
          if (info != null && metrics != null) {
            taskInfoMetrics += ((info, metrics))
          }
        }
    
        override def onStageCompleted(stage: SparkListenerStageCompleted) {
          stageInfos(stage.stageInfo) = taskInfoMetrics
          taskInfoMetrics = mutable.Buffer.empty
          latch.countDown()
        }
    
        def waitForCompleted(timeoutMillis: Long) {
          latch.await(timeoutMillis, TimeUnit.MILLISECONDS)
        }
      }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/516#issuecomment-41242784
  
    Thanks. I've merged this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/516#issuecomment-41226046
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/516#discussion_r11936200
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
    @@ -177,6 +177,7 @@ class SparkListenerSuite extends FunSuite with LocalSparkContext with ShouldMatc
         listener.stageInfos.clear()
     
         rdd3.count()
    +    assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
    --- End diff --
    
    cc @andrewor14 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [Hot Fix #469] Fix flaky test in SparkListener...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/516#issuecomment-41230251
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14409/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---