Posted to reviews@spark.apache.org by cxzl25 <gi...@git.apache.org> on 2018/06/28 07:47:04 UTC

[GitHub] spark pull request #21656: [SPARK-24677][Core]MedianHeap is empty when specu...

GitHub user cxzl25 opened a pull request:

    https://github.com/apache/spark/pull/21656

    [SPARK-24677][Core]MedianHeap is empty when speculation is enabled, causing the SparkContext to stop

    ## What changes were proposed in this pull request?
    When speculation is enabled, TaskSetManager#markPartitionCompleted should record the
    successful task's duration in the MedianHeap (successfulTaskDurations), not just
    increment tasksSuccessful.
    
    Otherwise, when TaskSetManager#checkSpeculatableTasks runs, tasksSuccessful is non-zero
    but the MedianHeap is empty, so successfulTaskDurations.median throws
    java.util.NoSuchElementException: MedianHeap is empty, which ultimately stops the SparkContext.
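    
    For illustration, a minimal sketch of the failure mode. MedianHeap is a
    private[spark] utility, so this snippet only compiles inside the org.apache.spark
    namespace (e.g. in a test):
    ```scala
    import org.apache.spark.util.collection.MedianHeap

    val durations = new MedianHeap()
    // This is effectively what checkSpeculatableTasks does when tasksSuccessful > 0
    // but no duration was ever inserted; uncommenting the next line throws
    // java.util.NoSuchElementException: MedianHeap is empty.
    // durations.median

    durations.insert(1000.0)  // handleSuccessfulTask inserts each finished task's duration
    assert(durations.median == 1000.0)  // safe once at least one duration is recorded
    ```
    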
    ## How was this patch tested?
    manual tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cxzl25/spark fix_MedianHeap_empty

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21656.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21656
    
----
commit 467f0bccb7d1940bed4f1b2e633c9374b0e654f2
Author: sychen <sy...@...>
Date:   2018-06-28T07:34:38Z

    MedianHeap is empty when speculation is enabled, causing the SparkContext to stop.

----


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    **[Test build #92631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92631/testReport)** for PR 21656 at commit [`55ddbeb`](https://github.com/apache/spark/commit/55ddbeb26085c9d8cd9c1768479d9b9acdacda2b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92820/
    Test PASSed.


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Ok, what did we decide on the time then? I would say for now we either ignore it or send down the real time.
    
    @cxzl25 How hard is it to send the actual task info in here so it could use the real time the successful task took? At a glance it doesn't look too hard to add the additional information to the function calls that pass it into markPartitionCompleted.
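    
    A rough sketch of what threading the task info through might look like (the exact
    signatures here are hypothetical, not necessarily what gets merged; the `!isZombie`
    guard is the one suggested elsewhere in this thread):
    ```scala
    // TaskSchedulerImpl: pass the winning attempt's TaskInfo along to every taskset
    // for the stage, not just the partition id.
    private[scheduler] def markPartitionCompletedInAllTaskSets(
        stageId: Int,
        partitionId: Int,
        taskInfo: TaskInfo): Unit = {
      taskSetsByStageIdAndAttempt.getOrElse(stageId, Map()).values.foreach { tsm =>
        tsm.markPartitionCompleted(partitionId, taskInfo)
      }
    }

    // TaskSetManager: record the real duration of the attempt that actually finished,
    // so successfulTaskDurations stays in step with tasksSuccessful.
    private[scheduler] def markPartitionCompleted(partitionId: Int, taskInfo: TaskInfo): Unit = {
      partitionToIndex.get(partitionId).foreach { index =>
        if (!successful(index)) {
          if (speculationEnabled && !isZombie) {
            successfulTaskDurations.insert(taskInfo.duration)
          }
          tasksSuccessful += 1
          successful(index) = true
          if (tasksSuccessful == numTasks) {
            isZombie = true
          }
          maybeFinishTaskSet()
        }
      }
    }
    ```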


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    +1 


---



[GitHub] spark pull request #21656: [SPARK-24677][Core]MedianHeap is empty when specu...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21656#discussion_r200236204
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -772,6 +772,12 @@ private[spark] class TaskSetManager(
       private[scheduler] def markPartitionCompleted(partitionId: Int): Unit = {
         partitionToIndex.get(partitionId).foreach { index =>
           if (!successful(index)) {
    +        if (speculationEnabled) {
    +          taskAttempts(index).headOption.map { info =>
    +            info.markFinished(TaskState.FINISHED, clock.getTimeMillis())
    +            successfulTaskDurations.insert(info.duration)
    --- End diff --
    
    TaskSetManager#handleSuccessfulTask is the normal path: it marks the task finished and
    writes its duration to successfulTaskDurations.
    
    When there are multiple tasksets for this stage, markPartitionCompletedInAllTaskSets
    only increments tasksSuccessful in the other tasksets, without recording any duration.
    
    In that case, when checkSpeculatableTasks is called, tasksSuccessful satisfies the
    condition but successfulTaskDurations is empty.
    
    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L723
    ```scala
      def handleSuccessfulTask(tid: Long, result: DirectTaskResult[_]): Unit = {
        val info = taskInfos(tid)
        val index = info.index
        info.markFinished(TaskState.FINISHED, clock.getTimeMillis())
        if (speculationEnabled) {
          successfulTaskDurations.insert(info.duration)
        }
       // ...
       // There may be multiple tasksets for this stage -- we let all of them know that the partition
       // was completed.  This may result in some of the tasksets getting completed.
        sched.markPartitionCompletedInAllTaskSets(stageId, tasks(index).partitionId)
    ```
    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L987
    ```scala
    override def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = {
    //...
      if (tasksSuccessful >= minFinishedForSpeculation && tasksSuccessful > 0) {
          val time = clock.getTimeMillis()
          val medianDuration = successfulTaskDurations.median
    ```



---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    I assume the real issue is that it isn't updating successfulTaskDurations? MedianHeap is a collection; can you please update the description and title to be more explicit?



---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    lgtm


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    **[Test build #92820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92820/testReport)** for PR 21656 at commit [`1c1df5c`](https://github.com/apache/spark/commit/1c1df5c4e421eed08fe31426400da6148f9c1842).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    @tgravescs 
    This is really not difficult.
    I'm just not sure whether we want to ignore it or send down the real time.
    I have now pushed a change that uses the actual time of the successful task.


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    @maropu @cloud-fan @squito 
    Can you trigger a test for this?
    This is the exception stack in the log:
    ```
    ERROR Utils: uncaught error in thread task-scheduler-speculation, stopping SparkContext
    java.util.NoSuchElementException: MedianHeap is empty.
    at org.apache.spark.util.collection.MedianHeap.median(MedianHeap.scala:83)
    at org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks(TaskSetManager.scala:968)
    at org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:94)
    at org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:93)
    at scala.collection.Iterator$class.foreach(Iterator.scala:742)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at org.apache.spark.scheduler.Pool.checkSpeculatableTasks(Pool.scala:93)
    at org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:94)
    at org.apache.spark.scheduler.Pool$$anonfun$checkSpeculatableTasks$1.apply(Pool.scala:93)
    ```


---



[GitHub] spark pull request #21656: [SPARK-24677][Core]MedianHeap is empty when specu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21656#discussion_r200231460
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -772,6 +772,12 @@ private[spark] class TaskSetManager(
       private[scheduler] def markPartitionCompleted(partitionId: Int): Unit = {
         partitionToIndex.get(partitionId).foreach { index =>
           if (!successful(index)) {
    +        if (speculationEnabled) {
    +          taskAttempts(index).headOption.map { info =>
    +            info.markFinished(TaskState.FINISHED, clock.getTimeMillis())
    +            successfulTaskDurations.insert(info.duration)
    --- End diff --
    
    what's the normal code path to update task durations?


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    The changes LGTM


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    **[Test build #92817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92817/testReport)** for PR 21656 at commit [`d8fdceb`](https://github.com/apache/spark/commit/d8fdceb1ac86746c9c99b2dde833a62007a30f60).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Thanks for finding this and suggesting a fix @cxzl25. But I'm not sure it makes sense to use this duration; it's not how long the task actually took to complete. I think it might make more sense to just ignore this task for speculation. I will think about it some more.
    
    cc @markhamstra @tgravescs 


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    In this case one of the older stage attempts (that is a zombie) marked the task as successful, but then the newest stage attempt checked to see if it needed to speculate. Is that correct?
    
    Ideally I think for speculation we want to look at the task time for all stage attempts, but that is probably a bigger change than this. If we aren't doing that, then I think ignoring it for speculation is ok. Otherwise, how hard is it to send the actual task info in here so it could use the real time the successful task took?


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21656: [SPARK-24677][Core]Avoid NoSuchElementException f...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21656


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92631/
    Test FAILed.


---



[GitHub] spark pull request #21656: [SPARK-24677][Core]MedianHeap is empty when specu...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21656#discussion_r200759539
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -772,6 +772,12 @@ private[spark] class TaskSetManager(
       private[scheduler] def markPartitionCompleted(partitionId: Int): Unit = {
         partitionToIndex.get(partitionId).foreach { index =>
           if (!successful(index)) {
    +        if (speculationEnabled) {
    --- End diff --
    
    Yeah, that is sort of what I was suggesting -- but rather than just a flag, I was thinking maybe we separate `tasksSuccessful` into `tasksCompletedSuccessfully` (from this taskset) and `tasksNoLongerNecessary` (from any taskset), perhaps with better names. If you just had a flag, you would avoid the exception from the empty heap, but you might still decide to enable speculation prematurely, since you really haven't finished enough tasks for `SPECULATION_QUANTILE`: https://github.com/apache/spark/blob/a381bce7285ec30f58f28f523dfcfe0c13221bbf/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L987
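    
    A rough sketch of that split (the field names here are placeholders, not actual
    Spark fields):
    ```scala
    // Completed by an attempt from *this* TaskSetManager -- each of these also has a
    // duration recorded in successfulTaskDurations.
    private var tasksCompletedSuccessfully = 0
    // Partitions finished by *any* taskset for the stage -- no duration is known here.
    private var tasksNoLongerNecessary = 0

    // checkSpeculatableTasks would then key the quantile check off the taskset's own
    // completions, keeping it consistent with the contents of successfulTaskDurations:
    val minFinishedForSpeculation = (SPECULATION_QUANTILE * numTasks).floor.toInt
    if (tasksCompletedSuccessfully >= minFinishedForSpeculation &&
        tasksCompletedSuccessfully > 0) {
      val medianDuration = successfulTaskDurations.median
      // ... flag tasks running much longer than medianDuration for speculation ...
    }
    ```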


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    **[Test build #92817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92817/testReport)** for PR 21656 at commit [`d8fdceb`](https://github.com/apache/spark/commit/d8fdceb1ac86746c9c99b2dde833a62007a30f60).


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    ok to test


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92817/
    Test PASSed.


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    **[Test build #92631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92631/testReport)** for PR 21656 at commit [`55ddbeb`](https://github.com/apache/spark/commit/55ddbeb26085c9d8cd9c1768479d9b9acdacda2b).


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    **[Test build #92820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92820/testReport)** for PR 21656 at commit [`1c1df5c`](https://github.com/apache/spark/commit/1c1df5c4e421eed08fe31426400da6148f9c1842).


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    @squito  are you ok with this approach?


---



[GitHub] spark pull request #21656: [SPARK-24677][Core]MedianHeap is empty when specu...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21656#discussion_r200804756
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -772,6 +772,12 @@ private[spark] class TaskSetManager(
       private[scheduler] def markPartitionCompleted(partitionId: Int): Unit = {
         partitionToIndex.get(partitionId).foreach { index =>
           if (!successful(index)) {
    +        if (speculationEnabled) {
    --- End diff --
    
    `speculationEnabled && ! isZombie`


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    @maropu 
    I have added a unit test.
    Can you trigger a test for this?


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    cc @jiangxb1987 


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    Can you add a test to check that no exception is thrown in that condition with this patch?


---



[GitHub] spark pull request #21656: [SPARK-24677][Core]MedianHeap is empty when specu...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21656#discussion_r200366359
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -772,6 +772,12 @@ private[spark] class TaskSetManager(
       private[scheduler] def markPartitionCompleted(partitionId: Int): Unit = {
         partitionToIndex.get(partitionId).foreach { index =>
           if (!successful(index)) {
    +        if (speculationEnabled) {
    --- End diff --
    
    IIUC, in this case no task in this taskSet actually finishes successfully; it's a task attempt from another taskSet for the same stage that succeeded. Instead of changing this code path, I'd suggest we add another flag indicating whether any task succeeded in the current taskSet, and if no task has succeeded, skip L987 (the speculation check).
    
    WDYT @squito ?
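    
    For concreteness, a minimal sketch of that flag-based alternative (the field name is
    hypothetical):
    ```scala
    // Set only when a task from *this* taskset finishes via handleSuccessfulTask,
    // i.e. only when a duration was actually inserted into successfulTaskDurations.
    private var hasSuccessfulAttemptInThisTaskSet = false

    // Early exit at the top of checkSpeculatableTasks:
    if (!hasSuccessfulAttemptInThisTaskSet) {
      return false  // no local completions, so there is no median duration to compare against
    }
    ```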


---



[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    merged thanks @cxzl25 


---



[GitHub] spark issue #21656: [SPARK-24677][Core]MedianHeap is empty when speculation ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21656
  
    > Ideally I think for speculation we want to look at the task time for all stage attempts. But that is probably a bigger change than this
    
    Yeah, I agree on both points. One thing that is a little tricky is that you probably want to make sure you're only counting times from different partitions -- you might have times for the same partition from multiple attempts, and those shouldn't all count. (Or maybe we don't really care that much, as it's just a heuristic anyway...)


---
