Posted to reviews@spark.apache.org by liutang123 <gi...@git.apache.org> on 2018/08/23 13:43:17 UTC

[GitHub] spark pull request #22202: [SPARK-25211][Core] speculation and fetch failed ...

GitHub user liutang123 opened a pull request:

    https://github.com/apache/spark/pull/22202

    [SPARK-25211][Core] speculation and fetch failed result in hang of job

    ## What changes were proposed in this pull request?
    
    In the current `DAGScheduler.handleTaskCompletion` code, when a shuffleMapStage that has a job attached is not in `runningStages` and its `pendingPartitions` is empty, the job of this shuffleMapStage never completes.
    
    *Consider the following scenario:*
    
    1. Stage 0 runs and generates shuffle output data.
    
    2. Stage 1 reads the output from stage 0 and generates more shuffle data. It has two task attempts for the same partition: ShuffleMapTask0 and ShuffleMapTask0.1.
    
    3. ShuffleMapTask0 fails to fetch blocks and sends a FetchFailed to the driver. The driver resubmits stage 0 and stage 1, placing stage 0 in `runningStages` and stage 1 in `waitingStages`.
    
    4. ShuffleMapTask0.1 finishes successfully and sends Success back to the driver. The driver adds the MapStatus to the set of output locations of stage 1. Because stage 1 is not in `runningStages`, the job does not complete.
    
    5. Stage 0 completes and the driver runs stage 1. But because the output set of stage 1 is already complete, the driver does not submit any tasks and marks stage 1 as complete right away. Because job completion relies on a `CompletionEvent` and no `CompletionEvent` will ever arrive, the job hangs.
    
    ## How was this patch tested?
    
    UT
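
The five-step scenario in the description can be replayed with a toy model. The sketch below is not Spark's scheduler code: `MapStage`, `ToyScheduler`, and their methods are hypothetical names chosen for illustration, and the FetchFailed handling is reduced to a single call. It only aims to show why nothing notifies the job once the successful attempt arrives while stage 1 is outside `runningStages`.

```scala
import scala.collection.mutable

// Toy model only. Names echo the thread, but none of this is Spark's real code.
object HangSketch {
  final class MapStage(val id: Int, val numPartitions: Int) {
    // Partitions that already have a registered map output.
    val outputs = mutable.Set.empty[Int]
    def isAvailable: Boolean = outputs.size == numPartitions
  }

  // jobStage is the stage whose completion some job is waiting on.
  final class ToyScheduler(jobStage: MapStage) {
    val runningStages = mutable.Set.empty[MapStage]
    val waitingStages = mutable.Set.empty[MapStage]
    var jobFinished = false // stands in for the job's waiter being notified

    // Launches one task per missing partition and returns how many were launched.
    // With nothing missing, no task runs, so no completion event will follow.
    def submit(stage: MapStage): Int = {
      waitingStages -= stage
      val missing = (0 until stage.numPartitions).filterNot(stage.outputs.contains)
      if (missing.nonEmpty) runningStages += stage
      missing.size
    }

    // The output is always registered, but the job is only finished
    // when the stage is still in runningStages.
    def onTaskSuccess(stage: MapStage, partition: Int): Unit = {
      stage.outputs += partition
      if (runningStages.contains(stage) && stage.isAvailable) {
        runningStages -= stage
        if (stage eq jobStage) jobFinished = true
      }
    }

    // Simplified FetchFailed handling: the failed stage waits, its parent reruns.
    def onFetchFailed(failed: MapStage, parent: MapStage): Unit = {
      runningStages -= failed
      waitingStages += failed
      parent.outputs.clear()
      runningStages += parent
    }
  }

  // Replays steps 1 to 5 and returns (tasks launched at the end, job finished?).
  def run(): (Int, Boolean) = {
    val stage0 = new MapStage(0, 1)
    val stage1 = new MapStage(1, 1)
    val s = new ToyScheduler(jobStage = stage1)
    s.submit(stage0); s.onTaskSuccess(stage0, 0) // step 1
    s.submit(stage1)                             // step 2: attempts 0 and 0.1 run
    s.onFetchFailed(stage1, stage0)              // step 3
    s.onTaskSuccess(stage1, 0)                   // step 4: attempt 0.1 succeeds
    s.onTaskSuccess(stage0, 0)                   // step 5: stage 0 rerun completes
    val launched = s.submit(stage1)              // step 5: nothing left to launch
    (launched, s.jobFinished)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

`HangSketch.run()` returns the number of tasks launched by the final resubmission of stage 1 together with a flag saying whether the job was ever notified. In this model no task is launched and the flag stays false, which corresponds to the hang described above.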

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liutang123/spark SPARK-25211

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22202.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22202
    
----
commit 4f51199daafec0466a5ac836c4f6281f5ba45381
Author: liulijia <li...@...>
Date:   2018-08-23T13:42:13Z

    [SPARK-25211][Core] speculation and fetch failed result in hang of job

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22202
  
    Can one of the admins verify this patch?


---

[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...

Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:

    https://github.com/apache/spark/pull/22202
  
    @Ngone51 Because a shuffleMapStage can have mapStageJobs (each with a JobWaiter) attached through `SparkContext.submitMapStage`.
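
The point of this reply is that a map stage can itself be the final stage of a job, so a waiter depends on the stage being marked finished even when no child stage exists. A minimal sketch of that idea follows, assuming a hypothetical `ToyMapStage` class with `submitAsJob` and `markFinished` methods. These are illustrative names, not Spark APIs, and a plain `scala.concurrent.Promise` stands in for the JobWaiter.

```scala
import scala.concurrent.Promise

// Toy model: a map stage that can be the final stage of a job,
// so a waiter is attached to the stage itself.
object MapStageJobSketch {
  final class ToyMapStage(val numPartitions: Int) {
    // Stand-ins for the jobs attached directly to this stage.
    private var waiters = List.empty[Promise[Int]]

    // Rough analogue of submitting the stage as its own job.
    def submitAsJob(): Promise[Int] = {
      val p = Promise[Int]()
      waiters = p :: waiters
      p
    }

    // Only this call releases the waiters. Having no child stage does not.
    def markFinished(): Unit = waiters.foreach(_.trySuccess(numPartitions))
  }

  // Returns whether the waiter was completed before and after markFinished().
  def demo(): (Boolean, Boolean) = {
    val stage = new ToyMapStage(2)
    val waiter = stage.submitAsJob()
    val before = waiter.isCompleted
    stage.markFinished()
    (before, waiter.isCompleted)
  }
}
```

If the scheduler never reaches the equivalent of `markFinished()` for such a stage, the waiter stays pending indefinitely, regardless of whether any child stage is waiting.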


---

[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...

Posted by Ngone51 <gi...@git.apache.org>.
Github user Ngone51 commented on the issue:

    https://github.com/apache/spark/pull/22202
  
    Since `stage 1` is only a `ShuffleMapStage`, why are there no other child stages to be submitted?


---

[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/22202
  
    Thanks for the ping~
    It seems that `ShuffleMapTask0.1` is a speculative attempt; please update the description.
    The change seems fine to me. But given https://github.com/apache/spark/pull/21019, the issue in the description is already solved. I think this change is a refinement of https://github.com/apache/spark/pull/21019. Fine with me. But we should always be careful when touching such core logic.



---

[GitHub] spark pull request #22202: [SPARK-25211][Core] speculation and fetch failed ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22202#discussion_r212365264
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
    @@ -2246,58 +2247,6 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
         assertDataStructuresEmpty()
       }
     
    -  test("Trigger mapstage's job listener in submitMissingTasks") {
    --- End diff --
    
    Could you explain why this test is deleted?


---

[GitHub] spark pull request #22202: [SPARK-25211][Core] speculation and fetch failed ...

Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22202#discussion_r212601673
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
    @@ -2246,58 +2247,6 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
         assertDataStructuresEmpty()
       }
     
    -  test("Trigger mapstage's job listener in submitMissingTasks") {
    --- End diff --
    
    Because that PR conflicts with this PR.
    In that PR, the shuffleMapStage waits for the rerun of its parent stages to complete.
    In this PR, the shuffleMapStage completes immediately when all partitions are ready.
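
The difference between the two behaviours can be written as two predicates over the same state. This is a sketch based only on the wording of the comment above: `StageState` and both function names are hypothetical, and neither function is taken from the code of either PR.

```scala
object CompletionPolicySketch {
  // Hypothetical snapshot of a shuffle map stage as seen by the driver.
  final case class StageState(
      parentsAvailable: Boolean, // false while a parent stage is being rerun
      availableOutputs: Int,     // partitions that already have a map output
      numPartitions: Int) {
    def allPartitionsReady: Boolean = availableOutputs == numPartitions
  }

  // "That PR" as described: completion waits for the parent rerun to finish.
  def completesAfterParentRerun(s: StageState): Boolean =
    s.parentsAvailable && s.allPartitionsReady

  // "This PR" as described: completion happens once every partition is ready.
  def completesImmediately(s: StageState): Boolean =
    s.allPartitionsReady

  def main(args: Array[String]): Unit = {
    // State right after the extra attempt succeeds while stage 0 is rerunning.
    val duringRerun = StageState(parentsAvailable = false, availableOutputs = 1, numPartitions = 1)
    println(completesAfterParentRerun(duringRerun))
    println(completesImmediately(duringRerun))
  }
}
```

The two predicates disagree only while a parent stage is being rerun and all outputs of the map stage are already registered, which is the window the deleted test exercised.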


---

[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...

Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:

    https://github.com/apache/spark/pull/22202
  
    @jinxing64 Do you have any idea?


---
