Posted to reviews@spark.apache.org by liutang123 <gi...@git.apache.org> on 2018/08/23 13:43:17 UTC
[GitHub] spark pull request #22202: [SPARK-25211][Core] speculation and fetch failed ...
GitHub user liutang123 opened a pull request:
https://github.com/apache/spark/pull/22202
[SPARK-25211][Core] speculation and fetch failed result in hang of job
## What changes were proposed in this pull request?
In the current `DAGScheduler.handleTaskCompletion` code, when a shuffleMapStage that has a job is not in runningStages and its `pendingPartitions` is empty, the job of this shuffleMapStage never completes.
*Consider the following scenario:*
1. Stage 0 runs and generates shuffle output data.
2. Stage 1 reads the output from stage 0 and generates more shuffle data. It has two attempts for the same partition: ShuffleMapTask0 and its speculative copy ShuffleMapTask0.1.
3. ShuffleMapTask0 fails to fetch blocks and sends a FetchFailed to the driver. The driver resubmits stage 0 and stage 1, placing stage 0 in runningStages and stage 1 in waitingStages.
4. ShuffleMapTask0.1 finishes successfully and sends Success back to the driver. The driver adds the MapStatus to the set of output locations of stage 1. Because stage 1 is not in runningStages, the job does not complete.
5. Stage 0 completes and the driver runs stage 1. However, because the output set of stage 1 is already complete, the driver submits no tasks and marks stage 1 complete immediately. Because job completion relies on a `CompletionEvent`, and no further `CompletionEvent` will ever arrive, the job hangs.
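The scenario above can be replayed in a deliberately simplified model. This is a sketch, not Spark code: `ToyStage`, `ToyScheduler`, `onTaskSuccess`, `submitStage` and `mapStageJobFinished` are hypothetical names, and the model assumes only the two rules stated in the description (a task success completes a stage and its job only while the stage is in runningStages, and a resubmitted stage with no missing partitions launches no tasks).

```scala
import scala.collection.mutable

// Hypothetical toy model of the scenario in the PR description; NOT Spark's DAGScheduler.
final class ToyStage(val id: Int, val numPartitions: Int) {
  // Partitions with a registered map output (stands in for the stage's output locations).
  val available = mutable.Set.empty[Int]
  // Partitions the current stage attempt is still waiting on.
  val pendingPartitions = mutable.Set.empty[Int]
  def isAvailable: Boolean = available.size == numPartitions
}

final class ToyScheduler {
  val runningStages = mutable.Set.empty[Int]
  val waitingStages = mutable.Set.empty[Int]
  val finishedStages = mutable.Set.empty[Int]
  // Becomes true once the map-stage job (a JobWaiter in Spark) is notified.
  var mapStageJobFinished = false

  // Pre-fix rule from the description: a successful task always registers its output,
  // but the stage and its job are only completed while the stage is in runningStages.
  def onTaskSuccess(stage: ToyStage, partition: Int): Unit = {
    stage.available += partition
    stage.pendingPartitions -= partition
    if (runningStages.contains(stage.id) && stage.pendingPartitions.isEmpty) {
      runningStages -= stage.id
      finishedStages += stage.id
      mapStageJobFinished = true
    }
  }

  // Submitting a waiting stage launches tasks only for missing partitions. With none
  // missing, the stage is marked finished at once, but no task runs, so no later
  // task completion will ever notify the job.
  def submitStage(stage: ToyStage): Unit = {
    waitingStages -= stage.id
    val missing = (0 until stage.numPartitions).filterNot(stage.available.contains)
    if (missing.isEmpty) {
      finishedStages += stage.id
    } else {
      stage.pendingPartitions ++= missing
      runningStages += stage.id
    }
  }
}

object SpeculationHangDemo {
  def main(args: Array[String]): Unit = {
    val sched = new ToyScheduler
    val stage1 = new ToyStage(id = 1, numPartitions = 1)

    // Step 3: after the FetchFailed, stage 1 sits in waitingStages (not runningStages)
    // while stage 0 reruns; partition 0 is still pending from the earlier attempt.
    stage1.pendingPartitions += 0
    sched.waitingStages += stage1.id

    // Step 4: the speculative ShuffleMapTask0.1 succeeds while stage 1 is waiting.
    sched.onTaskSuccess(stage1, partition = 0)

    // Step 5: stage 0 is done and stage 1 is submitted, but no partition is missing.
    sched.submitStage(stage1)

    println(s"stage 1 finished: ${sched.finishedStages.contains(stage1.id)}, " +
      s"output complete: ${stage1.isAvailable}, job notified: ${sched.mapStageJobFinished}")
  }
}
```

In this model, stage 1 ends up finished with complete output while the map-stage job is never notified, which corresponds to the hang in step 5.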
## How was this patch tested?
Unit tests (UT).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liutang123/spark SPARK-25211
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22202.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22202
----
commit 4f51199daafec0466a5ac836c4f6281f5ba45381
Author: liulijia <li...@...>
Date: 2018-08-23T13:42:13Z
[SPARK-25211][Core] speculation and fetch failed result in hang of job
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22202
Can one of the admins verify this patch?
---
[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...
Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:
https://github.com/apache/spark/pull/22202
@Ngone51 Because some shuffleMapStages have mapStageJobs (JobWaiter) submitted by `SparkContext.submitMapStage`.
---
[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...
Posted by Ngone51 <gi...@git.apache.org>.
Github user Ngone51 commented on the issue:
https://github.com/apache/spark/pull/22202
Since `stage 1` is only a `ShuffleMapStage`, why are there no other child stages to be submitted?
---
[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/22202
Thanks for the ping~
It seems that `ShuffleMapTask0.1` is a speculative task; please update the description.
The change seems fine to me. But given https://github.com/apache/spark/pull/21019, the issue in the description is already solved. I think this change is a refinement of https://github.com/apache/spark/pull/21019. Fine for me, but we should always be careful when touching such core logic.
---
[GitHub] spark pull request #22202: [SPARK-25211][Core] speculation and fetch failed ...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22202#discussion_r212365264
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2246,58 +2247,6 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
assertDataStructuresEmpty()
}
- test("Trigger mapstage's job listener in submitMissingTasks") {
--- End diff --
Could you explain why this test was deleted?
---
[GitHub] spark pull request #22202: [SPARK-25211][Core] speculation and fetch failed ...
Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22202#discussion_r212601673
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2246,58 +2247,6 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
assertDataStructuresEmpty()
}
- test("Trigger mapstage's job listener in submitMissingTasks") {
--- End diff --
Because that PR conflicts with this PR.
In that PR, the shuffleMapStage waits for its parent stages' rerun to complete.
In this PR, the shuffleMapStage completes immediately once all partitions are ready.
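The two behaviours contrasted in this reply can be illustrated with a small toy model. This is a hypothetical sketch, not Spark code: `ToyMapStage`, `NotifyOnResubmit`, `NotifyWhenAllAvailable` and the event methods are invented names, and each policy encodes only the one-sentence description given in the reply (notify on resubmission after the parent rerun versus notify as soon as every partition has output).

```scala
import scala.collection.mutable

// Hypothetical toy model contrasting the two behaviours described in the reply.
sealed trait Policy
// Earlier behaviour (the deleted test's PR, as described in the reply): the job is
// notified when the stage is resubmitted after its parents rerun and nothing is missing.
case object NotifyOnResubmit extends Policy
// This PR (as described in the reply): the job is notified as soon as all partitions
// have output, even if the stage is not in runningStages.
case object NotifyWhenAllAvailable extends Policy

final class ToyMapStage(val numPartitions: Int, policy: Policy) {
  private val available = mutable.Set.empty[Int]
  // Records which event caused the map-stage job to be notified, if any.
  var finishedBy: Option[String] = None

  private def allAvailable: Boolean = available.size == numPartitions

  // Only the first notification is recorded.
  private def finish(event: String): Unit =
    if (finishedBy.isEmpty) finishedBy = Some(event)

  // A map task (possibly a speculative copy) succeeded while the stage was waiting.
  def onTaskSuccessWhileWaiting(partition: Int): Unit = {
    available += partition
    if (policy == NotifyWhenAllAvailable && allAvailable) finish("task success")
  }

  // The parent stages finished rerunning and this stage is submitted again.
  def onResubmit(): Unit =
    if (policy == NotifyOnResubmit && allAvailable) finish("resubmit")
}

object PolicyDemo {
  def main(args: Array[String]): Unit =
    for (policy <- Seq(NotifyOnResubmit, NotifyWhenAllAvailable)) {
      val stage = new ToyMapStage(numPartitions = 1, policy)
      stage.onTaskSuccessWhileWaiting(0) // speculative ShuffleMapTask0.1 succeeds
      val afterSuccess = stage.finishedBy
      stage.onResubmit()                 // stage 0 rerun done; stage 1 resubmitted
      println(s"$policy: after task success = $afterSuccess, final = ${stage.finishedBy}")
    }
}
```

Under `NotifyOnResubmit` the job is only notified once the stage is resubmitted after the parent rerun, whereas under `NotifyWhenAllAvailable` the speculative task's success alone is enough, which is why a test asserting the first behaviour would conflict with the second.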
---
[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...
Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:
https://github.com/apache/spark/pull/22202
@jinxing64 Do you have any ideas?
---