Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/15 13:09:22 UTC

[GitHub] [spark] cloud-fan opened a new pull request #24375: [SPARK-25250][CORE] try best to not submit tasks when the partitions are already completed

cloud-fan opened a new pull request #24375: [SPARK-25250][CORE] try best to not submit tasks when the partitions are already completed
URL: https://github.com/apache/spark/pull/24375
 
 
   ## What changes were proposed in this pull request?
   
   #21131 first made it possible for a task that completed successfully in a zombie `TaskSetManager` to also be marked as successful in the active `TaskSetManager`. Later, #23871 improved the implementation to cover a corner case: the active `TaskSetManager` has not been created yet when the earlier task succeeds.
   
   However, #23871 has a bug and was reverted in #24359.
   
   Looking back at the original problem, there are 2 findings:
   1. The issue cannot be 100% eliminated. Suppose task set 1.0 (zombie) has a running task for a partition, and task set 1.1 (active) has already submitted a task for the same partition and that task has completed. Then there is nothing we can do.
   2. The thing we care about is the task completion events from a zombie task set. If a task from the active task set completes, we don't need to mark the corresponding tasks from zombie task sets as completed.
   
   This PR proposes a new fix:
   1. When `DAGScheduler` gets a task success event from an earlier attempt, it notifies the `TaskSchedulerImpl` about it.
   2. When `TaskSchedulerImpl` learns that a partition is already completed, it asks the active `TaskSetManager` to mark the corresponding task as finished, if the task is not finished yet.
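
   The two steps above can be sketched with a toy model. This is not Spark's code: the classes `ToyDAGScheduler`, `ToyTaskScheduler`, and `ToyTaskSetManager`, and all of their method names, are hypothetical stand-ins chosen for illustration, and the model ignores executors, locality, and failures. It only shows the notification path from a late success in an earlier attempt to the active task set.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTaskSetManager:
    """Hypothetical stand-in for one stage attempt's task set."""
    stage_id: int
    attempt: int
    partitions: list
    is_zombie: bool = False
    finished: set = field(default_factory=set)

    def mark_partition_completed(self, partition):
        # Step 2 (second half): mark the task only if it is not finished yet.
        if partition in self.partitions and partition not in self.finished:
            self.finished.add(partition)

    def pending_partitions(self):
        # Partitions this task set would still submit tasks for.
        return [p for p in self.partitions if p not in self.finished]


class ToyTaskScheduler:
    """Hypothetical stand-in for the task scheduler."""

    def __init__(self):
        self.task_sets = {}  # stage_id -> list of ToyTaskSetManager

    def submit(self, tsm):
        # A new attempt turns all earlier attempts of the stage into zombies.
        for old in self.task_sets.get(tsm.stage_id, []):
            old.is_zombie = True
        self.task_sets.setdefault(tsm.stage_id, []).append(tsm)

    def notify_partition_completion(self, stage_id, partition):
        # Step 2 (first half): forward the completion to the active task set.
        for tsm in self.task_sets.get(stage_id, []):
            if not tsm.is_zombie:
                tsm.mark_partition_completed(partition)


class ToyDAGScheduler:
    """Hypothetical stand-in for the DAG scheduler."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.latest_attempt = {}  # stage_id -> latest attempt number

    def submit_stage_attempt(self, stage_id, attempt, partitions):
        tsm = ToyTaskSetManager(stage_id, attempt, list(partitions))
        self.latest_attempt[stage_id] = attempt
        self.scheduler.submit(tsm)
        return tsm

    def handle_task_success(self, stage_id, attempt, partition):
        # Step 1: only successes from an earlier attempt trigger a
        # notification. The active attempt's own bookkeeping is not modeled.
        if attempt < self.latest_attempt[stage_id]:
            self.scheduler.notify_partition_completion(stage_id, partition)


scheduler = ToyTaskScheduler()
dag = ToyDAGScheduler(scheduler)
zombie = dag.submit_stage_attempt(1, 0, [0, 1, 2])   # task set 1.0
active = dag.submit_stage_attempt(1, 1, [0, 1, 2])   # task set 1.1
dag.handle_task_success(1, 0, 2)  # late success from the zombie attempt
print(active.pending_partitions())  # [0, 1]
```

   In this sketch the active task set no longer needs to submit a task for partition 2. If the active task set had already launched that task before the notification arrived, the duplicate work could not be avoided, which matches finding 1.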
   
   ## How was this patch tested?
   
   A new test case.
