You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/07 14:23:54 UTC

[GitHub] [spark] Ngone51 opened a new pull request #24007: [SPARK-23433][SPARK-25250] [CORE][BRANCH-2.3] Later created TaskSet should learn about the finished partitions

Ngone51 opened a new pull request #24007: [SPARK-23433][SPARK-25250] [CORE][BRANCH-2.3] Later created TaskSet should learn about the finished partitions
URL: https://github.com/apache/spark/pull/24007
 
 
   ## What changes were proposed in this pull request?
   
   This is an optional solution for #22806 .
   
   #21131 firstly implement that a previous successful completed task from zombie TaskSetManager could also succeed the active TaskSetManager, which based on an assumption that an active TaskSetManager always exists for that stage when this happen. But that's not always true as an active TaskSetManager may haven't been created when a previous task succeed, and this is the reason why #22806 hit the issue.
   
   This pr extends #21131 's behavior by adding stageIdToFinishedPartitions into TaskSchedulerImpl, which recording the finished partition whenever a task(from zombie or active) succeed. Thus, a later created active TaskSetManager could also learn about the finished partition by looking into stageIdToFinishedPartitions and won't launch any duplicate tasks.
   
   ## How was this patch tested?
   
   Add.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org