Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/16 07:22:33 UTC

[GitHub] [spark] cloud-fan edited a comment on issue #24375: [SPARK-27474][CORE] try best to not submit tasks when the partitions are already completed

URL: https://github.com/apache/spark/pull/24375#issuecomment-483513266
 
 
   I think we are discussing an optimization (saving resources) rather than a bug? Nothing will go wrong even without #21131.
   
   UPDATE:
   For normal tasks, every task can complete, even when several of them belong to the same partition. So it's just a matter of saving resources by not submitting tasks whose partitions are already marked as completed.
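
As a rough illustration of "not submitting tasks whose partitions are already marked as completed", here is a minimal Python sketch. It is not Spark code: the function name `partitions_to_submit` and the plain list/set representation are assumptions made only for this example.

```python
def partitions_to_submit(pending, completed):
    """Return the pending partition ids that still need a task.

    Hypothetical helper: `pending` is an ordered list of partition ids the
    task set would normally launch, `completed` is the set of partition ids
    already finished by some other attempt of the same stage.
    """
    return [p for p in pending if p not in completed]


# Partitions 1 and 3 were already finished elsewhere, so only 0 and 2
# would still be worth submitting.
remaining = partitions_to_submit([0, 1, 2, 3], {1, 3})
```

Skipping a partition here only saves resources. For normal tasks, running it again would still succeed.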
   
   For tasks that write to file sources and need to commit through the central coordinator, only one task can complete per partition. In this case, if a task from a zombie TSM (TaskSetManager) completes first, the corresponding task in the active TSM will fail, get retried, and fail again, until the stage attempt is aborted. Then a new stage attempt is created. The job doesn't fail, but a lot of resources are wasted.
   
   If the task from the active TSM completes first, the corresponding task from the zombie TSM will fail. This is totally fine, as a zombie TSM does not retry tasks.
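
The two orderings above can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not Spark's actual coordinator: the class `CommitCoordinator`, the helpers `run_active_task` and `run_zombie_task`, and the limit of 4 attempts (one original run plus the 3 retries mentioned in this comment) are all hypothetical names and values chosen for the example.

```python
MAX_FAILURES = 4  # assumption: 1 original attempt + 3 retries


class CommitCoordinator:
    """Toy model: the first attempt that asks to commit a partition wins."""

    def __init__(self):
        self.committer = {}  # partition id -> label of the winning attempt

    def can_commit(self, partition, attempt):
        # setdefault records the first asker and returns the stored winner.
        winner = self.committer.setdefault(partition, attempt)
        return winner == attempt


def run_zombie_task(coordinator, partition):
    """A zombie TSM runs its task once and never retries it."""
    return coordinator.can_commit(partition, "zombie")


def run_active_task(coordinator, partition, max_failures=MAX_FAILURES):
    """The active TSM retries on commit denial, up to max_failures attempts.

    Returns (succeeded, attempts_used).
    """
    for n in range(1, max_failures + 1):
        if coordinator.can_commit(partition, f"active-{n}"):
            return True, n
    return False, max_failures


# Worst case: the zombie task commits first, so every active attempt is denied.
worst = CommitCoordinator()
zombie_won = run_zombie_task(worst, 0)
active_result_worst = run_active_task(worst, 0)

# Harmless case: the active task commits first, the zombie fails once and stops.
fine = CommitCoordinator()
active_result_fine = run_active_task(fine, 0)
zombie_lost = run_zombie_task(fine, 0)
```

In the worst case the active side burns all of its attempts on a partition that is already committed, which is the wasted work this PR tries to avoid. In the harmless case the cost is a single failed zombie task.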
   
   That said, this PR tries to avoid the worst case described above. Even if we go through the event loop now, I don't think it will take so long that the task from the active TSM has already been retried 3 times.
