Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/09 10:34:00 UTC

[GitHub] [spark] liupc commented on issue #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block

URL: https://github.com/apache/spark/pull/27604#issuecomment-596449418
 
 
   > Ok, I get your point now. Let me paraphrase it to see if I understand correctly:
   > 
   > Assume stage0 has finished while stage1 and stage2, both of which depend on stage0, are running concurrently.
   > 
   > A task from stage1 hits a `FetchFailedException`, causing stage0 to re-run. Meanwhile, task X in stage2 is still running. Since multiple tasks from the re-run stage0 are executing at the same time, and each stage0 task that finishes invalidates the cached map statuses (destroying the broadcast), task X is very likely to hit an IOException (a.k.a. `Failed to get broadcast`) when fetching the broadcast map statuses from the driver, because stage0 tasks keep destroying the broadcast concurrently.
   > 
   > Also, on the `TaskSetManager` side, the exception is treated as a counted task failure (rather than a FetchFailed), so the task is retried, only to hit the same exception again and again.
   
   That's it! Thanks for reviewing.
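   The race described above can be sketched with a toy model. Note this is an illustration only: `BroadcastLike`, `TrackerModel`, and `registerMapOutput` are hypothetical stand-ins, not Spark's actual `MapOutputTracker` or `Broadcast` API. Task X captures a reference to the current broadcast, a completing stage0 task destroys it while re-broadcasting, and X's subsequent read throws the IOException that `TaskSetManager` then counts as an ordinary task failure.

   ```java
   import java.io.IOException;

   // Toy stand-in for a broadcast variable; once destroyed, reads fail.
   class BroadcastLike {
       volatile boolean destroyed = false;

       String value() throws IOException {
           if (destroyed) throw new IOException("Failed to get broadcast");
           return "serialized MapStatuses";
       }
   }

   // Toy stand-in for the driver-side map-status cache.
   class TrackerModel {
       private BroadcastLike cached = new BroadcastLike();

       BroadcastLike currentBroadcast() { return cached; }  // what task X fetches

       // Models a re-run stage0 task finishing: the cached broadcast is
       // destroyed and a fresh one is created in its place.
       void registerMapOutput() {
           cached.destroyed = true;
           cached = new BroadcastLike();
       }
   }

   public class BroadcastRaceDemo {
       static String raceOutcome() {
           TrackerModel tracker = new TrackerModel();
           BroadcastLike ref = tracker.currentBroadcast(); // task X grabs the broadcast...
           tracker.registerMapOutput();                    // ...a stage0 task completes first
           try {
               return ref.value();                         // task X now reads a dead broadcast
           } catch (IOException e) {
               // In Spark this surfaces as a counted task failure, not a FetchFailed.
               return "IOException: " + e.getMessage();
           }
       }

       public static void main(String[] args) {
           System.out.println(raceOutcome()); // prints "IOException: Failed to get broadcast"
       }
   }
   ```

   The sketch collapses the two threads into one deterministic interleaving; in the real scenario the destroy happens concurrently, which is why task X only fails with high probability rather than always.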

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org