You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/11 16:22:00 UTC

[GitHub] [spark] xuanyuanking opened a new pull request #24350: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

xuanyuanking opened a new pull request #24350: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/24350
 
 
   ## What changes were proposed in this pull request?
   
   When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove lost executors from CoarseGrainedSchedulerBackend. When a connection of an executor is not gracefully shut down, CoarseGrainedSchedulerBackend may not receive a disconnect event. In this case, CoarseGrainedSchedulerBackend still thinks a lost executor is still alive. CoarseGrainedSchedulerBackend may ask TaskScheduler to run tasks on this lost executor. This task will never finish and the job will hang forever.
   
   ## How was this patch tested?
   
   Add UT in HeartbeatReceiverSuite.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org