Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/12/19 03:19:19 UTC

[GitHub] [spark] seayoun opened a new pull request #26938: [SPARK-30297][CORE] Fix executor lost in net cause app hung upp

seayoun opened a new pull request #26938: [SPARK-30297][CORE] Fix executor lost in net cause app hung upp
URL: https://github.com/apache/spark/pull/26938
 
 
   ### **What changes were proposed in this pull request?**
   
   **Background**
   If an executor is lost at the network level and never sends a RST or close packet to the driver, the driver cannot detect the loss from the connection being dropped. It can only learn that the executor is dead when the executor's heartbeat expires.
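To make the heartbeat-only detection path concrete, here is a minimal toy sketch of driver-side heartbeat bookkeeping. It is not Spark's `HeartbeatReceiver`: the class `HeartbeatTracker`, its methods, the executor ids, and the 120-second timeout are all hypothetical names and values chosen for illustration.

```python
class HeartbeatTracker:
    """Toy stand-in for driver-side heartbeat bookkeeping (hypothetical, not Spark code)."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.last_seen_ms = {}  # executor id -> time of the last heartbeat

    def record_heartbeat(self, executor_id, now_ms):
        self.last_seen_ms[executor_id] = now_ms

    def expire(self, now_ms):
        """Return executors whose last heartbeat is older than the timeout, and forget them."""
        expired = sorted(
            executor_id
            for executor_id, seen in self.last_seen_ms.items()
            if now_ms - seen > self.timeout_ms
        )
        for executor_id in expired:
            del self.last_seen_ms[executor_id]
        return expired


# Illustrative timeline: exec-1 silently drops off the network after t=0,
# so no RST or close packet ever reaches the driver.
tracker = HeartbeatTracker(timeout_ms=120_000)
tracker.record_heartbeat("exec-1", now_ms=0)
tracker.record_heartbeat("exec-2", now_ms=0)
tracker.record_heartbeat("exec-2", now_ms=100_000)
expired_at_130s = tracker.expire(now_ms=130_000)
```

In this model the only signal that `exec-1` is gone is the absence of heartbeats, so the loss is noticed only once the timeout has elapsed.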
   
   **Problems**
   The heartbeat expiration flow is as follows:
   1. The executor's heartbeat expires, as described above.
   2. HeartbeatReceiver calls the scheduler's executor-lost handling to reschedule the tasks that were running on this executor.
   3. HeartbeatReceiver kills the executor.
   
   The tasks from the dead executor can be rescheduled onto that same dead executor if rescheduling happens before the executor has been removed from executorBackend. The driver then sends a launch-task message to this executor again, but the executor never responds, and the driver cannot detect the failure through the heartbeat because the executor is lost in the network. As a result, the tasks rescheduled onto the lost executor never finish, and the app hangs forever.
   This patch fixes the problem by removing the executor before rescheduling.
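The ordering problem can be illustrated with a small toy model. This is a sketch under stated assumptions, not the patch itself: `ToyDriver`, `simulate`, and the `remove_first` flag are hypothetical, and the scheduler here always picks the first registered executor so that the race, which in practice happens only by chance, shows up deterministically.

```python
class ToyDriver:
    """Hypothetical toy model of the driver; it does not use any Spark API."""

    def __init__(self, executors):
        self.registered = list(executors)  # executors the backend still offers
        self.reachable = set(executors)    # executors that can actually respond
        self.pending = []                  # tasks waiting to be scheduled
        self.running = {}                  # task -> executor it was launched on
        self.finished = []

    def schedule_pending(self):
        # Simplification: always launch on the first registered executor.
        while self.pending and self.registered:
            task = self.pending.pop(0)
            self.running[task] = self.registered[0]

    def run_reachable(self):
        # Only tasks launched on reachable executors ever complete.
        for task, executor in list(self.running.items()):
            if executor in self.reachable:
                self.finished.append(task)
                del self.running[task]

    def on_heartbeat_expired(self, executor, remove_first):
        if remove_first:
            # Patched order: remove the executor before rescheduling.
            self.registered.remove(executor)
        # Executor-lost handling: requeue its tasks and reschedule right away.
        for task in [t for t, e in self.running.items() if e == executor]:
            del self.running[task]
            self.pending.append(task)
        self.schedule_pending()
        if not remove_first:
            # Original order: the executor is removed only after rescheduling.
            self.registered.remove(executor)


def simulate(remove_first):
    driver = ToyDriver(["exec-dead", "exec-live"])
    driver.pending = ["task-0", "task-1"]
    driver.schedule_pending()               # both tasks start on exec-dead
    driver.reachable.discard("exec-dead")   # exec-dead is silently lost
    driver.on_heartbeat_expired("exec-dead", remove_first)
    driver.run_reachable()
    return sorted(driver.finished), sorted(driver.running)


stuck = simulate(remove_first=False)
fixed = simulate(remove_first=True)
```

With the original ordering, both tasks remain in `running` on the unreachable executor and nothing finishes; with the executor removed first, both tasks are relaunched on `exec-live` and complete.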
   
   ### **Why are the changes needed?**
   Without this change, an executor lost in the network can cause the app to hang.
   
   ### **Does this PR introduce any user-facing change?**
   No
   
   ### **How was this patch tested?**
