Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/23 21:06:00 UTC

[GitHub] [spark] squito commented on a change in pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

squito commented on a change in pull request #28848:
URL: https://github.com/apache/spark/pull/28848#discussion_r444508358



##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -177,6 +177,8 @@ private[spark] class DAGScheduler(
   // TODO: Garbage collect information about failure epochs when we know there are no more
   //       stray messages to detect.
   private val failedEpoch = new HashMap[String, Long]
+  // In addition, track epoch for failed executors that result in lost file output

Review comment:
       yeah, I really agree with this.  I think we need to rename both of these epochs.  I think I finally get why they both matter.  One tracks files owned by the external shuffle service -- so you don't unregister those outputs just because you learn that an executor is lost.  The other tracks outputs owned by the executor itself -- that always includes memory-cached blocks, and also shuffle files when the external shuffle service isn't in use.
   
   Once I wrapped my head around that part, the change in the logic made sense.
   
   I will think some more about better names and comments that would help with that part -- a rough sketch of the kind of thing I have in mind is below.
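   
   Just to make the distinction concrete, here's a very rough sketch of the sort of names/comments I'm thinking of.  `executorFailureEpoch` and `shuffleFileLostEpoch` are placeholder names picked purely for illustration, and the wrapping class is made up too -- this is not the actual DAGScheduler code:
   
   ```scala
   import scala.collection.mutable.HashMap
   
   // Sketch only -- placeholder names, not the real DAGScheduler fields.
   class EpochTrackingSketch {
     // Epoch of the last failure that invalidated outputs owned by the executor
     // itself: memory-cached blocks always, plus shuffle files when the external
     // shuffle service is NOT in use.  Bumped whenever the executor is lost.
     private val executorFailureEpoch = new HashMap[String, Long]
   
     // Epoch of the last failure that invalidated shuffle files served by the
     // external shuffle service on the executor's host.  Losing just the executor
     // does NOT bump this, since the shuffle service can still serve those files;
     // a fetch failure after the loss (or losing the whole host) does.
     private val shuffleFileLostEpoch = new HashMap[String, Long]
   }
   ```
   
   (The point of the split being: on a plain executor loss with the external shuffle service enabled, only the first one gets bumped; a fetch failure is what bumps the second.)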



