Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/16 14:09:29 UTC

[GitHub] [spark] pgandhi999 commented on a change in pull request #24375: [SPARK-27474][CORE] avoid retrying a task failed with CommitDeniedException many times

pgandhi999 commented on a change in pull request #24375: [SPARK-27474][CORE] avoid retrying a task failed with CommitDeniedException many times
URL: https://github.com/apache/spark/pull/24375#discussion_r275817195
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
 ##########
 @@ -870,22 +874,21 @@ private[spark] class TaskSchedulerImpl(
   }
 
   /**
-   * Marks the task has completed in all TaskSetManagers for the given stage.
+   * Marks the task has completed in the active TaskSetManager for the given stage.
    *
    * After stage failure and retry, there may be multiple TaskSetManagers for the stage.
-   * If an earlier attempt of a stage completes a task, we should ensure that the later attempts
-   * do not also submit those same tasks.  That also means that a task completion from an earlier
-   * attempt can lead to the entire stage getting marked as successful.
+   * If an earlier zombie attempt of a stage completes a task, we can ask the later active attempt
+   * to skip submitting and running the task for the same partition, to save resource. That also
+   * means that a task completion from an earlier zombie attempt can lead to the entire stage
+   * getting marked as successful.
    */
-  private[scheduler] def markPartitionCompletedInAllTaskSets(
+  private[scheduler] def markPartitionCompleted(
 
 Review comment:
  Just FYI: after adding the synchronized statement, this PR is almost identical to PR #22806 before it was restructured. I could have simply added `synchronized` to `markPartitionCompletedInAllTaskSets()` instead of restructuring the PR, but did not do so because of the following suggestion by @squito, which I agree with.
   
  > I'm not sure how to fix this. You could make `TaskSchedulerImpl.markPartitionCompletedInAllTaskSets()` synchronized, but then you're getting a lock on the TaskSchedulerImpl in the DAGScheduler event loop. That's not good for scheduling throughput, and we also want to make sure there is no chance of deadlock.
   
  So I would like to pose the same question as above once again: does the extra synchronization impact scheduling throughput?
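  
  For context, here is a minimal sketch of the "simple" alternative discussed above (keeping the old method and just wrapping it in `synchronized`). The types, field names, and signatures below are simplified stand-ins for the real `TaskSchedulerImpl` internals, not the actual PR diff; the point is only to show which thread would be holding which lock.
  
  ```scala
  import scala.collection.mutable
  
  // Simplified stand-in for TaskSetManager; names here are illustrative only.
  class TaskSetManagerSketch(val stageId: Int) {
    def markPartitionCompleted(partitionId: Int): Unit = {
      // per-manager bookkeeping would live here
    }
  }
  
  class TaskSchedulerImplSketch {
    // Simplified stand-in for the stage -> TaskSetManagers bookkeeping in the real class.
    private val taskSetsByStageId =
      mutable.HashMap[Int, mutable.ArrayBuffer[TaskSetManagerSketch]]()
  
    // The "simple" fix discussed above: keep the old method but take the
    // TaskSchedulerImpl-level lock. The caller would be the DAGScheduler event
    // loop thread, so while this runs it competes with resourceOffers/statusUpdate
    // for the same lock -- the throughput/deadlock concern quoted above.
    def markPartitionCompletedInAllTaskSets(stageId: Int, partitionId: Int): Unit =
      synchronized {
        taskSetsByStageId
          .getOrElse(stageId, mutable.ArrayBuffer.empty[TaskSetManagerSketch])
          .foreach(_.markPartitionCompleted(partitionId))
      }
  }
  ```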
