Posted to reviews@spark.apache.org by "mridulm (via GitHub)" <gi...@apache.org> on 2023/07/01 03:31:28 UTC

[GitHub] [spark] mridulm commented on pull request #41723: [SPARK-44179][CORE] Fix the number of executors being calculated incorrectly when a task fails while its speculative copy is still executing

mridulm commented on PR #41723:
URL: https://github.com/apache/spark/pull/41723#issuecomment-1615415367

   The issue here is in `TaskSetManager` (tsm), not `ExecutorAllocationManager`.
   The scenario you laid out is correct, namely:
   
   Initial state:
   * For partition P, the original task is T1 = 1.0 and the speculative task is T2 = 1.1.
   * `speculatableTasks` is empty, since 1.1 has started already.
   
   T1 fails [1]:
   * tsm.`handleFailedTask` adds the task again (`addPendingTask` at the end).
   * This in turn adds P back to all the pending task lists.
   
   
   Assume the task re-added in [1] has not yet been scheduled when `checkSpeculatableTasks` is invoked.
   The condition in `checkAndSubmitSpeculatableTasks` will be true, namely: `!successful(index) && copiesRunning(index) == 1 && !speculatableTasks.contains(index)`.
   Hence we add a speculative task for schedule.
   
   This is the reason why we have 3 pending tasks - the running speculative task, the re-added task from the failure, and the new speculative task.
   So what `ExecutorAllocationManager` is doing is IMO correct - what is incorrect is the addition of the second speculative task itself.
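
To make the count concrete, here is a tiny sketch (a hypothetical model, not Spark code; the names are illustrative) of the outstanding copies of partition P's task that `ExecutorAllocationManager` ends up counting:

```python
# Hypothetical model of the outstanding copies of partition P's task
# after T1 fails; the names and fields are illustrative, not Spark's.
outstanding = [
    {"name": "T2 (1.1)",            "kind": "speculative", "state": "running"},
    {"name": "re-added copy of T1", "kind": "regular",     "state": "pending"},
    {"name": "extra speculative",   "kind": "speculative", "state": "pending"},
]

# ExecutorAllocationManager sizes its executor request from this total,
# so the inflated count comes from the third (wrongly added) entry.
print(len(outstanding))  # 3
```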
   
   The fix would be to change the condition in `checkAndSubmitSpeculatableTasks` to something like:
   
   
   ```scala
     !successful(index) && copiesRunning(index) == 1 && !speculatableTasks.contains(index) &&
     // make sure the currently running copy is not a speculative task
     !info.speculative
   ```
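
To illustrate the difference, a minimal Python simulation (not the actual Spark code; the parameter names mirror the Scala fields) of both conditions evaluated in the state described above, where the only running copy is the speculative one:

```python
# Hypothetical simulation of the checkAndSubmitSpeculatableTasks guard.
# Parameter names mirror TaskSetManager's fields; this is not Spark code.

def old_condition(successful, copies_running, in_speculatable, info_speculative):
    # Guard before the fix: ignores whether the running copy is speculative.
    return not successful and copies_running == 1 and not in_speculatable

def fixed_condition(successful, copies_running, in_speculatable, info_speculative):
    # Guard after the fix: additionally require that the one running
    # copy is not itself a speculative task.
    return (not successful and copies_running == 1
            and not in_speculatable and not info_speculative)

# State after T1 (1.0) fails while the speculative copy T2 (1.1) is still
# running, and the re-added task has not yet been scheduled:
successful = False        # partition P has not finished
copies_running = 1        # only T2 is running
in_speculatable = False   # 1.1 already started, so P left speculatableTasks
info_speculative = True   # the running copy, T2, is speculative

print(old_condition(successful, copies_running, in_speculatable, info_speculative))    # True: second speculative task submitted
print(fixed_condition(successful, copies_running, in_speculatable, info_speculative))  # False: no extra speculative task
```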
   
   +CC @Ngone51 as well, since this is a slightly more involved change (though only a one-line change :-) )
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org