Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/12/03 14:36:15 UTC

[GitHub] [spark] tgravescs commented on issue #26614: [SPARK-29976][CORE] Trigger speculation for stages with too few tasks

URL: https://github.com/apache/spark/pull/26614#issuecomment-561195206
 
 
   Right, I get that it still uses the regular speculation logic once enough tasks have finished. My concerns are that it will confuse users or kick in when they don't want it to.
   
   Let's say I set both speculation policies because I have different stages with different requirements. I have one problematic stage with 1 task, but the speculationTaskDurationThresOpt will also be applied to all my other stages, the ones I configured the normal Spark speculation configs for. If the speculationTaskDurationThresOpt is something that could be widely different for different stages, then it's harder to configure this way, and it can kick in when I don't want it to or don't expect it to. The normal speculation configs are based on a multiplier of other tasks' run time; this is just a hardcoded timeout. So let's say my normal speculation multiplier would kick in only after an hour and my speculationTaskDurationThresOpt is set to 15 minutes. I'm going to start speculating a lot more once the unfinished tasks get below that threshold.
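
To make the one-hour versus 15-minute example concrete, here is a rough sketch in plain Python. It is not Spark's scheduler code: the function names are hypothetical, the 40-minute median is an assumed figure chosen so that a 1.5x multiplier yields the one-hour threshold, and it ignores the requirement that a fraction of tasks finish before multiplier-based speculation starts. It also assumes that, when both policies are active, whichever threshold is lower triggers first.

```python
from typing import Optional


def multiplier_threshold(median_duration_s: float, multiplier: float) -> float:
    """Relative threshold: a multiple of the median duration of finished tasks."""
    return multiplier * median_duration_s


def should_speculate(running_s: float,
                     multiplier_thr_s: float,
                     duration_thr_s: Optional[float] = None) -> bool:
    """Assumption: if a fixed duration threshold is set, the lower of the two wins."""
    thresholds = [multiplier_thr_s]
    if duration_thr_s is not None:
        thresholds.append(duration_thr_s)
    return running_s >= min(thresholds)


# Assumed numbers: 40-minute median and a 1.5x multiplier give a one-hour threshold.
relative_thr = multiplier_threshold(median_duration_s=40 * 60, multiplier=1.5)

# A task that has been running for 20 minutes:
only_multiplier = should_speculate(20 * 60, relative_thr)
with_fixed_timeout = should_speculate(20 * 60, relative_thr, duration_thr_s=15 * 60)
```

Under these assumptions the 20-minute task is left alone by the multiplier policy but is speculated as soon as the 15-minute fixed timeout also applies to the stage, which is the unexpected behavior described above.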
   
   I totally get that this perhaps covers more scenarios, which in my opinion is both good and bad, as shown above. I was thinking of keeping this simple for now and just having it apply if total tasks <= slots on 1 executor. That should be very easy for users to understand, and they will know when it applies. It solves the issue reported in this JIRA. If we start to find more specific cases where we want to get smarter, we can enhance it later.
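
The simpler rule proposed here could be sketched as follows. This is plain Python for illustration only, with hypothetical function names; it assumes that the slots on one executor equal the executor's cores divided by the CPUs each task requires.

```python
def slots_per_executor(executor_cores: int, cpus_per_task: int = 1) -> int:
    """Assumption: one slot per group of cpus_per_task cores on an executor."""
    return executor_cores // cpus_per_task


def duration_threshold_applies(num_tasks: int,
                               executor_cores: int,
                               cpus_per_task: int = 1) -> bool:
    """Apply the fixed duration threshold only if the whole stage fits on one executor."""
    return num_tasks <= slots_per_executor(executor_cores, cpus_per_task)


# A single-task stage on 4-core executors qualifies; a 5-task stage does not.
single_task_stage = duration_threshold_applies(num_tasks=1, executor_cores=4)
five_task_stage = duration_threshold_applies(num_tasks=5, executor_cores=4)
```

With this check, larger stages would keep using only the multiplier-based configs, so the fixed timeout could not kick in for them unexpectedly.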
