Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/23 03:47:50 UTC

[GitHub] [spark] ajithme opened a new pull request #24438: [SPARK-23626][CORE] DAGScheduler blocked due to JobSubmitted event

URL: https://github.com/apache/spark/pull/24438
 
 
   ## What changes were proposed in this pull request?
   
   DAGScheduler becomes a bottleneck in a cluster when multiple JobSubmitted events have to be processed: DAGSchedulerEventProcessLoop is single-threaded, so a slow event blocks the other events in the queue, such as TaskCompletion.
   Handling a JobSubmitted event can be time-consuming depending on the nature of the job (for example, calculating parent stage dependencies, shuffle dependencies, and partitions), and while it runs, all other events wait to be processed.
   
   Similarly, in my cluster the partition calculation for some jobs is time-consuming (similar to the stack at SPARK-2647), which slows down the Spark DAGSchedulerEventProcessLoop. As a result, user jobs slow down even if their tasks finish within seconds, because TaskCompletion events are processed at a slower rate due to the blockage.
   
   This patch moves ResultStage creation to the call-site thread, which avoids blocking the DAGScheduler thread for other events.
   
   Refer: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Scheduler-Spark-DAGScheduler-scheduling-performance-hindered-on-JobSubmitted-Event-td23562.html
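
The idea can be illustrated with a toy model. This is only a sketch in Python, not the Scala code in this patch: `EventLoop`, `create_result_stage`, `submit_job_in_loop` and `submit_job_on_caller` are hypothetical names invented for the illustration, and a real stage is replaced by a string. It records which thread performs the expensive stage creation in each variant.

```python
import queue
import threading


class EventLoop:
    """Toy single-threaded event loop: one thread drains a FIFO queue."""

    def __init__(self):
        self._events = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, name="event-loop", daemon=True
        )
        self._thread.start()

    def post(self, handler):
        # Enqueue a zero-argument callable to be run on the loop thread.
        self._events.put(handler)

    def stop(self):
        # A None sentinel ends the loop once all earlier events are handled.
        self._events.put(None)
        self._thread.join()

    def _run(self):
        while True:
            handler = self._events.get()
            if handler is None:
                return
            handler()


def create_result_stage(job_id, ran_on):
    # Hypothetical stand-in for the expensive part of job submission
    # (dependency and partition calculation). It records its thread.
    ran_on[job_id] = threading.current_thread().name
    return f"stage-for-job-{job_id}"


def submit_job_in_loop(loop, job_id, ran_on, stages):
    # Variant 1: the event handler creates the stage, so the expensive
    # work occupies the single loop thread and delays later events.
    def handle_job_submitted():
        stages[job_id] = create_result_stage(job_id, ran_on)

    loop.post(handle_job_submitted)


def submit_job_on_caller(loop, job_id, ran_on, stages):
    # Variant 2: the submitting thread creates the stage first; the event
    # handler only registers the finished result, which is cheap.
    stage = create_result_stage(job_id, ran_on)

    def handle_job_submitted():
        stages[job_id] = stage

    loop.post(handle_job_submitted)


def demo():
    loop = EventLoop()
    ran_on, stages = {}, {}
    submit_job_in_loop(loop, 1, ran_on, stages)
    submit_job_on_caller(loop, 2, ran_on, stages)
    loop.stop()
    return ran_on, stages
```

In this model, job 1's stage creation runs on the `event-loop` thread, while job 2's runs on whichever thread called `submit_job_on_caller`, leaving the loop thread free for other events. The sketch ignores any synchronization that shared scheduler state would need when it is touched from outside the loop thread.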
   
   ## How was this patch tested?
   
   1) Added unit test coverage.  
   2) Manually tested to verify the blockage before and after applying the patch.
   
