
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/20 09:34:39 UTC

[GitHub] [spark] liupc edited a comment on issue #27871: [SPARK-31105][CORE]Respect sql execution id for FIFO scheduling mode

URL: https://github.com/apache/spark/pull/27871#issuecomment-601598502
 
 
   > For small queries, usually they won't hit this problem. For big queries, the query latency shouldn't matter too much?
   > 
   > @liupc have you tried this in real-world workloads?
   
   Yes, we tried it on real workloads. It performs better, especially when lots of taskSets have to be scheduled in a single scheduling round, and the effect is most obvious with adaptive execution. Also, I think this is what FIFO should do.
   A query usually maps to several jobs, so if several of its jobs are delayed for this reason, the total delay adds up. Suppose each job takes 2 min, there are 10 jobs ahead of it, and all cores are in use: the job then waits 20 min before it is scheduled. What's worse, with adaptive execution the query may hit the same issue again when it submits its next batch of jobs, which can greatly increase the query duration.
   Also, users will see lots of jobs from later-arriving queries running in the Spark UI, which is confusing.
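To illustrate the ordering difference, here is a minimal Python sketch, not Spark's actual code. It assumes, hypothetically, that every pending task set carries a SQL execution id, a job id and a stage id. "Plain FIFO" orders by (job id, stage id), similar to how Spark's FIFO mode compares job priority and then stage id. The "execution-aware" variant puts the execution id first, so the later jobs of an earlier query are not queued behind jobs from a query that arrived afterwards. The `TaskSet` type and its field names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskSet:
    # Hypothetical fields for illustration; not Spark's TaskSet API.
    execution_id: int  # SQL execution (query) the job belongs to
    job_id: int        # job id, increasing in submission order
    stage_id: int


def plain_fifo(pending: List[TaskSet]) -> List[TaskSet]:
    # Order only by job id, then stage id.
    return sorted(pending, key=lambda ts: (ts.job_id, ts.stage_id))


def execution_aware_fifo(pending: List[TaskSet]) -> List[TaskSet]:
    # Earlier queries first, then job id, then stage id.
    return sorted(pending, key=lambda ts: (ts.execution_id, ts.job_id, ts.stage_id))


# Query 1 (execution 1) submitted job 0 earlier; query 2 (execution 2) then
# submitted jobs 1 and 2; adaptive execution later makes query 1 submit job 3.
pending = [
    TaskSet(execution_id=1, job_id=3, stage_id=7),
    TaskSet(execution_id=2, job_id=1, stage_id=2),
    TaskSet(execution_id=2, job_id=2, stage_id=4),
]

print([ts.job_id for ts in plain_fifo(pending)])            # [1, 2, 3]
print([ts.job_id for ts in execution_aware_fifo(pending)])  # [3, 1, 2]
```

Under plain FIFO, query 1's job 3 waits behind both jobs of query 2; with the execution id as the leading key, it is scheduled first.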
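The delay estimate above is simple arithmetic; a small sketch makes the assumption explicit. It assumes the cores are fully used and the jobs ahead run back to back, so the wait is (jobs ahead) × (job duration). The second batch in the usage line is a hypothetical repeat of the same situation, representing the adaptive-execution case where the query hits the issue again.

```python
from typing import List


def queueing_delay(jobs_ahead: int, job_minutes: float) -> float:
    # Assumes all cores are busy and each job ahead runs to completion
    # before the delayed job gets cores (jobs run back to back).
    return jobs_ahead * job_minutes


def total_delay(jobs_ahead_per_batch: List[int], job_minutes: float) -> float:
    # Hypothetical: the query is delayed once per batch of submitted jobs.
    return sum(queueing_delay(n, job_minutes) for n in jobs_ahead_per_batch)


print(queueing_delay(10, 2))     # 20
print(total_delay([10, 10], 2))  # 40
```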

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
