You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/07 11:32:19 UTC

[GitHub] [spark] weixiuli opened a new pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

weixiuli opened a new pull request #35425:
URL: https://github.com/apache/spark/pull/35425


   
   ### What changes were proposed in this pull request?
   In the [SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414), it disabled timeout for all BroadcastQueryStageExecs in AQE,  there may be regression in AQE when a BroadcastQueryStageExec doesn't come from shuffle query stages, such as a big table misjudged a small one in broadcastHashJoin,and broadcasting the table may take a long time which is no better than SortMergerJoin(when the job timeout and rerun it with spark.sql.autoBroadcastJoinThreshold=-1), and the timeout is necessary for BroadcastQueryStageExec in this case.
   
   So, we should disable timeout for BroadcastQueryStageExec when it comes from shuffle query stages which runtime statistics are usually correct in AQE, but should enable timeout for it when it comes from others which statistics may be incorrect, and keep it the same as non-AQE.
   
   ### Why are the changes needed?
   
   Avoid regression in AQE 
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Add unittests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1035975940


   I don't think AQE makes a big difference here. The normal broadcast can also hit this problem if there are many concurrent queries. The broadcast timeout is not a proper design at all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1035950763


   > it keeps the disable timeout for broadcast stages that is converted from shuffle in AQE.
   
   This is not sufficient. The timeout can never be accurate because the scheduler is a black box to the SQL engine. We don't know what happens there and are not sure we should keep waiting or give up.
   
   The same problem applies to your proposal as well. Maybe the broadcast job has been waiting for 10 mins and submits the first task, then hits timeout and is killed.
   
   I think the broadcast job should take care of the data size, not time. The scheduler is a black box and time doesn't tell much. But data size is accurate. The broadcast job knows when one task completes and gets the result of this task. If the collected data size is too large, we can fail this broadcast job (or turn it back to shuffle join?).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] weixiuli edited a comment on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

Posted by GitBox <gi...@apache.org>.
weixiuli edited a comment on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1035958085


   @cloud-fan  I agree with you, BTW disable timeout for all broadcast stages in AQE is not reasonable, right?  should we disable timeout for the broadcast stage  that is converted from shuffle in AQE firstly, and to solve the broadcast stage that is not converted from shuffle  secondly ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] weixiuli commented on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

Posted by GitBox <gi...@apache.org>.
weixiuli commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1035958085


   @cloud-fan  I agree with you, BTW disable timeout for all broadcast stages in AQE is not reasonable, right?  should we disable timeout for broadcast stages  that is converted from shuffle in AQE firstly, and to solve the broadcast stages that is not converted from shuffle  secondly ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1034427438


   This reverts [SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414), right? If a query has many broadcast stages (not converted from shuffle), and the broadcast may be waiting to be scheduled for a long time, having a timeout will break valid queries.
   
   Unfortunately, we don't have a good solution for it today. One idea is to make the broadcast itself dynamic: it should cancel the job if it has already collected much data at the driver side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] weixiuli edited a comment on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

Posted by GitBox <gi...@apache.org>.
weixiuli edited a comment on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1035945019


   
   
   
   > This reverts [SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414), right? 
   
   This does not revert  [SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414),it keeps the disable timeout for broadcast stages that is converted from  shuffle  in AQE.
   > One idea is to make the broadcast itself dynamic: it should cancel the job if it has already collected much data at the driver side.
   
   This is a good idea, in fact, JD production has done that by checkinng whether broadcast Stages have tasks running In non-AQE.  If a broadcast stage timeout with no one task running, means that it is not scheduled and should retry wait(we use spark.sql.broadcastMaxRetries to do that), if a broadcast stage timeout with some tasks running, we should cancel the job.
   
   What do you think about the mechanism above to use for AQE  broadcast stages(not converted from shuffle)?  @cloud-fan  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] weixiuli commented on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

Posted by GitBox <gi...@apache.org>.
weixiuli commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1035945019


   
   
   
   
   > This reverts [SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414), right? 
   
   This does not revert  [SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414),it keep disable timeout for broadcast stages that is converted from  shuffle  in AQE.
   > One idea is to make the broadcast itself dynamic: it should cancel the job if it has already collected much data at the driver side.
   
   This is a good idea, in fact, JD production has done that by checkinng whether broadcast Stages have tasks running In non-AQE.  If a broadcast stage timeout with no one task running, means that it is not scheduled and should retry wait(we use spark.sql.broadcastMaxRetries to do that), if a broadcast stage timeout with some tasks running, we should cancel the job.
   
   What do you think about the mechanism above to use for AQE  broadcast stages(not converted from shuffle)?  @cloud-fan  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1031618357


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] weixiuli commented on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

Posted by GitBox <gi...@apache.org>.
weixiuli commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1033316359


   ping @yaooqinn   @cloud-fan Could you help me to take a look when you have time? thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org