
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/11 07:34:45 UTC

[GitHub] [spark] weixiuli edited a comment on pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

weixiuli edited a comment on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1035945019

> This reverts [SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414), right?

This does not revert [SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414). It still disables the timeout for broadcast stages that are converted from shuffle in AQE.

> One idea is to make the broadcast itself dynamic: it should cancel the job if it has already collected much data at the driver side.

This is a good idea. In fact, JD's production deployment already does this in non-AQE mode by checking whether a broadcast stage has any tasks running. If a broadcast stage times out with no tasks running, it has not been scheduled yet, so we retry the wait (we use `spark.sql.broadcastMaxRetries` for that). If a broadcast stage times out with some tasks running, we cancel the job.
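
The decision rule described above could be sketched roughly as follows. This is only an illustration in plain Python, not the code from the PR or from JD's deployment: `await_broadcast`, `wait_once`, `running_tasks`, and the `Action` enum are hypothetical names, `max_retries` stands in for `spark.sql.broadcastMaxRetries`, and the behavior once retries are exhausted (`GIVE_UP`) is an assumption, since the comment does not say what happens then.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    FINISHED = "finished"      # broadcast completed within a wait period
    CANCEL_JOB = "cancel_job"  # timed out while tasks were running
    GIVE_UP = "give_up"        # assumption: retries exhausted, never scheduled


def await_broadcast(
    wait_once: Callable[[], bool],
    running_tasks: Callable[[], int],
    max_retries: int,
) -> Action:
    """Decide what to do with a broadcast stage that may time out.

    wait_once     -- hypothetical: blocks for one timeout period and
                     returns True if the broadcast finished in time
    running_tasks -- hypothetical: number of tasks of the broadcast
                     stage that are currently running
    max_retries   -- stands in for spark.sql.broadcastMaxRetries
    """
    # One initial wait plus up to max_retries additional waits.
    for _ in range(max_retries + 1):
        if wait_once():
            return Action.FINISHED
        if running_tasks() > 0:
            # Timed out with tasks running: the stage is genuinely slow.
            return Action.CANCEL_JOB
        # Timed out with no tasks running: the stage has not been
        # scheduled yet, so loop around and wait again.
    return Action.GIVE_UP
```

For example, `await_broadcast(lambda: False, lambda: 2, 3)` returns `Action.CANCEL_JOB` after the first timeout, while a stage that never gets scheduled is waited on `max_retries + 1` times before the sketch gives up.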

What do you think about using the mechanism above for AQE broadcast stages (those not converted from shuffle)? @cloud-fan

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
