You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "erenavsarogullari (via GitHub)" <gi...@apache.org> on 2024/02/29 04:16:58 UTC

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE QueryStages on the cancellation [spark]

erenavsarogullari commented on code in PR #45234:
URL: https://github.com/apache/spark/pull/45234#discussion_r1506968570


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala:
##########
@@ -250,11 +281,15 @@ case class BroadcastQueryStageExec(
     reuse
   }
 
-  override def cancel(): Unit = {
+  override def cancel(): StageCancellationStatus = {
     if (!broadcast.relationFuture.isDone) {

Review Comment:
   `BroadcastExchangeExec.relationFuture` returns Java Future but ShuffleExchangeExec.mapOutputStatisticsFuture returns Scala Future. However, i reproduced today as `BroadcastQueryStageExec` can also have same problem although it has this check(`!broadcast.relationFuture.isDone`). In terms of these, i fixed `BroadcastQueryStage` case as well so please see my second commit. Thanks.
   
   **Reproduce Steps:** 
   We can also have following case:
   **Before fix:**
   ```
   BroadcastQueryStage1.doMaterialize is called and it is failed,
   BroadcastQueryStage1 is excluded from cancellation due to earlyFailedStage,
   BroadcastQueryStage2 is tried to cancel,
   BroadcastQueryStage2.relationFuture needs to be initialized because it is not materialized yet,
   BroadcastQueryStage2.relationFuture cancellation is finished with COMPLETED status,
   Same for other BroadcastQueryStages which are not materialized yet.
   ```
   **After fix (my second commit):**
   ```
   BroadcastQueryStage1.doMaterialize is called and it is failed,
   BroadcastQueryStage1 is excluded from cancellation due to earlyFailedStage,
   BroadcastQueryStage2 is tried to cancel,
   BroadcastQueryStage2.relationFuture cancellation is skipped with SKIPPED status because it is not materialized yet,
   Same for other BroadcastQueryStages which are not materialized yet.
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org