Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/13 10:24:23 UTC

[GitHub] [spark] ulysses-you commented on pull request #36845: [SPARK-39447][SQL] Only non-broadcast query stage can propagate empty relation

ulysses-you commented on PR #36845:
URL: https://github.com/apache/spark/pull/36845#issuecomment-1153742980

   I can reproduce it with:
   ```sql
   CREATE TABLE t1(c1 int) USING PARQUET PARTITIONED BY (p1 string);
   CREATE TABLE t2(c2 int) USING PARQUET PARTITIONED BY (p2 string);
   
   SELECT * from (
   SELECT /*+ merge(t1) */ p1 FROM t1 JOIN t2 ON c1 = c2
   ) x JOIN t2 ON p1 = p2
   WHERE
   c2 > 0
   ```
   
   The reason is that, with both AQE and DPP enabled, a broadcast exchange is inserted at the top of `AdaptiveSparkPlanExec` when the broadcast can be reused. There is some hacky code that handles this case during AQE re-optimization:
   
   ```scala
   // When both enabling AQE and DPP, `PlanAdaptiveDynamicPruningFilters` rule will
   // add the `BroadcastExchangeExec` node manually in the DPP subquery,
   // not through `EnsureRequirements` rule. Therefore, when the DPP subquery is complicated
   // and need to be re-optimized, AQE also need to manually insert the `BroadcastExchangeExec`
   // node to prevent the loss of the `BroadcastExchangeExec` node in DPP subquery.
   // Here, we also need to avoid to insert the `BroadcastExchangeExec` node when the newPlan
   // is already the `BroadcastExchangeExec` plan after apply the `LogicalQueryStageStrategy` rule.
   val finalPlan = currentPhysicalPlan match {
     case b: BroadcastExchangeLike
       if (!newPlan.isInstanceOf[BroadcastExchangeLike]) => b.withNewChildren(Seq(newPlan))
     case _ => newPlan
   }
   ```
   
   However, this code does not match if the top-level broadcast exchange is wrapped in a query stage. This happens when the broadcast exchange added by DPP runs before the normal broadcast exchange (e.g. the one introduced by the join).
   
   So we can match `BroadcastQueryStage(_, ReusedExchangeExec, _)` and skip the optimization. There is no point in optimizing the child of a reused exchange that exists only for broadcast.
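
   To illustrate the idea, here is a rough, self-contained sketch. It uses toy stand-in types (`Plan`, `Leaf`, `BroadcastExchange`, `ReusedExchange`, `BroadcastQueryStage`) and hypothetical helper names (`isReusedBroadcastStage`, `reOptimize`); none of these are the real Spark classes or the actual patch. It only shows the shape of the match: a broadcast query stage that directly wraps a reused exchange is left untouched.

   ```scala
   // Toy stand-ins for physical plan nodes. These are NOT Spark's classes;
   // they only model the nesting discussed above.
   sealed trait Plan
   case class Leaf(name: String) extends Plan
   case class BroadcastExchange(child: Plan) extends Plan
   case class ReusedExchange(child: Plan) extends Plan
   case class BroadcastQueryStage(id: Int, plan: Plan) extends Plan

   object ReoptimizeSketch {
     // True when the plan is a broadcast query stage that directly wraps a
     // reused exchange, i.e. the shape we want to skip.
     def isReusedBroadcastStage(plan: Plan): Boolean = plan match {
       case BroadcastQueryStage(_, _: ReusedExchange) => true
       case _ => false
     }

     // Hypothetical driver: apply `optimize` unless the plan has the
     // skipped shape, in which case return it unchanged.
     def reOptimize(current: Plan, optimize: Plan => Plan): Plan =
       if (isReusedBroadcastStage(current)) current else optimize(current)
   }
   ```

   With this sketch, `reOptimize` returns a stage wrapping a `ReusedExchange` as-is, while a bare `BroadcastExchange` or any other plan still goes through `optimize`.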


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

