You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "bersprockets (via GitHub)" <gi...@apache.org> on 2024/03/28 23:28:09 UTC

[PR] [SPARK-47633][SQL] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization [spark]

bersprockets opened a new pull request, #45763:
URL: https://github.com/apache/spark/pull/45763

   ### What changes were proposed in this pull request?
   
   Modify `LateralJoin` to include right-side plan output in `allAttributes`.
   
   ### Why are the changes needed?
   
   In the following example, the view v1 is cached, but a query of v1 does not use the cache:
   ```
   CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
   CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
   
   create or replace temp view v1 as
   select *
   from t1
   join lateral (
     select c1 as a, c2 as b
     from t2)
   on c1 = a;
   
   cache table v1;
   
   explain select * from v1;
   == Physical Plan ==
   AdaptiveSparkPlan isFinalPlan=false
   +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false
      :- LocalTableScan [c1#180, c2#181]
      +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [plan_id=113]
         +- LocalTableScan [a#173, b#174]
   ```
   
   The canonicalized version of the `LateralJoin` node is not consistent when there is a join condition. For example, for the above query, the join condition is canonicalized as follows:
   ```
   Before canonicalization: Some((c1#174 = a#167))
   After canonicalization:  Some((none#0 = none#167))
   ```
   You can see that the `exprId` for the second operand of `EqualTo` is not normalized (it remains 167). That's because the attribute `a` from the right-side plan is not included `allAttributes`.
   
   This PR adds right-side attributes to `allAttributes` so that references to right-side attributes in the join condition are normalized during canonicalization.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New test.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47633][SQL] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #45763: [SPARK-47633][SQL] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization
URL: https://github.com/apache/spark/pull/45763


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47633][SQL] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45763:
URL: https://github.com/apache/spark/pull/45763#discussion_r1574118395


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##########
@@ -2056,6 +2056,8 @@ case class LateralJoin(
     joinType: JoinType,
     condition: Option[Expression]) extends UnaryNode {
 
+  override lazy val allAttributes: AttributeSeq = children.flatMap(_.output) ++ right.plan.output

Review Comment:
   It looks clearer to do `left.output ++ right.plan.output`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47633][SQL] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #45763:
URL: https://github.com/apache/spark/pull/45763#issuecomment-2071478863

   thanks, merging to master/3.5/3.4!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47633][SQL] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #45763:
URL: https://github.com/apache/spark/pull/45763#issuecomment-2026517207

   cc @allisonwang-db @maryannxue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47633][SQL] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #45763:
URL: https://github.com/apache/spark/pull/45763#issuecomment-2071482783

   It has conflicts. @bersprockets can you help to create a 3.5 backport PR? thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org