Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/06 17:10:04 UTC

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31113: [SPARK-34061][SQL] DISTINCT the INTERSECT children

dongjoon-hyun commented on a change in pull request #31113:
URL: https://github.com/apache/spark/pull/31113#discussion_r588905792



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1663,8 +1663,12 @@ object ReplaceDeduplicateWithAggregate extends Rule[LogicalPlan] {
 /**
  * Replaces logical [[Intersect]] operator with a left-semi [[Join]] operator.
  * {{{
- *   SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
- *   ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
+ *   SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2 ==>
+ *   SELECT a1, a2 FROM
+ *     (SELECT DISTINCT a1, a2 FROM Tab1)
+ *   LEFT SEMI JOIN
+ *     (SELECT DISTINCT b1, b2 FROM Tab2)

Review comment:
       Do you have any references from other DBMSs for this rewrite?
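
To make the comparison concrete, here is a minimal sketch in plain Scala collections (not Spark or Catalyst code) of the two forms in the doc comment above. It assumes `None` stands in for SQL NULL, so that `Option` equality behaves like the null-safe `<=>` in the removed line. The object name, helper names, and sample rows are all hypothetical, and the join condition for the new form is assumed to stay the same null-safe comparison, since the hunk is cut off before showing it.

```scala
object IntersectRewriteSketch {
  // A row of two nullable columns; None stands in for SQL NULL (an assumption of this sketch).
  type Row = (Option[Int], Option[Int])

  // Left semi join under null-safe equality: Option's == treats None == None as true,
  // which mirrors a1 <=> b1 AND a2 <=> b2.
  def leftSemiJoin(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    left.filter(l => right.exists(r => l._1 == r._1 && l._2 == r._2))

  // Form on the removed lines: join first, then DISTINCT the result.
  def distinctAfterJoin(tab1: Seq[Row], tab2: Seq[Row]): Seq[Row] =
    leftSemiJoin(tab1, tab2).distinct

  // Form on the added lines: DISTINCT each child, then join.
  def distinctBeforeJoin(tab1: Seq[Row], tab2: Seq[Row]): Seq[Row] =
    leftSemiJoin(tab1.distinct, tab2.distinct)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data with duplicates and NULLs on both sides.
    val tab1: Seq[Row] =
      Seq((Some(1), Some(2)), (Some(1), Some(2)), (None, Some(3)), (Some(4), Some(5)))
    val tab2: Seq[Row] =
      Seq((Some(1), Some(2)), (None, Some(3)), (None, Some(3)), (Some(9), Some(9)))

    val before = distinctAfterJoin(tab1, tab2)
    val after = distinctBeforeJoin(tab1, tab2)

    // Both forms should return the same set of rows on this input.
    assert(before.toSet == after.toSet)
    println(after)
  }
}
```

On this sample input the two forms return the same rows; the difference is only where the deduplication happens, which is the part a reference from another DBMS would help justify.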






