You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/08 07:05:45 UTC

[GitHub] [spark] AngersZhuuuu opened a new pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

AngersZhuuuu opened a new pull request #29035:
URL: https://github.com/apache/spark/pull/29035


   ### What changes were proposed in this pull request?
   In current Join Hint strategies, if we use SHUFFLE_REPLICATE_NL hint, it will directly convert join to Cartesian Product Join and loss join condition making result not correct.
   
   For Example:
   ```
   spark-sql> select * from test4 order by a asc;
   1 2
   Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s)
   spark-sql>select * from test5 order by a asc
   1 2
   2 2
   Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar
   spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 where test4.a = test5.a order by test4.a asc ;
   1 2 1 2
   1 2 2 2
   Time taken: 0.351 seconds, Fetched 2 row(s)
   20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched 2 row(s)
   ```
   
   
   ### Why are the changes needed?
   Fix wrong data result
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   
   ### How was this patch tested?
   Added UT


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656525798






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656931219


   > Hi, @AngersZhuuuu . Could you make a backport PR against `branch-3.0`? There is a conflict.
   
   Yea


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452587223



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
##########
@@ -43,6 +43,22 @@ object EliminateResolvedHint extends Rule[LogicalPlan] {
       case h: ResolvedHint =>
         hintErrorHandler.joinNotFoundForJoinHint(h.hints)
         h.child
+      case join: Join if join.condition.isDefined =>
+        join.copy(
+          hint = join.hint.copy(
+            leftHint = removeCartesianProductJoinHint(join.hint.leftHint),
+            rightHint = removeCartesianProductJoinHint(join.hint.rightHint)))
+    }
+  }
+
+  def removeCartesianProductJoinHint(hint: Option[HintInfo]): Option[HintInfo] = {

Review comment:
       @maropu See latest change, it's ok to do like this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656733347






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655816702


   **[Test build #125421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125421/testReport)** for PR 29035 at commit [`aeccc25`](https://github.com/apache/spark/commit/aeccc257fe55af23dd63f77630cc0ca0991bd7a4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655816927






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gatorsmile commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r453468999



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##########
@@ -199,7 +199,7 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
 
         def createCartesianProduct() = {
           if (joinType.isInstanceOf[InnerLike]) {
-            Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), condition)))
+            Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), p.condition)))

Review comment:
       We need to write a comment above this line and explain what it is doing. 
   // instead of using the condition extracted by ExtractEquiJoinKeys, we should use the original join condition, i.e., "p.condition". 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656455759






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452547628



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
##########
@@ -570,4 +596,22 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with AdaptiveSparkP
       assert(joinHints == expectedHints)
     }
   }
+
+  test("SPARK-32220: Non Cartesian Product Join Result Correct with SHUFFLE_REPLICATE_NL hint") {

Review comment:
       Yea, I think so. Nice catch, @AngersZhuuuu 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655929564






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656455790


   **[Test build #125545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125545/testReport)** for PR 29035 at commit [`d9b4dcd`](https://github.com/apache/spark/commit/d9b4dcd5664d27ef50606ae30abdeeac7356e25e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452597570



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
##########
@@ -43,6 +43,22 @@ object EliminateResolvedHint extends Rule[LogicalPlan] {
       case h: ResolvedHint =>
         hintErrorHandler.joinNotFoundForJoinHint(h.hints)
         h.child
+      case join: Join if join.condition.isDefined =>
+        join.copy(
+          hint = join.hint.copy(
+            leftHint = removeCartesianProductJoinHint(join.hint.leftHint),
+            rightHint = removeCartesianProductJoinHint(join.hint.rightHint)))
+    }
+  }
+
+  def removeCartesianProductJoinHint(hint: Option[HintInfo]): Option[HintInfo] = {

Review comment:
       > You cannot use the original condition in logical.Join?
   > 
   > ```
   > case p @ ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right, hint) =>
   >   p.condition <-- This?
   > ```
   
   Don't know we can write like this....updated..




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656380630


   Thank you for reporting and making a PR, @AngersZhuuuu .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656522451






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656525047


   **[Test build #125571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125571/testReport)** for PR 29035 at commit [`d9b4dcd`](https://github.com/apache/spark/commit/d9b4dcd5664d27ef50606ae30abdeeac7356e25e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656733347


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656936368


   late LGTM. Thanks, @AngersZhuuuu 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656445001


   **[Test build #125540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125540/testReport)** for PR 29035 at commit [`a65d332`](https://github.com/apache/spark/commit/a65d3320fc920724d8bcf0213dc718ae336e521a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655816702


   **[Test build #125421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125421/testReport)** for PR 29035 at commit [`aeccc25`](https://github.com/apache/spark/commit/aeccc257fe55af23dd63f77630cc0ca0991bd7a4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452548621



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
##########
@@ -43,6 +43,22 @@ object EliminateResolvedHint extends Rule[LogicalPlan] {
       case h: ResolvedHint =>
         hintErrorHandler.joinNotFoundForJoinHint(h.hints)
         h.child
+      case join: Join if join.condition.isDefined =>
+        join.copy(
+          hint = join.hint.copy(
+            leftHint = removeCartesianProductJoinHint(join.hint.leftHint),
+            rightHint = removeCartesianProductJoinHint(join.hint.rightHint)))
+    }
+  }
+
+  def removeCartesianProductJoinHint(hint: Option[HintInfo]): Option[HintInfo] = {

Review comment:
       I think we should pass a correct condition (`leftKeys` and `rightKeys`) into `CartesianProductExec` instead of removing the hint: 
   https://github.com/apache/spark/blob/ac6406e7571027ba61193452d7a54f7895d2cbdc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L202




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655444023


   **[Test build #125314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125314/testReport)** for PR 29035 at commit [`27b41e5`](https://github.com/apache/spark/commit/27b41e55b6f7bcca812e34142e6598f468b14ced).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656934011


   Thanks, @AngersZhuuuu .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656522451






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655445160






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656733359


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125571/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655334589






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655445168


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125314/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656754532


   Hi, @AngersZhuuuu . Could you make a backport PR against `branch-3.0`? There is a conflict.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656455790






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656730829


   **[Test build #125571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125571/testReport)** for PR 29035 at commit [`d9b4dcd`](https://github.com/apache/spark/commit/d9b4dcd5664d27ef50606ae30abdeeac7356e25e).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656522510


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125545/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452576320



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
##########
@@ -570,4 +596,22 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with AdaptiveSparkP
       assert(joinHints == expectedHints)
     }
   }
+
+  test("SPARK-32220: Non Cartesian Product Join Result Correct with SHUFFLE_REPLICATE_NL hint") {

Review comment:
       Yea,  when I try new join hint, I found this result is non-correct. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656380902


   cc @cloud-fan 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655928601


   **[Test build #125421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125421/testReport)** for PR 29035 at commit [`aeccc25`](https://github.com/apache/spark/commit/aeccc257fe55af23dd63f77630cc0ca0991bd7a4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656525047


   **[Test build #125571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125571/testReport)** for PR 29035 at commit [`d9b4dcd`](https://github.com/apache/spark/commit/d9b4dcd5664d27ef50606ae30abdeeac7356e25e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452592874



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
##########
@@ -43,6 +43,22 @@ object EliminateResolvedHint extends Rule[LogicalPlan] {
       case h: ResolvedHint =>
         hintErrorHandler.joinNotFoundForJoinHint(h.hints)
         h.child
+      case join: Join if join.condition.isDefined =>
+        join.copy(
+          hint = join.hint.copy(
+            leftHint = removeCartesianProductJoinHint(join.hint.leftHint),
+            rightHint = removeCartesianProductJoinHint(join.hint.rightHint)))
+    }
+  }
+
+  def removeCartesianProductJoinHint(hint: Option[HintInfo]): Option[HintInfo] = {

Review comment:
       You cannot use the original condition in logical.Join?
   ```
   case p @ ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right, hint) =>
     p.condition <-- This?
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656520179






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gatorsmile commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r453469535



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##########
@@ -159,7 +159,7 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
       //   4. Pick cartesian product if join type is inner like.
       //   5. Pick broadcast nested loop join as the final solution. It may OOM but we don't have
       //      other choice.
-      case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right, hint) =>
+      case p @ ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right, hint) =>

Review comment:
       To avoid making the similar mistakes, we need to rename `condition` to a self-descriptive name. "otherConditions"? It is a little bit hard to name it TBH
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r453473254



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##########
@@ -199,7 +199,7 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
 
         def createCartesianProduct() = {
           if (joinType.isInstanceOf[InnerLike]) {
-            Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), condition)))
+            Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), p.condition)))

Review comment:
       > We need to write a comment above this line and explain what it is doing.
   > // instead of using the condition extracted by ExtractEquiJoinKeys, we should use the original join condition, i.e., "p.condition".
   
   Raise a PR  https://github.com/apache/spark/pull/29084




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452576902



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
##########
@@ -43,6 +43,22 @@ object EliminateResolvedHint extends Rule[LogicalPlan] {
       case h: ResolvedHint =>
         hintErrorHandler.joinNotFoundForJoinHint(h.hints)
         h.child
+      case join: Join if join.condition.isDefined =>
+        join.copy(
+          hint = join.hint.copy(
+            leftHint = removeCartesianProductJoinHint(join.hint.leftHint),
+            rightHint = removeCartesianProductJoinHint(join.hint.rightHint)))
+    }
+  }
+
+  def removeCartesianProductJoinHint(hint: Option[HintInfo]): Option[HintInfo] = {

Review comment:
       > I think we should pass a correct condition (`leftKeys` and `rightKeys`) into `CartesianProductExec` instead of removing the hint:
   > https://github.com/apache/spark/blob/ac6406e7571027ba61193452d7a54f7895d2cbdc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L202
   
   Yea, in default Cartesian Product Join situation,  it didn't need condition at all.  So in default, it seems don't have condition when build data.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656523883


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655333845


   **[Test build #125314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125314/testReport)** for PR 29035 at commit [`27b41e5`](https://github.com/apache/spark/commit/27b41e55b6f7bcca812e34142e6598f468b14ced).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656445261






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452663993



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##########
@@ -199,7 +199,7 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
 
         def createCartesianProduct() = {
           if (joinType.isInstanceOf[InnerLike]) {
-            Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), condition)))
+            Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), p.condition)))

Review comment:
       good catch!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656525798






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452549562



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
##########
@@ -43,6 +43,22 @@ object EliminateResolvedHint extends Rule[LogicalPlan] {
       case h: ResolvedHint =>
         hintErrorHandler.joinNotFoundForJoinHint(h.hints)
         h.child
+      case join: Join if join.condition.isDefined =>
+        join.copy(
+          hint = join.hint.copy(
+            leftHint = removeCartesianProductJoinHint(join.hint.leftHint),
+            rightHint = removeCartesianProductJoinHint(join.hint.rightHint)))
+    }
+  }
+
+  def removeCartesianProductJoinHint(hint: Option[HintInfo]): Option[HintInfo] = {

Review comment:
       It seems the spark strategy incrrectly removes `a#0 = a#2`; 
   ```
   == Optimized Logical Plan ==
   Sort [a#0 ASC NULLS FIRST], true
   +- Join Inner, (a#0 = a#2), leftHint=(strategy=shuffle_replicate_nl)
      :- Filter isnotnull(a#0)
      :  +- Relation[a#0,b#1] parquet
      +- Filter isnotnull(a#2)
         +- Relation[a#2,b#3] parquet
   
   == Physical Plan ==
   *(3) Sort [a#0 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(a#0 ASC NULLS FIRST, 200), true, [id=#87]
      +- CartesianProduct
         :- *(1) Project [a#0, b#1]
         :  +- *(1) Filter isnotnull(a#0)
         :     +- *(1) ColumnarToRow
         :        +- FileScan parquet default.test4[a#0,b#1] Batched: true, DataFilters: ...
         +- *(2) Project [a#2, b#3]
            +- *(2) Filter isnotnull(a#2)
               +- *(2) ColumnarToRow
                  +- FileScan parquet default.test5[a#2,b#3] Batched: true, DataFilters: ...
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655445160


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655929564






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655333845


   **[Test build #125314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125314/testReport)** for PR 29035 at commit [`27b41e5`](https://github.com/apache/spark/commit/27b41e55b6f7bcca812e34142e6598f468b14ced).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655816927






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452587775



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
##########
@@ -43,6 +43,22 @@ object EliminateResolvedHint extends Rule[LogicalPlan] {
       case h: ResolvedHint =>
         hintErrorHandler.joinNotFoundForJoinHint(h.hints)
         h.child
+      case join: Join if join.condition.isDefined =>
+        join.copy(
+          hint = join.hint.copy(
+            leftHint = removeCartesianProductJoinHint(join.hint.leftHint),
+            rightHint = removeCartesianProductJoinHint(join.hint.rightHint)))
+    }
+  }
+
+  def removeCartesianProductJoinHint(hint: Option[HintInfo]): Option[HintInfo] = {

Review comment:
       Since we don't know the join keys's condition is `EqualTo` or `EaultNullSafe` so it's better just not remove it in `ExtractEqualJoinKeys`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656445261






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r452523049



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
##########
@@ -570,4 +596,22 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with AdaptiveSparkP
       assert(joinHints == expectedHints)
     }
   }
+
+  test("SPARK-32220: Non Cartesian Product Join Result Correct with SHUFFLE_REPLICATE_NL hint") {

Review comment:
       So, is this a correctness issue, @AngersZhuuuu ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #29035:
URL: https://github.com/apache/spark/pull/29035


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join redsult

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-655334589






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29035:
URL: https://github.com/apache/spark/pull/29035#issuecomment-656455759






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org