Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/05 04:19:43 UTC

[GitHub] [spark] kazuyukitanimura opened a new pull request #35400: [SPARK-36665][SQL][FOLLOWUP] Avoid Optimizing Not(InSubquery)

kazuyukitanimura opened a new pull request #35400:
URL: https://github.com/apache/spark/pull/35400


   ### What changes were proposed in this pull request?
   This is a follow-up PR that fixes a bug introduced by SPARK-36665. With this fix, the `NotPropagation` optimization is no longer applied to `InSubquery` expressions.
   
   
   ### Why are the changes needed?
   The `NotPropagation` optimization broke `RewritePredicateSubquery`: with `NotPropagation` applied, the predicate was no longer properly rewritten to a NULL-aware left anti join.
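   
   To illustrate why the NULL-aware rewrite matters, here is a minimal, self-contained sketch that models SQL three-valued logic with `Option[Boolean]`. It is not Spark code: the object `NotInSemantics` and the helpers `in`, `filterNotIn`, and `plainAntiJoin` are hypothetical names introduced only for this illustration, and the model assumes a single nullable integer column on each side.

```scala
object NotInSemantics {
  // SQL three-valued logic: Some(true), Some(false), or None for NULL/unknown.
  type Tri = Option[Boolean]

  // SQL NOT: NULL stays NULL.
  def not(v: Tri): Tri = v.map(!_)

  // SQL `x IN (values)`: true on a match; NULL when there is no match
  // but a NULL is involved; false otherwise (including an empty list).
  def in(x: Option[Int], values: Seq[Option[Int]]): Tri =
    if (values.isEmpty) Some(false)
    else if (x.isEmpty) None
    else if (values.contains(x)) Some(true)
    else if (values.contains(None)) None
    else Some(false)

  // WHERE x NOT IN (subquery): a row survives only if the predicate is exactly true.
  def filterNotIn(rows: Seq[Option[Int]], sub: Seq[Option[Int]]): Seq[Option[Int]] =
    rows.filter(r => not(in(r, sub)).contains(true))

  // A plain (not NULL-aware) anti join: keep rows that have no equal partner.
  def plainAntiJoin(rows: Seq[Option[Int]], sub: Seq[Option[Int]]): Seq[Option[Int]] =
    rows.filter(r => !sub.exists(s => r.isDefined && r == s))
}
```

   In this model, with rows `1, 2, NULL` and a subquery returning `1, NULL`, `filterNotIn` keeps no rows, while `plainAntiJoin` keeps `2` and `NULL`. The two only agree when NULLs are handled explicitly, which is the job of the NULL-aware left anti join.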
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Unit test added


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #35400: [SPARK-36665][SQL][FOLLOWUP] Avoid Optimizing Not(InSubquery)

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #35400:
URL: https://github.com/apache/spark/pull/35400#issuecomment-1031842828


   Thanks @cloud-fan. My goal here is to unblock #35395. Since you raised a concern about the complexity of this logic, I opened #35428 to remove `NotPropagation` for now.
   
   As for effectiveness, I put many examples at https://github.com/apache/spark/blob/977dd054ed0946b62e62d2d480dbf25598545a5e/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NotPropagationSuite.scala#L65-L168 that help simplify users' queries.


[GitHub] [spark] kazuyukitanimura closed pull request #35400: [SPARK-36665][SQL][FOLLOWUP] Avoid Optimizing Not(InSubquery)

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura closed pull request #35400:
URL: https://github.com/apache/spark/pull/35400


   


[GitHub] [spark] viirya commented on a change in pull request #35400: [SPARK-36665][SQL][FOLLOWUP] Avoid Optimizing Not(InSubquery)

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #35400:
URL: https://github.com/apache/spark/pull/35400#discussion_r799941038



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NotPropagationSuite.scala
##########
@@ -173,4 +173,53 @@ class NotPropagationSuite extends PlanTest with ExpressionEvalHelper {
     checkCondition(('a === 'b) =!= ('a === 'c), ('a === 'b) =!= ('a === 'c))
     checkCondition(('a === 'b) =!= ('c in(1, 2, 3)), ('a === 'b) =!= ('c in(1, 2, 3)))
   }
+
+  test("[SPARK-36665] Do not simplify Not(InSubquery)") {

Review comment:
       nit: usual style is `SPARK-36665: ...`.




[GitHub] [spark] HyukjinKwon commented on pull request #35400: [SPARK-36665][SQL][FOLLOWUP] Avoid Optimizing Not(InSubquery)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35400:
URL: https://github.com/apache/spark/pull/35400#issuecomment-1030994629


   cc @allisonwang-db FYI


[GitHub] [spark] kazuyukitanimura commented on pull request #35400: [SPARK-36665][SQL][FOLLOWUP] Avoid Optimizing Not(InSubquery)

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #35400:
URL: https://github.com/apache/spark/pull/35400#issuecomment-1031942350


   Closing in favor of https://github.com/apache/spark/pull/35428


[GitHub] [spark] kazuyukitanimura commented on pull request #35400: [SPARK-36665][SQL][FOLLOWUP] Avoid Optimizing Not(InSubquery)

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #35400:
URL: https://github.com/apache/spark/pull/35400#issuecomment-1030519708


   @aokolnychyi Thank you for finding the issue. cc @viirya 


[GitHub] [spark] viirya commented on pull request #35400: [SPARK-36665][SQL][FOLLOWUP] Avoid Optimizing Not(InSubquery)

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #35400:
URL: https://github.com/apache/spark/pull/35400#issuecomment-1030521864


   cc @cloud-fan


[GitHub] [spark] cloud-fan commented on pull request #35400: [SPARK-36665][SQL][FOLLOWUP] Avoid Optimizing Not(InSubquery)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35400:
URL: https://github.com/apache/spark/pull/35400#issuecomment-1031575626


   Are there any real-world examples that demonstrate the effectiveness of the `NotPropagation` rule? I'm a bit worried about making this rule more complicated when it has almost no visible benefit...

