You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/07 19:33:30 UTC

[GitHub] [spark] kazuyukitanimura opened a new pull request #35428: [SPARK-36665][SQL][FOLLOWUP] Remove NotPropagation

kazuyukitanimura opened a new pull request #35428:
URL: https://github.com/apache/spark/pull/35428


   ### What changes were proposed in this pull request?
   This is a follow-up PR to mitigate the bug introduced by SPARK-36665. This PR removes `NotPropagation` optimization for now until we find a better approach.
   
   
   ### Why are the changes needed?
   `NotPropagation` optimization previously broke `RewritePredicateSubquery` so that it does not properly rewrite the predicate to a NULL-aware left anti join anymore. 
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Existing tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #35428: [SPARK-38132][SQL] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #35428:
URL: https://github.com/apache/spark/pull/35428#discussion_r801090682



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
##########
@@ -1956,4 +1956,30 @@ class SubquerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     assert(!nonDeterministicQueryPlan.deterministic)
   }
 
+  test("SPARK-38132: Avoid optimizing Not IN subquery") {

Review comment:
       Thanks, renamed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #35428: [SPARK-36665][SQL][FOLLOWUP] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #35428:
URL: https://github.com/apache/spark/pull/35428#issuecomment-1031845926


   For unblocking https://github.com/apache/spark/pull/35395


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #35428: [SPARK-36665][SQL][FOLLOWUP] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #35428:
URL: https://github.com/apache/spark/pull/35428#issuecomment-1031844923


   cc @aokolnychyi @viirya @cloud-fan @allisonwang-db @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #35428: [SPARK-38132][SQL] Remove `NotPropagation` rule

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #35428:
URL: https://github.com/apache/spark/pull/35428#issuecomment-1032127847


   > Thanks for pinging me. Just checking for my own understanding - so is this technically a revert of [SPARK-36665](https://issues.apache.org/jira/browse/SPARK-36665)?
   
   @HyukjinKwon It is actually not a full revert. SPARK-36665 introduced two classes(objects), and reverted only one of them


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #35428: [SPARK-38132][SQL] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #35428:
URL: https://github.com/apache/spark/pull/35428#discussion_r801079743



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/NotInSubqueryEndToEndSuite.scala
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.SharedSparkSession
+
+class NotInSubqueryEndToEndSuite extends QueryTest with SharedSparkSession {

Review comment:
       moved




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #35428: [SPARK-36665][SQL][FOLLOWUP] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35428:
URL: https://github.com/apache/spark/pull/35428#issuecomment-1031853252


   cc @sunchao too since he was on the original PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #35428: [SPARK-38132][SPARK-36665][SQL] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #35428:
URL: https://github.com/apache/spark/pull/35428#issuecomment-1031905461


   Thank you @dongjoon-hyun @viirya for the feedback. Added [NotInSubqueryEndToEndSuite.scala](https://github.com/apache/spark/pull/35428#diff-5c02392c379b4774eb81e526e2c3716a075ef72917f52b3ace50ae0c35b9a12d)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #35428: [SPARK-38132][SQL] Remove `NotPropagation` rule

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #35428:
URL: https://github.com/apache/spark/pull/35428


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #35428: [SPARK-38132][SQL] Remove `NotPropagation` rule

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35428:
URL: https://github.com/apache/spark/pull/35428#issuecomment-1032111119


   Late LGTM. It seems to me that this rule won't make a big deal to the overall performance, so it's better to make it very simple, otherwise it's not worthwhile.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35428: [SPARK-38132][SQL] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35428:
URL: https://github.com/apache/spark/pull/35428#discussion_r801087830



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
##########
@@ -1956,4 +1956,30 @@ class SubquerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     assert(!nonDeterministicQueryPlan.deterministic)
   }
 
+  test("SPARK-38132: Avoid optimizing Not IN subquery") {

Review comment:
       Shall we give a more meaningful name for this test case?
   For me, the test case looks like checking the correctness of the queries which is irrelevant to avoid `optimizing` something.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #35428: [SPARK-38132][SQL] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #35428:
URL: https://github.com/apache/spark/pull/35428#discussion_r801075274



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/NotInSubqueryEndToEndSuite.scala
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.SharedSparkSession
+
+class NotInSubqueryEndToEndSuite extends QueryTest with SharedSparkSession {

Review comment:
       Hmm, I think we already have test suite e.g. SubquerySuite. Can we just put the test into existing test suite?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #35428: [SPARK-38132][SQL] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #35428:
URL: https://github.com/apache/spark/pull/35428#discussion_r801053144



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/NotInSubqueryEndToEndSuite.scala
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.SharedSparkSession
+
+class NotInSubqueryEndToEndSuite extends QueryTest with SharedSparkSession {
+  import testImplicits._
+
+  val t = "test_table"
+
+  test("SPARK-38132: Avoid Optimizing Not(InSubquery)") {

Review comment:
       nit: "Avoid Optimizing Not(InSubquery)" -> "Avoid optimizing Not IN subquery"?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35428: [SPARK-38132][SQL] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35428:
URL: https://github.com/apache/spark/pull/35428#discussion_r801087830



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
##########
@@ -1956,4 +1956,30 @@ class SubquerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     assert(!nonDeterministicQueryPlan.deterministic)
   }
 
+  test("SPARK-38132: Avoid optimizing Not IN subquery") {

Review comment:
       Shall we give more meaningful name for this test case?
   For me, the test case looks like checking the correctness of the queries which is irrelevant to avoid `optimizing` something.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #35428: [SPARK-38132][SQL] Remove `NotPropagation` rule

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35428:
URL: https://github.com/apache/spark/pull/35428#issuecomment-1032110447


   Thanks for pinging me. Just checking for my own understanding - so is this technically a revert of SPARK-36665?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on a change in pull request #35428: [SPARK-38132][SQL] Remove NotPropagation

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #35428:
URL: https://github.com/apache/spark/pull/35428#discussion_r801065714



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/NotInSubqueryEndToEndSuite.scala
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.SharedSparkSession
+
+class NotInSubqueryEndToEndSuite extends QueryTest with SharedSparkSession {
+  import testImplicits._
+
+  val t = "test_table"
+
+  test("SPARK-38132: Avoid Optimizing Not(InSubquery)") {

Review comment:
       updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org