Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/02 07:43:04 UTC

[GitHub] [spark] kazuyukitanimura opened a new pull request #33898: [SPARK-36644][CORE][SQL] Push down boolean column filter

kazuyukitanimura opened a new pull request #33898:
URL: https://github.com/apache/spark/pull/33898


   ### What changes were proposed in this pull request?
   This PR improves `DataSourceStrategy` so that it can push down boolean column filters. Currently, boolean column filters are not pushed down, which may cause unnecessary IO.
   
   
   ### Why are the changes needed?
   The following query does not push down the filter in the current implementation:
   ```
   SELECT * FROM t WHERE boolean_field
   ```
   whereas the following query pushes down the filter as expected:
   ```
   SELECT * FROM t WHERE boolean_field = true
   ```
   This is because the physical planner (`DataSourceStrategy`) currently pushes down only a limited set of expression patterns, such as `EqualTo`.
   It is fair for Spark SQL users to expect `boolean_field` to perform the same as `boolean_field = true`.
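   
   A minimal sketch of the idea, not the actual Spark code: the types below (`Expr`, `BoolAttr`, `Lit`, `Eq`, `SourceFilter`, and a local `EqualTo`) are hypothetical stand-ins for Catalyst expressions and data source filters, assumed here only for illustration. It shows a bare boolean column being translated into the same filter as `boolean_field = true`.
   
   ```scala
   // Hypothetical, simplified stand-ins for Catalyst expressions (not Spark classes).
   sealed trait Expr
   final case class BoolAttr(name: String) extends Expr      // a boolean column reference
   final case class Lit(value: Any) extends Expr             // a literal value
   final case class Eq(left: Expr, right: Expr) extends Expr // left = right

   // Hypothetical stand-in for a data source filter; the name mirrors sources.EqualTo.
   sealed trait SourceFilter
   final case class EqualTo(attribute: String, value: Any) extends SourceFilter

   object TranslateSketch {
     // Returns Some(filter) when the expression matches a pushable pattern, else None.
     def translate(e: Expr): Option[SourceFilter] = e match {
       // Pattern that is already handled: column = literal (either operand order).
       case Eq(BoolAttr(n), Lit(v)) => Some(EqualTo(n, v))
       case Eq(Lit(v), BoolAttr(n)) => Some(EqualTo(n, v))
       // Pattern this PR is about: a bare boolean column is treated as column = true.
       case BoolAttr(n)             => Some(EqualTo(n, true))
       // Anything else is not pushed down in this sketch.
       case _                       => None
     }
   }
   ```
   
   In this sketch, `TranslateSketch.translate(BoolAttr("boolean_field"))` and `TranslateSketch.translate(Eq(BoolAttr("boolean_field"), Lit(true)))` both return `Some(EqualTo("boolean_field", true))`, which is the equivalence the description argues for.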
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added unit tests
   ```
   build/sbt "sql/testOnly *DataSourceStrategySuite -- -z SPARK-36644"
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912327884


   Merged into master. Thanks @kazuyukitanimura 




[GitHub] [spark] SparkQA commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912596788


   **[Test build #142968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142968/testReport)** for PR 33898 at commit [`211f82c`](https://github.com/apache/spark/commit/211f82cc7a4975be14146ab816cd68207c6d6515).
    * This patch **fails from timeout after a configured wait of `500m`**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912329298


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47468/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912292356


   **[Test build #142968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142968/testReport)** for PR 33898 at commit [`211f82c`](https://github.com/apache/spark/commit/211f82cc7a4975be14146ab816cd68207c6d6515).




[GitHub] [spark] SparkQA commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912292356


   **[Test build #142968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142968/testReport)** for PR 33898 at commit [`211f82c`](https://github.com/apache/spark/commit/211f82cc7a4975be14146ab816cd68207c6d6515).




[GitHub] [spark] SparkQA commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912294684


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47467/
   




[GitHub] [spark] kazuyukitanimura commented on pull request #33898: [SPARK-36644][CORE][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-911328294


   cc @dbtsai @sunchao @viirya @dongjoon-hyun




[GitHub] [spark] viirya commented on a change in pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #33898:
URL: https://github.com/apache/spark/pull/33898#discussion_r700837948



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
##########
@@ -311,6 +312,21 @@ class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
     assert(PushableColumnAndNestedColumn.unapply(Abs('col.int)) === None)
   }
 
+  test("SPARK-36644: Push down boolean column filter") {
+    testTranslateFilter('col.boolean, Some(sources.EqualTo("col", true)))
+
+    val t = "test_table"
+    withTable(t) {
+      import testImplicits._
+      Seq(Some(true), Some(false), None).toDF().write.saveAsTable(t)
+      val df = spark.table(t)
+      df.where("value").queryExecution.executedPlan.collectFirst {

Review comment:
       We should also verify the result after pushdown.
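       
       As a rough illustration of what verifying the result means for the test data in the diff above (`Some(true)`, `Some(false)`, `None`), here is a sketch in plain Scala collections, with no Spark involved. The object and method names are hypothetical, and `None` is assumed to model SQL NULL. Under SQL semantics a `WHERE` clause keeps a row only when the predicate evaluates to TRUE, so filtering on the bare column and filtering on the column being non-null and equal to `true` should keep the same rows.
       
       ```scala
       object NullSemanticsSketch {
         type Row = Option[Boolean] // None models a SQL NULL (an assumption of this sketch)

         // WHERE value: keep the row only when the column is TRUE; FALSE and NULL are dropped.
         def whereColumn(rows: Seq[Row]): Seq[Row] =
           rows.filter(_.contains(true))

         // IsNotNull(value) AND EqualTo(value, true), evaluated as a data source might.
         def wherePushed(rows: Seq[Row]): Seq[Row] =
           rows.filter(v => v.isDefined && v.get == true)
       }
       ```
       
       For `Seq(Some(true), Some(false), None)`, both methods return `Seq(Some(true))`, a single row holding `true`.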






[GitHub] [spark] SparkQA commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912291297


   **[Test build #142964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142964/testReport)** for PR 33898 at commit [`bb14898`](https://github.com/apache/spark/commit/bb14898e5b8311c82ccf001fef228783d5358a8b).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DataSourceStrategySuite extends PlanTest with SharedSparkSession `




[GitHub] [spark] SparkQA removed a comment on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912287079


   **[Test build #142964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142964/testReport)** for PR 33898 at commit [`bb14898`](https://github.com/apache/spark/commit/bb14898e5b8311c82ccf001fef228783d5358a8b).




[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33898:
URL: https://github.com/apache/spark/pull/33898#discussion_r701366317



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
##########
@@ -311,6 +312,21 @@ class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
     assert(PushableColumnAndNestedColumn.unapply(Abs('col.int)) === None)
   }
 
+  test("SPARK-36644: Push down boolean column filter") {
+    testTranslateFilter('col.boolean, Some(sources.EqualTo("col", true)))
+
+    val t = "test_table"
+    withTable(t) {
+      import testImplicits._
+      Seq(Some(true), Some(false), None).toDF().write.saveAsTable(t)
+      val df = spark.table(t)
+      df.where("value").queryExecution.executedPlan.collectFirst {

Review comment:
       updated and passed all tests






[GitHub] [spark] dongjoon-hyun commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-911838721


   Thank you for pinging me, @kazuyukitanimura .




[GitHub] [spark] SparkQA commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912287079


   **[Test build #142964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142964/testReport)** for PR 33898 at commit [`bb14898`](https://github.com/apache/spark/commit/bb14898e5b8311c82ccf001fef228783d5358a8b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912291328








[GitHub] [spark] HyukjinKwon commented on a change in pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33898:
URL: https://github.com/apache/spark/pull/33898#discussion_r701540575



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
##########
@@ -311,6 +311,22 @@ class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
     assert(PushableColumnAndNestedColumn.unapply(Abs('col.int)) === None)
   }
 
+  test("SPARK-36644: Push down boolean column filter") {
+    testTranslateFilter('col.boolean, Some(sources.EqualTo("col", true)))
+
+    val t = "test_table"
+    withTable(t) {
+      import testImplicits._
+      Seq(Some(true), Some(false), None).toDF().write.saveAsTable(t)
+      val df = spark.table(t).where("value")
+      df.queryExecution.executedPlan.collectFirst {
+        case f: FileSourceScanExec =>
+          assert(f.metadata("PushedFilters") == "[IsNotNull(value), EqualTo(value,true)]")
+      }
+      checkAnswer(df, Row(true))
+    }

Review comment:
       I think we can just remove






[GitHub] [spark] HyukjinKwon commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912204154


   Looks fine to me too.




[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on a change in pull request #33898:
URL: https://github.com/apache/spark/pull/33898#discussion_r701611408



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
##########
@@ -311,6 +311,22 @@ class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
     assert(PushableColumnAndNestedColumn.unapply(Abs('col.int)) === None)
   }
 
+  test("SPARK-36644: Push down boolean column filter") {
+    testTranslateFilter('col.boolean, Some(sources.EqualTo("col", true)))
+
+    val t = "test_table"
+    withTable(t) {
+      import testImplicits._
+      Seq(Some(true), Some(false), None).toDF().write.saveAsTable(t)
+      val df = spark.table(t).where("value")
+      df.queryExecution.executedPlan.collectFirst {
+        case f: FileSourceScanExec =>
+          assert(f.metadata("PushedFilters") == "[IsNotNull(value), EqualTo(value,true)]")
+      }
+      checkAnswer(df, Row(true))
+    }

Review comment:
       Thank you! Removed






[GitHub] [spark] AmplabJenkins commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912612897


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142968/
   




[GitHub] [spark] dbtsai closed pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
dbtsai closed pull request #33898:
URL: https://github.com/apache/spark/pull/33898


   




[GitHub] [spark] SparkQA commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912329265


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47468/
   




[GitHub] [spark] SparkQA commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912324893


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47468/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912291328


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142964/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33898:
URL: https://github.com/apache/spark/pull/33898#discussion_r701540476



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
##########
@@ -311,6 +311,22 @@ class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
     assert(PushableColumnAndNestedColumn.unapply(Abs('col.int)) === None)
   }
 
+  test("SPARK-36644: Push down boolean column filter") {
+    testTranslateFilter('col.boolean, Some(sources.EqualTo("col", true)))
+
+    val t = "test_table"
+    withTable(t) {
+      import testImplicits._
+      Seq(Some(true), Some(false), None).toDF().write.saveAsTable(t)
+      val df = spark.table(t).where("value")
+      df.queryExecution.executedPlan.collectFirst {
+        case f: FileSourceScanExec =>
+          assert(f.metadata("PushedFilters") == "[IsNotNull(value), EqualTo(value,true)]")
+      }
+      checkAnswer(df, Row(true))
+    }

Review comment:
       hm, do we need e2e test here?






[GitHub] [spark] AmplabJenkins commented on pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33898:
URL: https://github.com/apache/spark/pull/33898#issuecomment-912294715


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47467/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #33898: [SPARK-36644][SQL] Push down boolean column filter

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33898:
URL: https://github.com/apache/spark/pull/33898#discussion_r701557724



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
##########
@@ -311,6 +311,22 @@ class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
     assert(PushableColumnAndNestedColumn.unapply(Abs('col.int)) === None)
   }
 
+  test("SPARK-36644: Push down boolean column filter") {
+    testTranslateFilter('col.boolean, Some(sources.EqualTo("col", true)))
+
+    val t = "test_table"
+    withTable(t) {
+      import testImplicits._
+      Seq(Some(true), Some(false), None).toDF().write.saveAsTable(t)
+      val df = spark.table(t).where("value")
+      df.queryExecution.executedPlan.collectFirst {
+        case f: FileSourceScanExec =>
+          assert(f.metadata("PushedFilters") == "[IsNotNull(value), EqualTo(value,true)]")
+      }
+      checkAnswer(df, Row(true))
+    }

Review comment:
       We should follow the existing code in this suite to add a UT, not an end-to-end test



