Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/16 11:43:55 UTC

[GitHub] [spark] wangyum opened a new pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

wangyum opened a new pull request #31573:
URL: https://github.com/apache/spark/pull/31573


   ### What changes were proposed in this pull request?
   
   This PR pushes down scalar-subquery filters to FileSourceScan. For example:
   ```scala
   sql("CREATE TABLE t1 using parquet AS SELECT id AS a, id AS b FROM range(500000000L)")
   sql("CREATE TABLE t2 using parquet AS SELECT id AS d FROM range(20)")
   sql("SELECT * FROM t1 where b = (select max(d) from t2)").show
   ```
   Before this PR | After this PR
   -- | --
   ![image](https://user-images.githubusercontent.com/5399861/108058404-f4396080-708e-11eb-8e9c-0f2b5e98967c.png) | ![image](https://user-images.githubusercontent.com/5399861/108058416-fb606e80-708e-11eb-8886-7f1ee2e0bdbb.png)
   
   
   
   ### Why are the changes needed?
   
   Improve query performance.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Unit test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779861815


   ~~@wangyum, how does this PR relate to https://github.com/apache/spark/pull/23802? Is this a regression in 3.1?~~ Nvm.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781104483


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39796/
   




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782876215


   **[Test build #135321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135321/testReport)** for PR 31573 at commit [`0699dd0`](https://github.com/apache/spark/commit/0699dd08796514c0853dc3633f9ca35b0f8f489a).




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781060196


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39796/
   




[GitHub] [spark] wangyum commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782565340


   Thank you all. I need to fix the failed test.




[GitHub] [spark] SparkQA removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783386181


   **[Test build #135346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135346/testReport)** for PR 31573 at commit [`b207612`](https://github.com/apache/spark/commit/b207612abbad8e5b64dd2b7d65f990e643ace6f0).




[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781104483


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39796/
   




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782897104


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39901/
   




[GitHub] [spark] SaurabhChawla100 edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SaurabhChawla100 edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-796804727


   @wangyum - Just a question to check my understanding of this change.
   The example shows a scalar subquery without joins:
   `(select max(d) from t2)`
   
   Do we also push down a scalar subquery that contains one or more joins and then returns a single row?
   For example:
   `SELECT * FROM t1 WHERE b = (select t2.id from t2 , t3 , t4  where t2.id = t3.id1 and t2.date = t4.date1)`
   
   Suppose `select t2.id from t2 , t3 , t4  where t2.id = t3.id1 and t2.date = t4.date1` takes a long time to return its result.
   
   If we push down such a complex query, the scan of table t1 is held until the subquery completes and the pushdown is applied to t1; only after that does processing start.
   
   Will this not hurt performance compared to the existing behavior, where both table scans start at the same time?
   
   Second point: if table t1 has only a small number of rows, do we still want to push down a scalar subquery that is itself a complex query?
   
   `If the scalar subquery completes first and only then the scan of t1 starts, both before this change and after this PR, then pushing down the scalar subquery will always be faster`
   




[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781078101


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135217/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782890837


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135321/
   




[GitHub] [spark] peter-toth edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779861815


   @wangyum, how does this PR relate to https://github.com/apache/spark/pull/23802? Is this a regression in 3.1?




[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783446249


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39926/
   




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782887804


   **[Test build #135321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135321/testReport)** for PR 31573 at commit [`0699dd0`](https://github.com/apache/spark/commit/0699dd08796514c0853dc3633f9ca35b0f8f489a).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779794834


   **[Test build #135176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135176/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783417848


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39926/
   




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781071984


   **[Test build #135217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135217/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779849861


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39757/
   




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779794834


   **[Test build #135176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135176/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).




[GitHub] [spark] peter-toth commented on a change in pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #31573:
URL: https://github.com/apache/spark/pull/31573#discussion_r576989630



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##########
@@ -335,6 +336,19 @@ case class FileSourceScanExec(
     dataFilters.flatMap(DataSourceStrategy.translateFilter(_, supportNestedPredicatePushdown))
   }
 
+  @transient
+  private lazy val runtimePushedDownFilters = {
+    dataFilters.flatMap {
+      case e: Expression if ExecSubqueryExpression.hasScalarSubquery(e) =>
+        val updatedValue = e.transform {
+          case s: ScalarSubquery => s.value
+        }
+        Some(updatedValue)
+      case _ =>
+        Nil
+    }.flatMap(translateFilter(_, DataSourceUtils.supportNestedPredicatePushdown(relation)))

Review comment:
       nit: IMO we can combine the 2 `flatMap`s and the code is still easy to read
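
The suggestion above can be illustrated with a minimal, self-contained sketch. It does not use Spark: `Expr`, `Attr`, `Lit`, `ScalarSub`, `EqualTo`, `SourceEqualTo` and `RuntimeFilters` are toy stand-ins invented for this illustration, and `translateFilter` here is a hypothetical simplification that only understands `attribute = literal`. Under those assumptions, the two-pass and one-pass forms produce the same pushed filters.

```scala
// Toy stand-ins for Catalyst expressions (assumed for illustration, NOT Spark's classes).
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Long) extends Expr
case class ScalarSub(value: Long) extends Expr // a scalar subquery whose value is already known
case class EqualTo(left: Expr, right: Expr) extends Expr

// Toy stand-in for a data source filter.
case class SourceEqualTo(attribute: String, value: Long)

object RuntimeFilters {
  // True when the expression tree contains a scalar subquery.
  def hasScalarSubquery(e: Expr): Boolean = e match {
    case _: ScalarSub  => true
    case EqualTo(l, r) => hasScalarSubquery(l) || hasScalarSubquery(r)
    case _             => false
  }

  // Replace every scalar subquery with a literal holding its value.
  def inlineSubqueries(e: Expr): Expr = e match {
    case ScalarSub(v)  => Lit(v)
    case EqualTo(l, r) => EqualTo(inlineSubqueries(l), inlineSubqueries(r))
    case other         => other
  }

  // Hypothetical translation: only `attribute = literal` can be pushed down.
  def translateFilter(e: Expr): Option[SourceEqualTo] = e match {
    case EqualTo(Attr(n), Lit(v)) => Some(SourceEqualTo(n, v))
    case EqualTo(Lit(v), Attr(n)) => Some(SourceEqualTo(n, v))
    case _                        => None
  }

  // Two passes, mirroring the shape of the hunk under review.
  def twoPass(filters: Seq[Expr]): Seq[SourceEqualTo] =
    filters.flatMap {
      case e if hasScalarSubquery(e) => Some(inlineSubqueries(e))
      case _                         => None
    }.flatMap(e => translateFilter(e))

  // One pass, as the review comment proposes.
  def onePass(filters: Seq[Expr]): Seq[SourceEqualTo] =
    filters.flatMap {
      case e if hasScalarSubquery(e) => translateFilter(inlineSubqueries(e))
      case _                         => None
    }
}
```

For example, with `b = ScalarSub(19)` (19 being `max(d)` over `range(20)` in the PR description's example), both forms yield a single `SourceEqualTo("b", 19)`; filters without a scalar subquery are skipped, and filters that cannot be translated after inlining are dropped.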

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
##########
@@ -166,14 +167,11 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
       val partitionKeyFilters = DataSourceStrategy.getPushedDownFilters(partitionColumns,
         normalizedFilters)
 
-      // subquery expressions are filtered out because they can't be used to prune buckets or pushed
-      // down as data filters, yet they would be executed
-      val normalizedFiltersWithoutSubqueries =
-        normalizedFilters.filterNot(SubqueryExpression.hasSubquery)
-
       val bucketSpec: Option[BucketSpec] = fsRelation.bucketSpec
       val bucketSet = if (shouldPruneBuckets(bucketSpec)) {
-        genBucketSet(normalizedFiltersWithoutSubqueries, bucketSpec.get)
+        // subquery expressions are filtered out because they can't be used to prune buckets
+        // or pushed down as data filters, yet they would be executed

Review comment:
       nit: `or pushed down as data filters` is not valid here

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
##########
@@ -184,7 +182,9 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
       // Partition keys are not available in the statistics of the files.
       // `dataColumns` might have partition columns, we need to filter them out.
       val dataColumnsWithoutPartitionCols = dataColumns.filterNot(partitionColumns.contains)
-      val dataFilters = normalizedFiltersWithoutSubqueries.flatMap { f =>
+      // Non-scalar subquery expressions are filtered out because they can't be used to prune
+      // buckets or pushed down as data filters, yet they would be executed

Review comment:
       nit: `to prune buckets` is not valid here






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783446249


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39926/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779927497


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135176/
   






[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781100254


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39796/
   




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783386181


   **[Test build #135346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135346/testReport)** for PR 31573 at commit [`b207612`](https://github.com/apache/spark/commit/b207612abbad8e5b64dd2b7d65f990e643ace6f0).




[GitHub] [spark] peter-toth commented on a change in pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #31573:
URL: https://github.com/apache/spark/pull/31573#discussion_r576989349



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -85,6 +85,26 @@ object SubqueryExpression {
     }.isDefined
   }
 
+  /**
+   * Returns true when an expression contains a scalar subquery
+   */
+  def hasScalarSubquery(e: Expression): Boolean = {
+    e.find {
+      case _: ScalarSubquery => true
+      case _ => false
+    }.isDefined
+  }
+
+  /**
+   * Returns true when an expression contains a non-scalar subquery
+   */
+  def hasNonScalarSubquery(e: Expression): Boolean = {
+    e.find {
+      case s: SubqueryExpression if !hasScalarSubquery(s) => true

Review comment:
       How about `case s: SubqueryExpression => !s.isInstanceOf[ScalarSubquery]`? Then we probably don't need `def hasScalarSubquery()` at all.
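
A minimal sketch of this simplification, using a toy expression tree in place of Catalyst's `Expression`. The `SimplifiedCheck` object and the `Expr`, `Leaf`, `Eq`, `SubqueryExpr`, `Scalar`, and `Exists` types are hypothetical stand-ins, not Spark classes; the sketch only illustrates that matching on the node type makes a separate `hasScalarSubquery` helper unnecessary:

```scala
// Toy stand-ins for Catalyst expressions; these are NOT Spark classes.
object SimplifiedCheck {
  sealed trait Expr {
    def children: Seq[Expr]
    // Pre-order search over this node and its descendants.
    def find(p: Expr => Boolean): Option[Expr] =
      if (p(this)) Some(this)
      else children.foldLeft(Option.empty[Expr])((acc, c) => acc.orElse(c.find(p)))
  }
  case class Leaf(name: String) extends Expr { val children: Seq[Expr] = Nil }
  case class Eq(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }

  // In this toy model subquery nodes have no expression children.
  sealed trait SubqueryExpr extends Expr { val children: Seq[Expr] = Nil }
  case class Scalar(sql: String) extends SubqueryExpr
  case class Exists(sql: String) extends SubqueryExpr

  // True when the tree contains a subquery node that is not a scalar subquery.
  def hasNonScalarSubquery(e: Expr): Boolean =
    e.find {
      case s: SubqueryExpr => !s.isInstanceOf[Scalar]
      case _ => false
    }.isDefined
}
```

With these definitions, `Eq(Leaf("b"), Scalar("select max(d) from t2"))` is not flagged, while any tree containing an `Exists` node is.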






[GitHub] [spark] peter-toth commented on a change in pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #31573:
URL: https://github.com/apache/spark/pull/31573#discussion_r576989752



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
##########
@@ -184,7 +182,9 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
       // Partition keys are not available in the statistics of the files.
       // `dataColumns` might have partition columns, we need to filter them out.
       val dataColumnsWithoutPartitionCols = dataColumns.filterNot(partitionColumns.contains)
-      val dataFilters = normalizedFiltersWithoutSubqueries.flatMap { f =>
+      // Non-scalar subquery expressions are filtered out because they can't be used to prune
+      // buckets or pushed down as data filters, yet they would be executed

Review comment:
       nit: `to prune buckets` is not valid here






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783489694


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135346/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781033387


   **[Test build #135217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135217/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).




[GitHub] [spark] SaurabhChawla100 commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SaurabhChawla100 commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-796804727


   @wangyum - A few questions to check my understanding of this change.
   The example uses a scalar subquery without joins:
   `select max(d) from t2`
   
   Do we also push down a scalar subquery that contains one or more joins and returns a single row?
   For example:
   `SELECT * FROM t1 WHERE b = (select t2.id from t2 , t3 , t4  where t2.id = t3.id1 and t2.date = t4.date1)`
   
   Suppose `select t2.id from t2 , t3 , t4  where t2.id = t3.id1 and t2.date = t4.date1` takes a long time to return its result.
   
   If we push down such a complex query, the scan of table t1 is held until the subquery completes and the filter is pushed down to t1; only then does processing start.
   
   Won't this hurt performance compared to the existing behavior, where both table scans start at the same time?
   
   Second, if table t1 has only a small number of rows, do we still want to push down a scalar subquery that is itself a complex query?
   




[GitHub] [spark] wangyum commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-797651022


   Thank you @SaurabhChawla100.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782900193


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39901/
   




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783442708


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39926/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783489694


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135346/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779927497


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135176/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782900193


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39901/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781078101


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135217/
   




[GitHub] [spark] SaurabhChawla100 edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SaurabhChawla100 edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-796804727


   @wangyum - A few questions to check my understanding of this change.
   The example uses a scalar subquery without joins:
   `select max(d) from t2`
   
   Do we also push down a scalar subquery that contains one or more joins and returns a single row?
   For example:
   `SELECT * FROM t1 WHERE b = (select t2.id from t2 , t3 , t4  where t2.id = t3.id1 and t2.date = t4.date1)`
   
   Suppose `select t2.id from t2 , t3 , t4  where t2.id = t3.id1 and t2.date = t4.date1` takes a long time to return its result.
   
   If we push down such a complex query, the scan of table t1 is held until the subquery completes and the filter is pushed down to t1; only then does processing start.
   
   Won't this hurt performance compared to the existing behavior, where both table scans start at the same time?
   
   Second, if table t1 has only a small number of rows, do we still want to push down a scalar subquery that is itself a complex query?
   
   If the scalar subquery completes before the scan of t1 starts, both before and after this change, then pushing down the scalar subquery will always be faster.
   




[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779849861


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39757/
   




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781033387


   **[Test build #135217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135217/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).




[GitHub] [spark] dongjoon-hyun commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781032097


   Retest this please




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779896519


   **[Test build #135176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135176/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782876215


   **[Test build #135321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135321/testReport)** for PR 31573 at commit [`0699dd0`](https://github.com/apache/spark/commit/0699dd08796514c0853dc3633f9ca35b0f8f489a).




[GitHub] [spark] peter-toth commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
peter-toth commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779861815


   @wangyum, how does this PR relate to https://github.com/apache/spark/pull/23802?




[GitHub] [spark] viirya commented on a change in pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31573:
URL: https://github.com/apache/spark/pull/31573#discussion_r577366266



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -85,6 +85,26 @@ object SubqueryExpression {
     }.isDefined
   }
 
+  /**
+   * Returns true when an expression contains a scalar subquery
+   */
+  def hasScalarSubquery(e: Expression): Boolean = {
+    e.find {
+      case _: ScalarSubquery => true
+      case _ => false
+    }.isDefined
+  }
+
+  /**
+   * Returns true when an expression contains a non-scalar subquery
+   */
+  def hasNonScalarSubquery(e: Expression): Boolean = {

Review comment:
       Isn't it equal to `hasSubquery(e) && !hasScalarSubquery(e)`?
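
One way to check this equivalence is on a toy model of the expression tree. The `EquivalenceCheck` object and the `Expr`, `Col`, `Eq`, `And`, `SubqueryExpr`, `Scalar`, and `Exists` types below are hypothetical stand-ins, not Spark classes, and subquery nodes are assumed to have no expression children. Under those assumptions the two formulations agree when an expression holds only one kind of subquery, but differ when it mixes both:

```scala
// Toy stand-ins for Catalyst expressions; these are NOT Spark classes.
object EquivalenceCheck {
  sealed trait Expr {
    def children: Seq[Expr]
    // True if this node or any descendant satisfies p.
    def exists(p: Expr => Boolean): Boolean =
      p(this) || children.exists(_.exists(p))
  }
  case class Col(name: String) extends Expr { val children: Seq[Expr] = Nil }
  case class Eq(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }
  case class And(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }
  sealed trait SubqueryExpr extends Expr { val children: Seq[Expr] = Nil }
  case class Scalar(sql: String) extends SubqueryExpr
  case class Exists(sql: String) extends SubqueryExpr

  def hasSubquery(e: Expr): Boolean = e.exists(_.isInstanceOf[SubqueryExpr])
  def hasScalarSubquery(e: Expr): Boolean = e.exists(_.isInstanceOf[Scalar])

  // Node-wise check, shaped like the one in the diff above.
  def hasNonScalarSubquery(e: Expr): Boolean = e.exists {
    case s: SubqueryExpr => !hasScalarSubquery(s)
    case _ => false
  }

  // The composed form proposed in this review comment.
  def composed(e: Expr): Boolean = hasSubquery(e) && !hasScalarSubquery(e)

  // A predicate that mixes a scalar and a non-scalar subquery.
  val mixed: Expr = And(
    Eq(Col("b"), Scalar("select max(d) from t2")),
    Exists("select 1 from t3"))
}
```

Here `hasNonScalarSubquery(mixed)` is true because the tree contains an `Exists` node, while `composed(mixed)` is false because a scalar subquery is also present. Whether that difference matters depends on how the caller filters predicates.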






[GitHub] [spark] peter-toth edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779861815


   @wangyum, how does this PR relate to https://github.com/apache/spark/pull/23802?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782890837


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135321/
   




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783478080


   **[Test build #135346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135346/testReport)** for PR 31573 at commit [`b207612`](https://github.com/apache/spark/commit/b207612abbad8e5b64dd2b7d65f990e643ace6f0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782883005


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39901/
   




[GitHub] [spark] wangyum closed pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #31573:
URL: https://github.com/apache/spark/pull/31573


   

