Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/16 11:43:55 UTC
[GitHub] [spark] wangyum opened a new pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
wangyum opened a new pull request #31573:
URL: https://github.com/apache/spark/pull/31573
### What changes were proposed in this pull request?
This PR pushes down scalar-subquery filters to FileSourceScan. For example:
```scala
sql("CREATE TABLE t1 using parquet AS SELECT id AS a, id AS b FROM range(500000000L)")
sql("CREATE TABLE t2 using parquet AS SELECT id AS d FROM range(20)")
sql("SELECT * FROM t1 where b = (select max(d) from t2)").show
```
Before this PR | After this PR
-- | --
![image](https://user-images.githubusercontent.com/5399861/108058404-f4396080-708e-11eb-8e9c-0f2b5e98967c.png) | ![image](https://user-images.githubusercontent.com/5399861/108058416-fb606e80-708e-11eb-8886-7f1ee2e0bdbb.png)
### Why are the changes needed?
Improve query performance.
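As a rough illustration of where the speed-up can come from: once the scalar subquery has been evaluated, `b = (select max(d) from t2)` becomes an ordinary comparison against a constant, and a columnar reader can use per-chunk min/max statistics to skip chunks that cannot match. The sketch below is a toy model under that assumption, not Spark or Parquet code; `Chunk`, `mayContain`, and `scan` are hypothetical names.

```scala
// Toy model of statistics-based skipping; not Spark's or Parquet's actual API.
object SkipSketch {
  // A chunk of column `b` together with its min/max statistics (hypothetical type).
  final case class Chunk(min: Long, max: Long, values: Seq[Long])

  // True when the chunk's statistics cannot rule out `value`.
  def mayContain(c: Chunk, value: Long): Boolean = c.min <= value && value <= c.max

  // Returns (matching values, number of chunks actually read).
  def scan(
      chunks: Seq[Chunk],
      pushed: Option[Long],
      residual: Long => Boolean): (Seq[Long], Int) = {
    val toRead = pushed match {
      case Some(v) => chunks.filter(mayContain(_, v)) // value known before the scan starts
      case None    => chunks                          // nothing pushed down: read everything
    }
    (toRead.flatMap(_.values).filter(residual), toRead.size)
  }

  def main(args: Array[String]): Unit = {
    // Ten chunks of ten consecutive values each: 0..9, 10..19, ..., 90..99.
    val chunks = (0 until 100).map(_.toLong).grouped(10)
      .map(g => Chunk(g.head, g.last, g)).toSeq
    val subqueryValue = 19L // stands in for the result of `select max(d) from t2`
    println(scan(chunks, None, _ == subqueryValue))
    println(scan(chunks, Some(subqueryValue), _ == subqueryValue))
  }
}
```

Both calls return the same rows; the difference is only in how many chunks are read, which is the effect the before/after screenshots are meant to show.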
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] peter-toth edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779861815
~~@wangyum, how does this PR relate to https://github.com/apache/spark/pull/23802? Is this a regression in 3.1?~~ Nvm.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781104483
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39796/
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782876215
**[Test build #135321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135321/testReport)** for PR 31573 at commit [`0699dd0`](https://github.com/apache/spark/commit/0699dd08796514c0853dc3633f9ca35b0f8f489a).
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781060196
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39796/
[GitHub] [spark] wangyum commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782565340
Thank you all. I need to fix the failed test.
[GitHub] [spark] SparkQA removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783386181
**[Test build #135346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135346/testReport)** for PR 31573 at commit [`b207612`](https://github.com/apache/spark/commit/b207612abbad8e5b64dd2b7d65f990e643ace6f0).
[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781104483
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39796/
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782897104
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39901/
[GitHub] [spark] SaurabhChawla100 edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SaurabhChawla100 edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-796804727
@wangyum - Just curious, to clarify my understanding of this change.
I can see that the scalar subquery in the example has no joins:
`select max(d) from t2`
Are we also pushing down a scalar subquery that has one or more joins and then returns a single row?
For example:
`SELECT * FROM t1 WHERE b = (select t2.id from t2 , t3 , t4 where t2.id = t3.id1 and t2.date = t4.date1)`
Suppose `select t2.id from t2 , t3 , t4 where t2.id = t3.id1 and t2.date = t4.date1` takes a long time to return its result.
If we push down such a complex query, then we hold the scan of table t1 until the subquery completes and the pushdown is done on t1, and only after that does processing start.
Won't that hurt performance compared to the existing behavior, where both table scans start at the same time?
Second point: what if table t1 has only a small number of rows? Do we still want to push down the scalar subquery, which is itself a complex query?
`If the scalar subquery completes first and then the scan of t1 starts, both before this change and after this PR, then pushing down the scalar subquery will always be faster`
[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781078101
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135217/
[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782890837
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135321/
[GitHub] [spark] peter-toth edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779861815
@wangyum, how does this PR relate to https://github.com/apache/spark/pull/23802? Is this a regression in 3.1?
[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783446249
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39926/
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782887804
**[Test build #135321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135321/testReport)** for PR 31573 at commit [`0699dd0`](https://github.com/apache/spark/commit/0699dd08796514c0853dc3633f9ca35b0f8f489a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779794834
**[Test build #135176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135176/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783417848
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39926/
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781071984
**[Test build #135217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135217/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779849861
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39757/
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779794834
**[Test build #135176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135176/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).
[GitHub] [spark] peter-toth commented on a change in pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #31573:
URL: https://github.com/apache/spark/pull/31573#discussion_r576989630
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##########
@@ -335,6 +336,19 @@ case class FileSourceScanExec(
     dataFilters.flatMap(DataSourceStrategy.translateFilter(_, supportNestedPredicatePushdown))
   }
+  @transient
+  private lazy val runtimePushedDownFilters = {
+    dataFilters.flatMap {
+      case e: Expression if ExecSubqueryExpression.hasScalarSubquery(e) =>
+        val updatedValue = e.transform {
+          case s: ScalarSubquery => s.value
+        }
+        Some(updatedValue)
+      case _ =>
+        Nil
+    }.flatMap(translateFilter(_, DataSourceUtils.supportNestedPredicatePushdown(relation)))
Review comment:
nit: IMO we can combine the 2 `flatMap`s and the code is still easy to read
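A minimal sketch of this suggestion, using a toy filter type instead of Catalyst expressions. `Filter`, `resolve`, and `translate` are hypothetical stand-ins for the subquery-resolving step and for `translateFilter`; the only point is that two chained `flatMap`s over `Option`-returning steps can be folded into a single pass.

```scala
// Toy illustration only; none of these types are Spark classes.
object FlatMapSketch {
  // The right-hand side of `column = value` may still be an unresolved subquery.
  sealed trait Value
  final case class Const(v: Long) extends Value
  final case class Subquery(result: Long) extends Value
  final case class Filter(column: String, value: Value)

  // Step 1: keep only filters that contain a subquery, replacing it with its result.
  def resolve(f: Filter): Option[Filter] = f.value match {
    case Subquery(r) => Some(f.copy(value = Const(r)))
    case _           => None
  }

  // Step 2: translate to a source-level filter string. As an assumption for this
  // sketch, dotted (nested) column names are treated as unsupported.
  def translate(f: Filter): Option[String] = f match {
    case Filter(c, Const(v)) if !c.contains(".") => Some(s"$c = $v")
    case _                                       => None
  }

  // Shape of the code under review: two chained flatMaps.
  def twoPasses(fs: Seq[Filter]): Seq[String] =
    fs.flatMap(resolve(_)).flatMap(translate(_))

  // Shape of the suggestion: one flatMap with the second step nested inside.
  def onePass(fs: Seq[Filter]): Seq[String] =
    fs.flatMap(f => resolve(f).flatMap(translate))

  def main(args: Array[String]): Unit = {
    val fs = Seq(Filter("b", Subquery(19L)), Filter("a", Const(1L)), Filter("s.x", Subquery(3L)))
    println(twoPasses(fs))
    println(onePass(fs))
  }
}
```

The two forms produce the same result for any input, so the choice between them is about readability rather than behavior.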
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
##########
@@ -166,14 +167,11 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
val partitionKeyFilters = DataSourceStrategy.getPushedDownFilters(partitionColumns,
normalizedFilters)
- // subquery expressions are filtered out because they can't be used to prune buckets or pushed
- // down as data filters, yet they would be executed
- val normalizedFiltersWithoutSubqueries =
- normalizedFilters.filterNot(SubqueryExpression.hasSubquery)
-
val bucketSpec: Option[BucketSpec] = fsRelation.bucketSpec
val bucketSet = if (shouldPruneBuckets(bucketSpec)) {
- genBucketSet(normalizedFiltersWithoutSubqueries, bucketSpec.get)
+ // subquery expressions are filtered out because they can't be used to prune buckets
+ // or pushed down as data filters, yet they would be executed
Review comment:
nit: `or pushed down as data filters` is not valid here
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
##########
@@ -184,7 +182,9 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
// Partition keys are not available in the statistics of the files.
// `dataColumns` might have partition columns, we need to filter them out.
val dataColumnsWithoutPartitionCols = dataColumns.filterNot(partitionColumns.contains)
- val dataFilters = normalizedFiltersWithoutSubqueries.flatMap { f =>
+ // Non-scalar subquery expressions are filtered out because they can't be used to prune
+ // buckets or pushed down as data filters, yet they would be executed
Review comment:
nit: `to prune buckets` is not valid here
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783446249
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39926/
[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779927497
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135176/
[GitHub] [spark] SaurabhChawla100 edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SaurabhChawla100 edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-796804727
@wangyum - Just curious, to clarify my understanding of this change.
I can see that the scalar subquery in the example has no joins:
`select max(d) from t2`
Are we also pushing down a scalar subquery that has one or more joins and then returns a single row?
For example:
`SELECT * FROM t1 WHERE b = (select t2.id from t2 , t3 , t4 where t2.id = t3.id1 and t2.date = t4.date1)`
Suppose `select t2.id from t2 , t3 , t4 where t2.id = t3.id1 and t2.date = t4.date1` takes a long time to return its result.
If we push down such a complex query, then we hold the scan of table t1 until the subquery completes and the pushdown is done on t1, and only after that does processing start.
Won't that hurt performance compared to the existing behavior, where both table scans start at the same time?
Second point: what if table t1 has only a small number of rows? Do we still want to push down the scalar subquery, which is itself a complex query?
`If the scalar subquery completes first and only then the scan of t1 starts, both before this change and after this PR, then pushing down the scalar subquery will always be faster`
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781100254
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39796/
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783386181
**[Test build #135346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135346/testReport)** for PR 31573 at commit [`b207612`](https://github.com/apache/spark/commit/b207612abbad8e5b64dd2b7d65f990e643ace6f0).
[GitHub] [spark] peter-toth commented on a change in pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #31573:
URL: https://github.com/apache/spark/pull/31573#discussion_r576989349
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -85,6 +85,26 @@ object SubqueryExpression {
     }.isDefined
   }
+  /**
+   * Returns true when an expression contains a scalar subquery
+   */
+  def hasScalarSubquery(e: Expression): Boolean = {
+    e.find {
+      case _: ScalarSubquery => true
+      case _ => false
+    }.isDefined
+  }
+
+  /**
+   * Returns true when an expression contains a non-scalar subquery
+   */
+  def hasNonScalarSubquery(e: Expression): Boolean = {
+    e.find {
+      case s: SubqueryExpression if !hasScalarSubquery(s) => true
Review comment:
how about `case s: SubqueryExpression => !s.isInstanceOf[ScalarSubquery]` and we probably don't need `def hasScalarSubquery()` at all?
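To make the suggestion concrete, here is a self-contained sketch on a toy expression tree. The `Expr` hierarchy and its `find` are simplified stand-ins for Catalyst's `Expression`, `SubqueryExpression`, and `find`, so this illustrates the pattern rather than reproducing the code in the PR.

```scala
// Toy expression tree; a simplified stand-in for Catalyst, not Spark code.
object SubquerySketch {
  sealed trait Expr {
    def children: Seq[Expr]
    // Pre-order search for the first node that satisfies `p`.
    def find(p: Expr => Boolean): Option[Expr] =
      if (p(this)) Some(this)
      else children.foldLeft(Option.empty[Expr])((acc, c) => acc.orElse(c.find(p)))
  }
  sealed trait SubqueryExpr extends Expr { def children: Seq[Expr] = Nil }
  final case class ScalarSubquery(name: String) extends SubqueryExpr
  final case class Exists(name: String) extends SubqueryExpr
  final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
  final case class EqualTo(l: Expr, r: Expr) extends Expr { def children: Seq[Expr] = Seq(l, r) }
  final case class And(l: Expr, r: Expr) extends Expr { def children: Seq[Expr] = Seq(l, r) }

  // The reviewer's shape: match any subquery node, then exclude the scalar kind directly,
  // so no separate `hasScalarSubquery` helper is needed.
  def hasNonScalarSubquery(e: Expr): Boolean = e.find {
    case s: SubqueryExpr => !s.isInstanceOf[ScalarSubquery]
    case _               => false
  }.isDefined

  def main(args: Array[String]): Unit = {
    val scalarOnly = EqualTo(Attr("b"), ScalarSubquery("q1"))
    val mixed = And(scalarOnly, Exists("q2"))
    println(hasNonScalarSubquery(scalarOnly))
    println(hasNonScalarSubquery(mixed))
  }
}
```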
[GitHub] [spark] peter-toth commented on a change in pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #31573:
URL: https://github.com/apache/spark/pull/31573#discussion_r576989752
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
##########
@@ -184,7 +182,9 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
// Partition keys are not available in the statistics of the files.
// `dataColumns` might have partition columns, we need to filter them out.
val dataColumnsWithoutPartitionCols = dataColumns.filterNot(partitionColumns.contains)
- val dataFilters = normalizedFiltersWithoutSubqueries.flatMap { f =>
+ // Non-scalar subquery expressions are filtered out because they can't be used to prune
+ // buckets or pushed down as data filters, yet they would be executed
Review comment:
nit: `to prune buckets` is not valid here
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783489694
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135346/
[GitHub] [spark] SparkQA removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781033387
**[Test build #135217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135217/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).
[GitHub] [spark] SaurabhChawla100 commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SaurabhChawla100 commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-796804727
@wangyum - Just curious; I would like to clarify my understanding of this change.
I can see that the scalar subquery in the example has no joins:
`select max(d) from t2`
Are we also pushing down scalar subqueries that contain one or more joins and then return a single row?
For example:
`SELECT * FROM t1 WHERE b = (select t2.id from t2 , t3 , t4 where t2.id = t3.id1 and t2.date = t4.date1)`
Suppose `select t2.id from t2 , t3 , t4 where t2.id = t3.id1 and t2.date = t4.date1` takes a long time to return its result.
If we push down such a complex query, we hold the scan of table t1 until the subquery completes and the pushdown is applied to t1; only after that does processing start.
Won't this hurt performance compared with the existing behavior, where both table scans start at the same time?
Second point: if table t1 has only a small number of rows, do we still want to push down a scalar subquery that is itself a complex query?
[GitHub] [spark] wangyum commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-797651022
Thank you @SaurabhChawla100.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782900193
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39901/
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783442708
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39926/
[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783489694
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135346/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779927497
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135176/
[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782900193
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39901/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781078101
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135217/
[GitHub] [spark] SaurabhChawla100 edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SaurabhChawla100 edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-796804727
@wangyum - Just curious; I would like to clarify my understanding of this change.
I can see that the scalar subquery in the example has no joins:
`select max(d) from t2`
Are we also pushing down scalar subqueries that contain one or more joins and then return a single row?
For example:
`SELECT * FROM t1 WHERE b = (select t2.id from t2 , t3 , t4 where t2.id = t3.id1 and t2.date = t4.date1)`
Suppose `select t2.id from t2 , t3 , t4 where t2.id = t3.id1 and t2.date = t4.date1` takes a long time to return its result.
If we push down such a complex query, we hold the scan of table t1 until the subquery completes and the pushdown is applied to t1; only after that does processing start.
Won't this hurt performance compared with the existing behavior, where both table scans start at the same time?
Second point: if table t1 has only a small number of rows, do we still want to push down a scalar subquery that is itself a complex query?
`If the scalar subquery always completes before the scan of t1 starts, both before and after this change, then pushing down the scalar subquery will always be faster.`
[GitHub] [spark] AmplabJenkins commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779849861
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39757/
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781033387
**[Test build #135217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135217/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).
[GitHub] [spark] dongjoon-hyun commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-781032097
Retest this please
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779896519
**[Test build #135176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135176/testReport)** for PR 31573 at commit [`2bf9772`](https://github.com/apache/spark/commit/2bf9772043c28c3838ad791ba6eb3304e038c345).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782876215
**[Test build #135321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135321/testReport)** for PR 31573 at commit [`0699dd0`](https://github.com/apache/spark/commit/0699dd08796514c0853dc3633f9ca35b0f8f489a).
[GitHub] [spark] peter-toth commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
peter-toth commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779861815
@wangyum, how does this PR relates to https://github.com/apache/spark/pull/23802?
[GitHub] [spark] viirya commented on a change in pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31573:
URL: https://github.com/apache/spark/pull/31573#discussion_r577366266
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -85,6 +85,26 @@ object SubqueryExpression {
     }.isDefined
   }
+  /**
+   * Returns true when an expression contains a scalar subquery
+   */
+  def hasScalarSubquery(e: Expression): Boolean = {
+    e.find {
+      case _: ScalarSubquery => true
+      case _ => false
+    }.isDefined
+  }
+
+  /**
+   * Returns true when an expression contains a non-scalar subquery
+   */
+  def hasNonScalarSubquery(e: Expression): Boolean = {
Review comment:
Isn't it equal to `hasSubquery(e) && !hasScalarSubquery(e)`?
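One way to probe this question is a toy model, sketched below under stated assumptions: `Expr`, `Attr`, `EqualTo`, `Or`, `SubqueryExpr`, `ScalarSub`, and `ExistsSub` are hypothetical stand-ins, not Spark's real classes, and `composed` is a made-up name for the expression proposed in the comment. In this model the two checks agree whenever an expression holds only one kind of subquery, but differ when a single expression holds both a scalar and a non-scalar subquery. Whether such mixed expressions can reach this code path in Spark is not something the sketch establishes.

```scala
object SubqueryEquivalenceSketch {
  // Toy stand-ins for Catalyst expressions; these are NOT Spark's real classes.
  sealed trait Expr {
    def children: Seq[Expr]
    // Pre-order search for the first node that satisfies `p`.
    def find(p: Expr => Boolean): Option[Expr] =
      if (p(this)) Some(this)
      else children.foldLeft(Option.empty[Expr])((found, c) => found.orElse(c.find(p)))
  }
  case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
  case class EqualTo(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  case class Or(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  sealed trait SubqueryExpr extends Expr { def children: Seq[Expr] = Nil }
  case class ScalarSub(sql: String) extends SubqueryExpr
  case class ExistsSub(sql: String) extends SubqueryExpr

  def hasSubquery(e: Expr): Boolean =
    e.find(_.isInstanceOf[SubqueryExpr]).isDefined

  def hasScalarSubquery(e: Expr): Boolean =
    e.find(_.isInstanceOf[ScalarSub]).isDefined

  // Same shape as the method under review: true if ANY subquery node is non-scalar.
  def hasNonScalarSubquery(e: Expr): Boolean =
    e.find {
      case s: SubqueryExpr if !hasScalarSubquery(s) => true
      case _ => false
    }.isDefined

  // The alternative raised in the comment (hypothetical helper name).
  def composed(e: Expr): Boolean = hasSubquery(e) && !hasScalarSubquery(e)

  def main(args: Array[String]): Unit = {
    val scalar = EqualTo(Attr("b"), ScalarSub("select max(d) from t2"))
    val exists = ExistsSub("select 1 from t2")
    val mixed = Or(scalar, exists)
    println((hasNonScalarSubquery(exists), composed(exists))) // (true,true)
    println((hasNonScalarSubquery(scalar), composed(scalar))) // (false,false)
    println((hasNonScalarSubquery(mixed), composed(mixed)))   // (true,false)
  }
}
```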
[GitHub] [spark] peter-toth edited a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-779861815
@wangyum, how does this PR relate to https://github.com/apache/spark/pull/23802?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782890837
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135321/
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-783478080
**[Test build #135346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135346/testReport)** for PR 31573 at commit [`b207612`](https://github.com/apache/spark/commit/b207612abbad8e5b64dd2b7d65f990e643ace6f0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31573:
URL: https://github.com/apache/spark/pull/31573#issuecomment-782883005
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39901/
[GitHub] [spark] wangyum closed pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan
Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #31573:
URL: https://github.com/apache/spark/pull/31573