You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/05/03 15:21:06 UTC

[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21230

    [SPARK-24172][SQL] we should not apply operator pushdown to data source v2 many times

    ## What changes were proposed in this pull request?
    
    In `PushDownOperatorsToDataSource`, we use `transformUp` to match `PhysicalOperation` and apply pushdown. This is problematic if we have multiple `Filter` and `Project` above the data source v2 relation.
    
    e.g. for a query
    ```
    Project
      Filter
        DataSourceV2Relation
    ```
    
    The pattern match will be triggered twice and we will do operator pushdown twice. This is unnecessary, we can use `transformDown` to only apply pushdown once.
    
    ## How was this patch tested?
    
    existing test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark step2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21230
    
----
commit e224f8a798ed30319efab386720c997227e1b421
Author: Wenchen Fan <we...@...>
Date:   2018-05-03T15:14:01Z

    we should not apply operator pushdown to data source v2 many times

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Hi @rdblue , thanks for your new approach! Like you said, the major problem is about statistics. This is unfortunately a problem of Spark's CBO design: the statistics should belong to physical node but it currently belongs to logical node.
    
    For file-based data sources, since they are builtin sources, we can create rules to update statistics at logical phase, i.e. `PruneFileSourcePartitions`. But for external sources like iceberg, we would not be able to update statistics before planning, and shuffle join may be wrongly planned while broadcast join is applicable. In other words, users may need to create custom optimizer rules to make their data source work well.
    
    That said, I do like your approach if we can fix the statistics problem first. I'm not sure how hard and how soon it can be fixed, cc @wzhfy 
    
    Before that, I'd like to still keep the pushdown logic in optimizer and left the hard work to Spark instead of users. What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90436/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    LGTM Thanks! Merged to master


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90488/testReport)** for PR 21230 at commit [`953cd7a`](https://github.com/apache/spark/commit/953cd7a823f487291ac389eeb9d98219bf447ebf).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90436/testReport)** for PR 21230 at commit [`e224f8a`](https://github.com/apache/spark/commit/e224f8a798ed30319efab386720c997227e1b421).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3121/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90140/testReport)** for PR 21230 at commit [`e224f8a`](https://github.com/apache/spark/commit/e224f8a798ed30319efab386720c997227e1b421).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21230#discussion_r185986217
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -23,49 +23,46 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
     import org.apache.spark.sql.catalyst.rules.Rule
     
     object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
    -  override def apply(
    -      plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    // PhysicalOperation guarantees that filters are deterministic; no need to check
    -    case PhysicalOperation(project, newFilters, relation : DataSourceV2Relation) =>
    -      // merge the filters
    -      val filters = relation.filters match {
    -        case Some(existing) =>
    -          existing ++ newFilters
    -        case _ =>
    -          newFilters
    -      }
    +  override def apply(plan: LogicalPlan): LogicalPlan = {
    +    var pushed = false
    +    plan transformDown {
    +      // PhysicalOperation guarantees that filters are deterministic; no need to check
    +      case PhysicalOperation(project, filters, relation: DataSourceV2Relation) if !pushed =>
    --- End diff --
    
    Is that possible one plan has multiple `PhysicalOperation`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    +1 (assuming tests pass)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90467/testReport)** for PR 21230 at commit [`f73440c`](https://github.com/apache/spark/commit/f73440c49f8d70d89ffb5f9146a5b0e13934bc85).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90467/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    @cloud-fan, I opened #21262 that is similar to this, but does pushdown when converting to a physical plan. You might like that as an alternative because it cleans up `DataSourceV2Relation` quite a bit and adds `output` to the case class arguments like other relations.
    
    The drawback to that approach that I had forgotten about is that it breaks `computeStats` because that runs on the optimized plan (but this affects all the other code paths as well).
    
    Up to you how to continue with this work, I just think we should consider the other approach since it solves a few problems. And `computeStats` is something we should update to work on physical plans anyway, right? Just let me know how you want to move forward. If you want to pull that commit into this PR, I'll close the other one.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90436/testReport)** for PR 21230 at commit [`e224f8a`](https://github.com/apache/spark/commit/e224f8a798ed30319efab386720c997227e1b421).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90140/testReport)** for PR 21230 at commit [`e224f8a`](https://github.com/apache/spark/commit/e224f8a798ed30319efab386720c997227e1b421).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90506/testReport)** for PR 21230 at commit [`953cd7a`](https://github.com/apache/spark/commit/953cd7a823f487291ac389eeb9d98219bf447ebf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90506/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3090/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3139/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3108/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2877/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    So it was the use of `transformUp` that caused this rules to match multiple times, right? In that case, would it make more sense to do what @marmbrus suggested in the immutable plan PR and make this a strategy instead of an optimizer rule?
    
    That approach fits with what I suggested on #21118. We could have the scan node handle the filter and the projection so that it doesn't matter whether the source produces `UnsafeRow` or `InternalRow`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21230#discussion_r187388065
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -23,17 +23,10 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
     import org.apache.spark.sql.catalyst.rules.Rule
     
     object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
    -  override def apply(
    -      plan: LogicalPlan): LogicalPlan = plan transformUp {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan.mapChildren {
    --- End diff --
    
    Could you update the PR description (`transformDown` -> `mapChildren`), too?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90488/testReport)** for PR 21230 at commit [`953cd7a`](https://github.com/apache/spark/commit/953cd7a823f487291ac389eeb9d98219bf447ebf).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21230#discussion_r185991248
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -23,49 +23,46 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
     import org.apache.spark.sql.catalyst.rules.Rule
     
     object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
    -  override def apply(
    -      plan: LogicalPlan): LogicalPlan = plan transformUp {
    -    // PhysicalOperation guarantees that filters are deterministic; no need to check
    -    case PhysicalOperation(project, newFilters, relation : DataSourceV2Relation) =>
    -      // merge the filters
    -      val filters = relation.filters match {
    -        case Some(existing) =>
    -          existing ++ newFilters
    -        case _ =>
    -          newFilters
    -      }
    +  override def apply(plan: LogicalPlan): LogicalPlan = {
    +    var pushed = false
    +    plan transformDown {
    +      // PhysicalOperation guarantees that filters are deterministic; no need to check
    +      case PhysicalOperation(project, filters, relation: DataSourceV2Relation) if !pushed =>
    --- End diff --
    
    `PhysicalOperation` just accumulates project and filter above a specific node, if we transform down a tree, and only transform once, we will never hit `PhysicalOperation` move than once.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90488/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90140/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Sounds good to me. Lets plan on getting this one in to fix the current problem, and commit the other approach when stats are fixed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    cc @rdblue @gatorsmile @gengliangwang 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90467/testReport)** for PR 21230 at commit [`f73440c`](https://github.com/apache/spark/commit/f73440c49f8d70d89ffb5f9146a5b0e13934bc85).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21230


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    **[Test build #90506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90506/testReport)** for PR 21230 at commit [`953cd7a`](https://github.com/apache/spark/commit/953cd7a823f487291ac389eeb9d98219bf447ebf).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21230
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org