Posted to reviews@spark.apache.org by maryannxue <gi...@git.apache.org> on 2018/05/18 06:43:59 UTC

[GitHub] spark pull request #21360: [SPARK-24288] Enable preventing predicate pushdow...

GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/21360

    [SPARK-24288] Enable preventing predicate pushdown

    ## What changes were proposed in this pull request?
    
    1. Add Dataset method "withOptimizerBarrier()".
    2. Modify AnalysisBarrier to accommodate two scenarios: 1) analysis-only barriers; and 2) optimizer barriers.
    3. Add handling of Barrier in Optimizer (logical plan optimization).
    4. Add handling of Barrier in SparkStrategies (logical-to-physical plan translation).
    
    ## How was this patch tested?
    
    1. Add DataFrameOptimizerBarrierSuite to ensure:
        a) Barriers isolate optimization rule applications.
        b) Plans with optimization Barriers get resolved correctly, same as analysis-only Barriers.
        c) Barriers preserve ordering.
        d) Barriers preserve constraints.
    2. Add one test in JDBCSuite to verify the scenario raised by the user (avoiding filter push-down to the data source).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-24288

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21360
    
----
commit d57d89ad98d554c020a59c5c44a5f6725d6e916e
Author: maryannxue <ma...@...>
Date:   2018-05-18T06:34:38Z

    [SPARK-24288] Enable preventing predicate pushdown

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3324/
    Test PASSed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    **[Test build #90778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90778/testReport)** for PR 21360 at commit [`d57d89a`](https://github.com/apache/spark/commit/d57d89ad98d554c020a59c5c44a5f6725d6e916e).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait Barrier extends LeafNode `


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    **[Test build #90779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90779/testReport)** for PR 21360 at commit [`60d72e3`](https://github.com/apache/spark/commit/60d72e3d6aadc0342f6f8f4795255d8a5e99e93a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90802/
    Test FAILed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    @TomaszGaweda @maryannxue Let us reduce the complexity and introduce a new JDBC option for controlling the predicate pushdown. 
    ```Scala
      val JDBC_FILTER_PUSHDOWN_ENABLED = buildConf("spark.sql.jdbc.filterPushdown")
        .doc("Enables JDBC filter push-down optimization when set to true.")
        .booleanConf
        .createWithDefault(true)
    ```


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    **[Test build #90799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90799/testReport)** for PR 21360 at commit [`edadcf2`](https://github.com/apache/spark/commit/edadcf2ad6269af0d29220daf1ce708e7cfad1e1).


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Sorry, I just updated the comment. We can add a JDBC connector option. Users can then define two JDBC sources for a single JDBC table: one with predicate pushdown and one without. Does that resolve your issue?
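
A minimal sketch of the two-source setup suggested above, assuming the per-reader switch is the `pushDownPredicate` option documented for Spark's JDBC data source; the comment itself does not name the option. `TwoJdbcReaders`, `jdbcOptions`, the URL, and the table name are hypothetical, and the Spark calls appear only as comments so that the block runs as plain Scala.

```scala
object TwoJdbcReaders {
  // Reader options for one JDBC table; `pushDown` toggles filter push-down.
  // "pushDownPredicate" is assumed to be the per-reader switch discussed above.
  def jdbcOptions(url: String, table: String, pushDown: Boolean): Map[String, String] =
    Map(
      "url" -> url,
      "dbtable" -> table,
      "pushDownPredicate" -> pushDown.toString
    )

  // Hypothetical connection details, for illustration only.
  val url = "jdbc:db2://dbhost:50000/SAMPLE"
  val withPushdown = jdbcOptions(url, "ORDERS", pushDown = true)
  val withoutPushdown = jdbcOptions(url, "ORDERS", pushDown = false)

  // With a SparkSession named `spark` in scope, each map would feed its own reader:
  //   val pushed = spark.read.format("jdbc").options(withPushdown).load()
  //   val local  = spark.read.format("jdbc").options(withoutPushdown).load()
}
```

Because the setting lives on the reader rather than in the session configuration, other Datasets in the same application would be unaffected by it.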


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Hi @maryannxue, thanks for the PR! Could you please rebase it? 


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90799/
    Test PASSed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    **[Test build #90778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90778/testReport)** for PR 21360 at commit [`d57d89a`](https://github.com/apache/spark/commit/d57d89ad98d554c020a59c5c44a5f6725d6e916e).


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3342/
    Test PASSed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    @gatorsmile This will reduce usability a lot. With the current approach you can still push down the filters that may speed up reading. A global option will affect every other Dataset. To be honest, a new JDBC option won't fulfill users' requirements, at least not for those users who asked me for workarounds ;)


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    **[Test build #90799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90799/testReport)** for PR 21360 at commit [`edadcf2`](https://github.com/apache/spark/commit/edadcf2ad6269af0d29220daf1ce708e7cfad1e1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21360: [SPARK-24288] Enable preventing predicate pushdow...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue closed the pull request at:

    https://github.com/apache/spark/pull/21360


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    **[Test build #90802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90802/testReport)** for PR 21360 at commit [`edadcf2`](https://github.com/apache/spark/commit/edadcf2ad6269af0d29220daf1ce708e7cfad1e1).


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    **[Test build #90779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90779/testReport)** for PR 21360 at commit [`60d72e3`](https://github.com/apache/spark/commit/60d72e3d6aadc0342f6f8f4795255d8a5e99e93a).


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3325/
    Test PASSed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    retest this please


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90779/
    Test FAILed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    retest this please


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Hi @maryannxue, can you please rebase this PR? Then others may be able to review it. It would be great to include this in Spark 2.4 :) Thanks :)


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    I've tested it with my application that had a problem with predicate pushdowns to the database. It looks good; performance is degraded a bit, but the application was previously run on Spark 2.3, not 2.4. However, memory consumption is much better, as I don't have to cache the input Datasets.
    
    LGTM from the functional side. @gatorsmile @cloud-fan Could you please review it? Thanks! 


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90778/
    Test FAILed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    **[Test build #90802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90802/testReport)** for PR 21360 at commit [`edadcf2`](https://github.com/apache/spark/commit/edadcf2ad6269af0d29220daf1ce708e7cfad1e1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3340/
    Test FAILed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    @gatorsmile That makes sense :) Simple predicates can be placed in the dbtable option. The current approach is still more powerful, but if you think that the risk is too big, we can switch to a reader option.
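
A minimal sketch of placing a simple predicate in the `dbtable` option, assuming the database accepts a parenthesized subquery with an alias in its FROM clause. `DbtableSubquery`, `filtered`, the default alias, and the table and column names are hypothetical.

```scala
object DbtableSubquery {
  // Wraps a table and a WHERE clause into a derived table usable as `dbtable`.
  // The strings are interpolated as-is, so pass trusted input only. Alias
  // syntax varies by database; some reject the AS keyword for table aliases.
  def filtered(table: String, predicate: String, alias: String = "filtered_src"): String =
    s"(SELECT * FROM $table WHERE $predicate) AS $alias"

  // With a SparkSession named `spark` and a JDBC `url` in scope:
  //   spark.read.format("jdbc")
  //     .option("url", url)
  //     .option("dbtable", filtered("ORDERS", "STATUS = 'OPEN'"))
  //     .load()
}
```

The predicate inside the subquery always runs in the database, so this only suits filters that are known to be cheap there.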


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    @viirya I've written it up in the ticket. In my case, pushing down ORs with non-equality predicates caused DB2 to slow down; the workaround was to cache the data before filtering, which was approx. 10 times faster. This PR makes it possible to decide not to push down a predicate, without having to cache.
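
A minimal sketch of the caching workaround mentioned above, assuming Spark SQL is on the classpath. `CacheBeforeFilter`, `filterAfterCaching`, and the column names `AMOUNT` and `QUANTITY` are hypothetical; the actual predicates from the ticket are not shown in this thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CacheBeforeFilter {
  def filterAfterCaching(spark: SparkSession, url: String, table: String): DataFrame = {
    val source = spark.read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", table)
      .load()

    // Once cached, the plan reads from the in-memory copy, so the filter below
    // is evaluated by Spark rather than translated into the JDBC query.
    val cached = source.cache()

    // Hypothetical OR of non-equality predicates on assumed columns.
    cached.filter(col("AMOUNT") > 1000 || col("QUANTITY") < 5)
  }
}
```

The price of this workaround is the memory held by the cached table, which is the cost the PR aims to avoid.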


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21360
  
    Can you tell us more about the usage of `withOptimizerBarrier`? I'm curious how it can be used in production.


---
