Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/05/03 05:26:35 UTC

[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21224

    [SPARK-24167][SQL] ParquetFilters should not access SQLConf at executor side

    ## What changes were proposed in this pull request?
    
    This PR is extracted from #21190 to make it easier to backport.
    
    `ParquetFilters` is used in the file scan function, which runs on the executor side, so we can't call `conf.parquetFilterPushDownDate` there.
    
    ## How was this patch tested?
    
    It's tested in #21190.
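The description says the scan function runs on executors, so it must not read `SQLConf` there. A common way to satisfy that is to read the flag once on the driver and pass the plain value into the object the closure uses, which is what the added `val pushDownDate = sqlConf.parquetFilterPushDownDate` line in `ParquetFileFormat` does. The sketch below models that pattern in plain Scala. `Conf`, `DriverConf`, `DateFilterBuilder` and `PushDownExample` are hypothetical stand-ins, not Spark's real `SQLConf` or `ParquetFilters` APIs, and the returned filter string is only a placeholder.

```scala
// Hypothetical stand-in for a config value object (not Spark's SQLConf).
final case class Conf(parquetFilterPushDownDate: Boolean)

// Models a config that is only reachable on the driver: reading it
// anywhere else throws, which is the bug this PR avoids.
object DriverConf {
  @volatile var current: Option[Conf] = None
  def get: Conf =
    current.getOrElse(throw new IllegalStateException("config is only available on the driver"))
}

// Like the patched ParquetFilters, the flag arrives as a constructor
// argument, so no method here ever touches DriverConf.
final class DateFilterBuilder(pushDownDate: Boolean) extends Serializable {
  def createFilter(column: String, isDate: Boolean): Option[String] =
    if (isDate && !pushDownDate) None // date predicates are not pushed down
    else Some(s"$column IS NOT NULL") // placeholder for a real predicate
}

object PushDownExample {
  // Runs on the driver: read the config once and capture only the Boolean.
  def buildReader(): String => Option[String] = {
    val pushDownDate = DriverConf.get.parquetFilterPushDownDate
    column => new DateFilterBuilder(pushDownDate).createFilter(column, isDate = true)
  }

  def main(args: Array[String]): Unit = {
    DriverConf.current = Some(Conf(parquetFilterPushDownDate = true))
    val reader = buildReader()
    DriverConf.current = None // simulate executing the closure on an executor
    println(reader("d")) // prints Some(d IS NOT NULL)
  }
}
```

Because the closure only captures a `Boolean`, it keeps working after `DriverConf.current` is cleared; had `createFilter` called `DriverConf.get` itself, the same call would throw.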

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21224
    
----
commit c58baad051259d7d2d54f1eb5e84c4bdac0867a6
Author: Wenchen Fan <we...@...>
Date:   2018-05-03T05:20:06Z

    ParquetFilters should not access SQLConf at executor side

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    thanks, merging to master!


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    **[Test build #90110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90110/testReport)** for PR 21224 at commit [`d7dc8a8`](https://github.com/apache/spark/commit/d7dc8a85489122e5b91cf5bc7cc0190f4d474a2c).


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    @cloud-fan, the change seems fine, but is there a clever way to test this? It seems very likely that we could make the same mistake again.


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90096/


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    **[Test build #90096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90096/testReport)** for PR 21224 at commit [`c58baad`](https://github.com/apache/spark/commit/c58baad051259d7d2d54f1eb5e84c4bdac0867a6).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21224#discussion_r185876764
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -342,6 +342,7 @@ class ParquetFileFormat
           sparkSession.sessionState.conf.parquetFilterPushDown
         // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
         val returningBatch = supportBatch(sparkSession, resultSchema)
    +    val pushDownDate = sqlConf.parquetFilterPushDownDate
    --- End diff --
    
    Can we pass `pushed` instead of declaring a new `pushDownDate`?
    The following could be handled here, at line 345.
    
    ```scala
    // Try to push down filters when filter push-down is enabled.
    val pushed = if (enableParquetFilterPushDown) {
      filters
        // Collects all converted Parquet filter predicates. Notice that not all predicates can be
        // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap`
        // is used here.
        .flatMap(new ParquetFilters(pushDownDate).createFilter(requiredSchema, _))
        .reduceOption(FilterApi.and)
    } else {
      None
    }
    ```


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2854/


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2842/


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90110/


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    **[Test build #90110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90110/testReport)** for PR 21224 at commit [`d7dc8a8`](https://github.com/apache/spark/commit/d7dc8a85489122e5b91cf5bc7cc0190f4d474a2c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    cc @gatorsmile


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    **[Test build #90096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90096/testReport)** for PR 21224 at commit [`c58baad`](https://github.com/apache/spark/commit/c58baad051259d7d2d54f1eb5e84c4bdac0867a6).


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    I realized #21086 is only in master, so this bug doesn't exist in 2.3.


---



[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21224


---



[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21224#discussion_r185975883
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -342,6 +342,7 @@ class ParquetFileFormat
           sparkSession.sessionState.conf.parquetFilterPushDown
         // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
         val returningBatch = supportBatch(sparkSession, resultSchema)
    +    val pushDownDate = sqlConf.parquetFilterPushDownDate
    --- End diff --
    
    No, we can't; see https://github.com/apache/spark/pull/21086


---



[GitHub] spark issue #21224: [SPARK-24167][SQL] ParquetFilters should not access SQLC...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21224
  
    And branch-2.3 too?


---



[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21224#discussion_r185988219
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -342,6 +342,7 @@ class ParquetFileFormat
           sparkSession.sessionState.conf.parquetFilterPushDown
         // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
         val returningBatch = supportBatch(sparkSession, resultSchema)
    +    val pushDownDate = sqlConf.parquetFilterPushDownDate
    --- End diff --
    
    Ah, I see. Thank you, @cloud-fan !


---
