Posted to reviews@spark.apache.org by henryr <gi...@git.apache.org> on 2018/05/11 19:05:45 UTC

[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

GitHub user henryr opened a pull request:

    https://github.com/apache/spark/pull/21302

    [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

    ## What changes were proposed in this pull request?
    
    Upgrade Parquet dependency to 1.8.3 to avoid PARQUET-1217
    
    ## How was this patch tested?
    
    Ran the test case from SPARK-23852 (will backport in a separate PR after this goes in).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/henryr/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21302
    
----
commit 35e214995201d6b3a9a013d0f8d2106b084f4de9
Author: Henry Robinson <he...@...>
Date:   2018-05-11T18:50:26Z

    [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90523/testReport)** for PR 21302 at commit [`c681819`](https://github.com/apache/spark/commit/c681819ae4af46b685b4dcca0039b0be13ce1bb0).


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Done.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90536/
    Test FAILed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    LGTM pending tests.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Build finished. Test FAILed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    @henryr could you update the PR description (part about the test backport)? Thx


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90523/testReport)** for PR 21302 at commit [`c681819`](https://github.com/apache/spark/commit/c681819ae4af46b685b4dcca0039b0be13ce1bb0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90523/
    Test FAILed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90522/testReport)** for PR 21302 at commit [`8f4b3db`](https://github.com/apache/spark/commit/8f4b3dba57ac4cc03db227c3914cfdfe9ae0c90e).


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3153/
    Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90522/
    Test FAILed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90527/testReport)** for PR 21302 at commit [`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3163/
    Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90527/
    Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Merging to 2.3. In the unlikely event of issues, we can address them later.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3156/
    Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90521/testReport)** for PR 21302 at commit [`35e2149`](https://github.com/apache/spark/commit/35e214995201d6b3a9a013d0f8d2106b084f4de9).
     * This patch **fails build dependency tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    @gatorsmile, that is correct. https://github.com/apache/parquet-mr/commits/apache-parquet-1.8.3


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90536/testReport)** for PR 21302 at commit [`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by henryr <gi...@git.apache.org>.
Github user henryr closed the pull request at:

    https://github.com/apache/spark/pull/21302


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Apache Parquet 1.8.3 release only contains https://github.com/apache/parquet-mr/pull/465 and https://github.com/apache/parquet-mr/pull/468, right?


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    +1 when tests are passing.


---



[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21302#discussion_r187762385
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           }
         }
       }
    +
    +  test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
    +    // parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
    +    // The row-group statistics include null counts, but not min and max values, which
    +    // triggers PARQUET-1217.
    +    val df = readResourceParquetFile("test-data/parquet-1217.parquet")
    --- End diff --
    
    +1
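
The scenario described in the test comment above (statistics that carry a null count but no min or max) can be modeled with a small self-contained sketch. This is not parquet-mr's code or the actual PARQUET-1217 fix; `RowGroupStats` and `canDropForGreaterThan` are hypothetical names, and the five-row count is an assumption based on the values listed in the comment. It only illustrates the conservative rule that a row group should be skipped solely when the available statistics prove no row can match.

```scala
// Hypothetical model of row-group statistics; min and max may be absent.
final case class RowGroupStats(
    valueCount: Long,
    nullCount: Option[Long],
    min: Option[Int],
    max: Option[Int])

object RowGroupPruning {
  // Returns true only when the statistics prove that no row satisfies `col > bound`.
  def canDropForGreaterThan(stats: RowGroupStats, bound: Int): Boolean =
    stats.max match {
      // Every non-null value is <= bound, and nulls never satisfy the predicate.
      case Some(max) => max <= bound
      // Without a max, the only safe proof is that every value is null.
      case None => stats.nullCount.contains(stats.valueCount)
    }
}
```

With partially written statistics such as `RowGroupStats(5, Some(1), None, None)`, the sketch keeps the row group for `col > 0` instead of treating the missing min and max as evidence that nothing matches.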


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90536/testReport)** for PR 21302 at commit [`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3154/
    Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Build finished. Test FAILed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    cc @liancheng @michal-databricks @cloud-fan Please double check and confirm the risk of these two Parquet PRs is low. 


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    retest this please


---



[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21302#discussion_r188022670
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           }
         }
       }
    +
    +  test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
    +    // parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
    +    // The row-group statistics include null counts, but not min and max values, which
    +    // triggers PARQUET-1217.
    +    val df = readResourceParquetFile("test-data/parquet-1217.parquet")
    --- End diff --
    
    That should be done in master (and backported to 2.3 if desired).


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90522/testReport)** for PR 21302 at commit [`8f4b3db`](https://github.com/apache/spark/commit/8f4b3dba57ac4cc03db227c3914cfdfe9ae0c90e).
     * This patch **fails build dependency tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Sounds good, done.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Any remaining feedback here? Otherwise I'd like to get this in soon-ish.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3152/
    Test FAILed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    @henryr, why not backport the test case in this commit? I don't think it makes sense to separate the two because that test verifies this commit.


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90527/testReport)** for PR 21302 at commit [`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Also, please close the PR manually (github doesn't do that for branches).


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90521/
    Test FAILed.


---



[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21302#discussion_r188042296
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           }
         }
       }
    +
    +  test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
    +    // parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
    +    // The row-group statistics include null counts, but not min and max values, which
    +    // triggers PARQUET-1217.
    +    val df = readResourceParquetFile("test-data/parquet-1217.parquet")
    --- End diff --
    
    PR for master is https://github.com/apache/spark/pull/21323. My guess is there's no reason to block this backport and 2.3.1 by waiting for it to land, but happy to do whatever.


---



[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21302#discussion_r187745471
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           }
         }
       }
    +
    +  test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
    +    // parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
    +    // The row-group statistics include null counts, but not min and max values, which
    +    // triggers PARQUET-1217.
    +    val df = readResourceParquetFile("test-data/parquet-1217.parquet")
    --- End diff --
    
    Since this test case assumes `spark.sql.parquet.filterPushdown=true`, let's use the following.
    ```scala
    withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
    ```
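
For readers unfamiliar with the pattern behind the suggestion above, here is a minimal standalone sketch of a scoped configuration override: set some keys, run a block, then restore the previous values. `ScopedConf` and `withConf` are hypothetical stand-ins written for illustration. They are not Spark's `SQLConf` or its `withSQLConf` test helper, whose implementation may differ.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; not Spark's SQLConf.
object ScopedConf {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Set the given pairs, run `body`, then restore the previous values even if `body` throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Remember what each key held before the override (None if it was unset).
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }
}
```

Pinning the setting inside such a block makes a test independent of whatever default the surrounding session happens to use, which appears to be the motivation for the review comment.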


---



[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21302
  
    **[Test build #90521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90521/testReport)** for PR 21302 at commit [`35e2149`](https://github.com/apache/spark/commit/35e214995201d6b3a9a013d0f8d2106b084f4de9).


---
