Posted to reviews@spark.apache.org by henryr <gi...@git.apache.org> on 2018/05/11 19:05:45 UTC
[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
GitHub user henryr opened a pull request:
https://github.com/apache/spark/pull/21302
[SPARK-23852][SQL] Upgrade to Parquet 1.8.3
## What changes were proposed in this pull request?
Upgrade Parquet dependency to 1.8.3 to avoid PARQUET-1217
## How was this patch tested?
Ran the test case from SPARK-23852 (will backport in a separate PR after this goes in).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/henryr/spark branch-2.3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21302.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21302
----
commit 35e214995201d6b3a9a013d0f8d2106b084f4de9
Author: Henry Robinson <he...@...>
Date: 2018-05-11T18:50:26Z
[SPARK-23852][SQL] Upgrade to Parquet 1.8.3
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90523/testReport)** for PR 21302 at commit [`c681819`](https://github.com/apache/spark/commit/c681819ae4af46b685b4dcca0039b0be13ce1bb0).
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21302
Done.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90536/
Test FAILed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21302
LGTM pending tests.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Build finished. Test FAILed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21302
@henryr could you update the PR description (part about the test backport)? Thx
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90523/testReport)** for PR 21302 at commit [`c681819`](https://github.com/apache/spark/commit/c681819ae4af46b685b4dcca0039b0be13ce1bb0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90523/
Test FAILed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90522/testReport)** for PR 21302 at commit [`8f4b3db`](https://github.com/apache/spark/commit/8f4b3dba57ac4cc03db227c3914cfdfe9ae0c90e).
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3153/
Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90522/
Test FAILed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90527/testReport)** for PR 21302 at commit [`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3163/
Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90527/
Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21302
Merging to 2.3. In the unlikely event of issues, we can address them later.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3156/
Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90521/testReport)** for PR 21302 at commit [`35e2149`](https://github.com/apache/spark/commit/35e214995201d6b3a9a013d0f8d2106b084f4de9).
* This patch **fails build dependency tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21302
@gatorsmile, that is correct. https://github.com/apache/parquet-mr/commits/apache-parquet-1.8.3
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90536/testReport)** for PR 21302 at commit [`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by henryr <gi...@git.apache.org>.
Github user henryr closed the pull request at:
https://github.com/apache/spark/pull/21302
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21302
Apache Parquet 1.8.3 release only contains https://github.com/apache/parquet-mr/pull/465 and https://github.com/apache/parquet-mr/pull/468, right?
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21302
+1 when tests are passing.
---
[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21302#discussion_r187762385
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
}
}
}
+
+ test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
+ // parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
+ // The row-group statistics include null counts, but not min and max values, which
+ // triggers PARQUET-1217.
+ val df = readResourceParquetFile("test-data/parquet-1217.parquet")
--- End diff --
+1
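
For readers following the test comment in the diff above, here is a minimal sketch of why statistics that carry a null count but no min/max values are risky for filter push-down. It is a simplified model under stated assumptions: `RowGroupStats`, `StatsPruning` and both predicate helpers are hypothetical names invented for illustration, not parquet-mr or Spark classes, and treating a missing max as 0 is only one illustrative way such a problem could arise.

```scala
// Simplified, hypothetical model of row-group statistics; NOT parquet-mr's real classes.
final case class RowGroupStats(nullCount: Option[Long], min: Option[Int], max: Option[Int])

object StatsPruning {
  // Decide whether a row group may contain rows matching `col > threshold`.
  // true means "must read the row group"; false means "safe to skip it".
  def mightContainGreaterThan(stats: RowGroupStats, threshold: Int): Boolean =
    stats.max match {
      case Some(max) => max > threshold // max is known: skip only if it rules the predicate out
      case None      => true            // max is missing: nothing can be proven, so keep the group
    }

  // Deliberately wrong variant (illustrative assumption): a missing max is read as 0,
  // so the row group is skipped even though it holds matching rows.
  def unsafeMightContainGreaterThan(stats: RowGroupStats, threshold: Int): Boolean =
    stats.max.getOrElse(0) > threshold
}

object StatsPruningDemo {
  def main(args: Array[String]): Unit = {
    // Mirrors the test comment: values -1, 0, 1, 2 and null; null count present, min/max absent.
    val values: Seq[Option[Int]] = Seq(Some(-1), Some(0), Some(1), Some(2), None)
    val stats = RowGroupStats(nullCount = Some(1L), min = None, max = None)

    // Rows a full scan would return for `col > 0`.
    val matching = values.flatten.filter(_ > 0)

    val safe = if (StatsPruning.mightContainGreaterThan(stats, 0)) matching else Seq.empty[Int]
    val unsafe = if (StatsPruning.unsafeMightContainGreaterThan(stats, 0)) matching else Seq.empty[Int]

    println(s"conservative pruning returns: $safe")
    println(s"unsafe pruning returns: $unsafe")
  }
}
```

The conservative variant never skips a row group it cannot reason about, which costs some I/O but cannot drop rows; the unsafe variant shows how a silently defaulted statistic could turn into missing query results.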
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90536/testReport)** for PR 21302 at commit [`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3154/
Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Build finished. Test FAILed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21302
cc @liancheng @michal-databricks @cloud-fan Please double check and confirm the risk of these two Parquet PRs is low.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21302
retest this please
---
[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21302#discussion_r188022670
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
}
}
}
+
+ test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
+ // parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
+ // The row-group statistics include null counts, but not min and max values, which
+ // triggers PARQUET-1217.
+ val df = readResourceParquetFile("test-data/parquet-1217.parquet")
--- End diff --
That should be done in master (and backported to 2.3 if desired).
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90522/testReport)** for PR 21302 at commit [`8f4b3db`](https://github.com/apache/spark/commit/8f4b3dba57ac4cc03db227c3914cfdfe9ae0c90e).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on the issue:
https://github.com/apache/spark/pull/21302
Sounds good, done.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21302
Any remaining feedback here? Otherwise I'd like to get this in soon-ish.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3152/
Test FAILed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21302
@henryr, why not backport the test case in this commit? I don't think it makes sense to separate the two because that test verifies this commit.
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90527/testReport)** for PR 21302 at commit [`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21302
Also, please close the PR manually (github doesn't do that for branches).
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21302
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90521/
Test FAILed.
---
[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21302#discussion_r188042296
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
}
}
}
+
+ test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
+ // parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
+ // The row-group statistics include null counts, but not min and max values, which
+ // triggers PARQUET-1217.
+ val df = readResourceParquetFile("test-data/parquet-1217.parquet")
--- End diff --
PR for master is https://github.com/apache/spark/pull/21323. My guess is there's no reason to block this backport and 2.3.1 by waiting for it to land, but happy to do whatever.
---
[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/21302#discussion_r187745471
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
}
}
}
+
+ test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
+ // parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
+ // The row-group statistics include null counts, but not min and max values, which
+ // triggers PARQUET-1217.
+ val df = readResourceParquetFile("test-data/parquet-1217.parquet")
--- End diff --
Since this test case assumes `spark.sql.parquet.filterPushdown=true`, let's use the following.
```scala
withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
```
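
As a rough illustration of the scoped-configuration pattern suggested above, the sketch below sets a key for the duration of a block and restores the previous value afterwards. It is self-contained and only approximates the idea: `Conf`, `ConfScope` and `withConf` are hypothetical stand-ins written for this example, not Spark's `SQLConf` or its `withSQLConf` test helper.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; not Spark's SQLConf.
final class Conf {
  private val settings = mutable.Map.empty[String, String]
  def get(key: String): Option[String] = settings.get(key)
  def set(key: String, value: String): Unit = settings.update(key, value)
  def unset(key: String): Unit = { settings.remove(key); () }
}

object ConfScope {
  // Apply the given overrides, run `body`, then restore the earlier values,
  // even if `body` throws.
  def withConf[T](conf: Conf)(pairs: (String, String)*)(body: => T): T = {
    // Remember what each key held before the override (None = it was unset).
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf.set(k, v) }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf.set(k, old)
      case (k, None)      => conf.unset(k)
    }
  }
}

object ConfScopeDemo {
  def main(args: Array[String]): Unit = {
    val conf = new Conf
    val key = "spark.sql.parquet.filterPushdown"
    conf.set(key, "false")

    // Inside the block the override is visible; afterwards the old value is back.
    val inside = ConfScope.withConf(conf)(key -> "true") { conf.get(key) }
    println(s"inside=$inside after=${conf.get(key)}")
  }
}
```

Pinning the setting inside the test keeps the test's outcome independent of whatever default or earlier test left the key in a different state.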
---
[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21302
**[Test build #90521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90521/testReport)** for PR 21302 at commit [`35e2149`](https://github.com/apache/spark/commit/35e214995201d6b3a9a013d0f8d2106b084f4de9).
---