You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/05/23 14:04:52 UTC
[GitHub] spark pull request #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL ...
GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/21411
[SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY
## What changes were proposed in this pull request?
In current parquet version,the conf ENABLE_JOB_SUMMARY is deprecated.
When writing to Parquet files, the warning message
```WARN org.apache.parquet.hadoop.ParquetOutputFormat: Setting parquet.enable.summary-metadata is deprecated, please use parquet.summary.metadata.level```
keeps showing up.
From https://github.com/apache/parquet-mr/blame/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L164 we can see that we should use JOB_SUMMARY_LEVEL.
## How was this patch tested?
Unit test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark summaryLevel
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21411.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21411
----
commit 6fa194968893b1347e7c08ab5b9eb12716114e46
Author: Gengliang Wang <ge...@...>
Date: 2018-05-23T13:35:26Z
use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21411
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21411
cc @michal-databricks
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21411
Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21411
**[Test build #91042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91042/testReport)** for PR 21411 at commit [`6fa1949`](https://github.com/apache/spark/commit/6fa194968893b1347e7c08ab5b9eb12716114e46).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91044/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91042/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21411
**[Test build #91073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91073/testReport)** for PR 21411 at commit [`3a6a87b`](https://github.com/apache/spark/commit/3a6a87ba0e0bcb36a7a023edbd35fe411ed2fd6d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21411
**[Test build #91044 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91044/testReport)** for PR 21411 at commit [`3a6a87b`](https://github.com/apache/spark/commit/3a6a87ba0e0bcb36a7a023edbd35fe411ed2fd6d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21411
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21411
I'm fine with whatever changes you want to make here because we don't use Parquet summary files.
As always, I'll note that I think it is a bad idea to support the summary files in general. They have been deprecated in Parquet and are not a reliable source of information.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21411
**[Test build #91067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91067/testReport)** for PR 21411 at commit [`3a6a87b`](https://github.com/apache/spark/commit/3a6a87ba0e0bcb36a7a023edbd35fe411ed2fd6d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91073/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3513/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3523/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91067/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21411
I remember this summary file is disabled by default anyway. I think it's fine to just get rid of warnings.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21411
**[Test build #91067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91067/testReport)** for PR 21411 at commit [`3a6a87b`](https://github.com/apache/spark/commit/3a6a87ba0e0bcb36a7a023edbd35fe411ed2fd6d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/21411
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3528/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21411
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3511/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/21411
@rdblue @cloud-fan
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21411
**[Test build #91042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91042/testReport)** for PR 21411 at commit [`6fa1949`](https://github.com/apache/spark/commit/6fa194968893b1347e7c08ab5b9eb12716114e46).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21411
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21411
**[Test build #91073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91073/testReport)** for PR 21411 at commit [`3a6a87b`](https://github.com/apache/spark/commit/3a6a87ba0e0bcb36a7a023edbd35fe411ed2fd6d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21411#discussion_r190264804
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -125,16 +126,16 @@ class ParquetFileFormat
conf.set(ParquetOutputFormat.COMPRESSION, parquetOptions.compressionCodecClassName)
// SPARK-15719: Disables writing Parquet summary files by default.
- if (conf.get(ParquetOutputFormat.ENABLE_JOB_SUMMARY) == null) {
- conf.setBoolean(ParquetOutputFormat.ENABLE_JOB_SUMMARY, false)
+ if (conf.get(ParquetOutputFormat.JOB_SUMMARY_LEVEL) == null) {
--- End diff --
for backward compatibility, `ENABLE_JOB_SUMMARY` should still be respected.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21411
**[Test build #91044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91044/testReport)** for PR 21411 at commit [`3a6a87b`](https://github.com/apache/spark/commit/3a6a87ba0e0bcb36a7a023edbd35fe411ed2fd6d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org