Posted to reviews@spark.apache.org by 10110346 <gi...@git.apache.org> on 2018/09/07 07:36:43 UTC
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
GitHub user 10110346 opened a pull request:
https://github.com/apache/spark/pull/22358
[SPARK-25366][SQL]Zstd and brotli CompressionCodec are not supported for parquet files
## What changes were proposed in this pull request?
Hadoop 2.6 and Hadoop 2.7 contain neither the zstd nor the brotli compression codec, and Hadoop 3.1 contains only the zstd compression codec.
So I think we should remove zstd and brotli for the time being.
**set `spark.sql.parquet.compression.codec=brotli`:**
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.BrotliCodec was not found
at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
**set `spark.sql.parquet.compression.codec=zstd`:**
Caused by: org.apache.parquet.hadoop.BadConfigurationException: Class org.apache.hadoop.io.compress.ZStandardCodec was not found
at org.apache.parquet.hadoop.CodecFactory.getCodec(CodecFactory.java:235)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:142)
at org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)
at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)
at org.apache.parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:153)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:411)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:161)
## How was this patch tested?
Existing unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/10110346/spark notsupportzstdandbrotil
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22358.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22358
----
commit 1db036ad725bc7a3c60dbb9aede0f91cf0d798d0
Author: liuxian <li...@...>
Date: 2018-09-07T07:12:36Z
fix
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r216881788
--- Diff: docs/sql-programming-guide.md ---
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+ Note that `zstd` needs to install `ZStandardCodec` before Hadoop 2.9.0, `brotli` needs to install
+ `brotliCodec`.
--- End diff --
If the link is expected to be fairly permanent, it's fine.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on the issue:
https://github.com/apache/spark/pull/22358
It uses reflection to load Hadoop compression classes, which are not in the supplied dependencies (hadoop-common-2.6.5.jar, hadoop-common-2.7.0.jar, hadoop-common-3.1.0.jar):
`BROTLI("org.apache.hadoop.io.compress.BrotliCodec", CompressionCodec.BROTLI, ".br"),
ZSTD("org.apache.hadoop.io.compress.ZStandardCodec", CompressionCodec.ZSTD, ".zstd");`
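A reflective lookup of this kind can be sketched in plain Java. This is only an illustration under stated assumptions, not Parquet's actual code: the class `CodecAvailability` and the method `isOnClasspath` are hypothetical names, and only the two codec class names are taken from the enum entries quoted above.

```java
// Hypothetical helper: reports whether a class can be loaded by name,
// which is the kind of lookup that fails in the stack traces in the PR description.
final class CodecAvailability {
    // Class names copied from the enum entries quoted in this thread.
    static final String BROTLI = "org.apache.hadoop.io.compress.BrotliCodec";
    static final String ZSTD = "org.apache.hadoop.io.compress.ZStandardCodec";

    // Returns true only if the named class is visible to this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running static initializers.
            Class.forName(className, false, CodecAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("brotli codec present: " + isOnClasspath(BROTLI));
        System.out.println("zstd codec present: " + isOnClasspath(ZSTD));
    }
}
```

Run with the same classpath as the Spark application, this would show whether either codec class is visible before a write is attempted.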
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on the issue:
https://github.com/apache/spark/pull/22358
Thanks. If the codecs are found, we support those compressions, but how do I find them? @HyukjinKwon
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r215887803
--- Diff: docs/sql-programming-guide.md ---
@@ -964,7 +964,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Sets the compression codec used when writing Parquet files. If either `compression` or
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
- none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
--- End diff --
Installation may not be able to solve it.
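For reference, the precedence rule stated in the documentation text quoted in the diff above can be sketched as follows. This is a minimal illustration, not Spark's implementation: `CompressionPrecedence`, `resolve`, and the `sessionDefault` parameter (standing in for the value of `spark.sql.parquet.compression.codec`) are hypothetical names.

```java
import java.util.Map;

// Hypothetical sketch of the documented lookup order:
// `compression`, then `parquet.compression`, then the session-level setting.
final class CompressionPrecedence {
    static String resolve(Map<String, String> tableOptions, String sessionDefault) {
        String value = tableOptions.get("compression");
        if (value == null) {
            value = tableOptions.get("parquet.compression");
        }
        if (value == null) {
            // Stands in for spark.sql.parquet.compression.codec.
            value = sessionDefault;
        }
        return value;
    }
}
```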
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #95785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95785/testReport)** for PR 22358 at commit [`1db036a`](https://github.com/apache/spark/commit/1db036ad725bc7a3c60dbb9aede0f91cf0d798d0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r216121746
--- Diff: docs/sql-programming-guide.md ---
@@ -964,7 +964,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Sets the compression codec used when writing Parquet files. If either `compression` or
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
- none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
--- End diff --
ah, ok.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #95969 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95969/testReport)** for PR 22358 at commit [`64aef6b`](https://github.com/apache/spark/commit/64aef6ba6a0829bf490c6014521731b92630d716).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22358
I'm okay with this, but I would close it if no committer agrees with (approves) it for a long time.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on the issue:
https://github.com/apache/spark/pull/22358
Yeah, the error message comes from an external jar (parquet-common-1.10.0.jar).
I think Spark + Parquet should avoid the Hadoop dependencies.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #96312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96312/testReport)** for PR 22358 at commit [`39eaf1d`](https://github.com/apache/spark/commit/39eaf1dca6e3885da0dcc1f59f3fb4633d7638fd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r215901781
--- Diff: docs/sql-programming-guide.md ---
@@ -964,7 +964,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Sets the compression codec used when writing Parquet files. If either `compression` or
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
- none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
--- End diff --
Got it, thanks @wangyum
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96312/
Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3007/
Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22358
But if the codecs are found, we support those compressions, no?
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #95852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95852/testReport)** for PR 22358 at commit [`5c478b9`](https://github.com/apache/spark/commit/5c478b9b004d045cb843609c9a4066da616e1eac).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #95969 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95969/testReport)** for PR 22358 at commit [`64aef6b`](https://github.com/apache/spark/commit/64aef6ba6a0829bf490c6014521731b92630d716).
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3275/
Test PASSed.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r216185045
--- Diff: docs/sql-programming-guide.md ---
@@ -964,7 +964,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Sets the compression codec used when writing Parquet files. If either `compression` or
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
- none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+ none, uncompressed, snappy, gzip, lzo, brotli(need install brotliCodec), lz4, zstd(need install
+ ZStandardCodec before Hadoop 2.9.0).
--- End diff --
I would just add a few lines for `brotli` and `zstd` below and leave the original text as is.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r216119384
--- Diff: docs/sql-programming-guide.md ---
@@ -964,7 +964,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Sets the compression codec used when writing Parquet files. If either `compression` or
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
- none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
--- End diff --
Is hadoop-2.9.x officially supported in Spark?
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r216180370
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -398,10 +398,10 @@ object SQLConf {
"`parquet.compression` is specified in the table-specific options/properties, the " +
"precedence would be `compression`, `parquet.compression`, " +
"`spark.sql.parquet.compression.codec`. Acceptable values include: none, uncompressed, " +
- "snappy, gzip, lzo, brotli, lz4, zstd.")
+ "snappy, gzip, lzo, lz4.")
.stringConf
.transform(_.toLowerCase(Locale.ROOT))
- .checkValues(Set("none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd"))
--- End diff --
I agree with you, removing is not a good idea.
Thanks.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22358
If the codecs are found, then we support them. One thing we might do is document that the codec has to be provided explicitly, but I am not sure how many users are confused about it.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r218916987
--- Diff: docs/sql-programming-guide.md ---
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+ Note that `zstd` needs to install `ZStandardCodec` before Hadoop 2.9.0, `brotli` needs to install
+ `brotliCodec`.
--- End diff --
It is clearer to say "`zstd` requires ZStandardCodec to be installed".
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r219015195
--- Diff: docs/sql-programming-guide.md ---
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+ Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
+ `brotliCodec` to be installed.
--- End diff --
`brotliCodec` -> `BrotliCodec`
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22358
I am 0 on this, since it comes down to the `Class org.apache.hadoop.io.compress.XXXCodec was not found` error message vs. a `need install ... ` message.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22358
Just FYI, a related discussion: https://github.com/apache/spark/pull/21070#issuecomment-382086510
cc: @rdblue
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #95852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95852/testReport)** for PR 22358 at commit [`5c478b9`](https://github.com/apache/spark/commit/5c478b9b004d045cb843609c9a4066da616e1eac).
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r216165218
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -398,10 +398,10 @@ object SQLConf {
"`parquet.compression` is specified in the table-specific options/properties, the " +
"precedence would be `compression`, `parquet.compression`, " +
"`spark.sql.parquet.compression.codec`. Acceptable values include: none, uncompressed, " +
- "snappy, gzip, lzo, brotli, lz4, zstd.")
+ "snappy, gzip, lzo, lz4.")
.stringConf
.transform(_.toLowerCase(Locale.ROOT))
- .checkValues(Set("none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd"))
--- End diff --
I thought if you remove it from here the user would not be able to use zstd or brotli even if it is installed/enabled/available?
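To illustrate the concern, here is a minimal Java sketch of a value check in the style of the `transform`/`checkValues` calls in the diff above. It is not Spark's implementation: `CodecNameCheck` and `normalize` are hypothetical names. The point is only that a name missing from the allowed set is rejected up front, whether or not the corresponding codec is installed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a config entry that lower-cases and validates its value.
final class CodecNameCheck {
    // Allowed names as listed in the original checkValues call.
    static final Set<String> ALLOWED = Collections.unmodifiableSet(new HashSet<>(
        Arrays.asList("none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd")));

    // Lower-cases the value with Locale.ROOT, then rejects names outside the set.
    static String normalize(String value, Set<String> allowed) {
        String name = value.toLowerCase(Locale.ROOT);
        if (!allowed.contains(name)) {
            throw new IllegalArgumentException(
                "Unsupported codec name: " + value + "; expected one of " + allowed);
        }
        return name;
    }
}
```

With `brotli` and `zstd` dropped from the set, `normalize("zstd", ...)` would throw before any class lookup is attempted.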
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95969/
Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3273/
Test PASSed.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22358
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r216873177
--- Diff: docs/sql-programming-guide.md ---
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+ Note that `zstd` needs to install `ZStandardCodec` before Hadoop 2.9.0, `brotli` needs to install
+ `brotliCodec`.
--- End diff --
@HyukjinKwon How about adding a link? Users may not know where to download it.
```
`brotliCodec` -> [`brotli-codec`](https://github.com/rdblue/brotli-codec)
```
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #95930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95930/testReport)** for PR 22358 at commit [`dd86d3f`](https://github.com/apache/spark/commit/dd86d3fcd8781a2727dd1351ec8239edb7041405).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95852/
Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95930/
Test PASSed.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCode...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r215897048
--- Diff: docs/sql-programming-guide.md ---
@@ -964,7 +964,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Sets the compression codec used when writing Parquet files. If either `compression` or
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
- none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
--- End diff --
`none, uncompressed, snappy, gzip, lzo, brotli (requires installing brotli-codec), lz4, zstd (available since Hadoop 2.9.0)`
https://jira.apache.org/jira/browse/HADOOP-13578
https://github.com/rdblue/brotli-codec
https://jira.apache.org/jira/browse/HADOOP-13126
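To make the requirement above concrete, here is a minimal sketch of a lookup that turns a codec name into an installation hint. It is only an illustration: `CODEC_REQUIREMENTS` and `codec_hint` are hypothetical names, not part of Spark or Parquet, and the class names are the ones that appear in the stack traces in this PR's description.

```python
# Hypothetical lookup table (not a Spark or Parquet API).
# Maps a codec name to the Hadoop codec class it needs and a short note.
CODEC_REQUIREMENTS = {
    "zstd": (
        "org.apache.hadoop.io.compress.ZStandardCodec",
        "available since Hadoop 2.9.0 (HADOOP-13578); "
        "install it separately on older Hadoop versions",
    ),
    "brotli": (
        "org.apache.hadoop.io.compress.BrotliCodec",
        "install brotli-codec (https://github.com/rdblue/brotli-codec)",
    ),
}


def codec_hint(codec):
    """Return an installation hint for codecs that need extra classes, else None."""
    name = codec.strip().lower()
    entry = CODEC_REQUIREMENTS.get(name)
    if entry is None:
        # Codecs such as snappy or gzip need no hint in this sketch.
        return None
    class_name, note = entry
    return f"`{name}` requires {class_name}: {note}"
```

A message like the one returned by `codec_hint("zstd")` could accompany the `BadConfigurationException` or be mirrored in the documentation.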
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #96314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96314/testReport)** for PR 22358 at commit [`0e5d0bc`](https://github.com/apache/spark/commit/0e5d0bc84c53356a28dce27b7acbcbab3ea7e106).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2961/
Test PASSed.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCode...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r215874603
--- Diff: docs/sql-programming-guide.md ---
@@ -964,7 +964,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Sets the compression codec used when writing Parquet files. If either `compression` or
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
- none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
--- End diff --
I prefer `none, uncompressed, snappy, gzip, lzo, brotli(need install ...), lz4, zstd(need install ...)`.
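For readers unfamiliar with the precedence described in the quoted documentation, the rule can be sketched as a small function. This is an illustrative model only, not Spark's implementation; `resolve_parquet_codec` and its parameters are hypothetical names, and `session_default` stands in for the value of `spark.sql.parquet.compression.codec`.

```python
from typing import Mapping

# Table-level option keys in decreasing precedence, as listed in the docs text.
TABLE_OPTION_KEYS = ("compression", "parquet.compression")


def resolve_parquet_codec(table_options: Mapping[str, str],
                          session_default: str) -> str:
    """Pick the codec: `compression`, then `parquet.compression`,
    then the session-level default."""
    for key in TABLE_OPTION_KEYS:
        value = table_options.get(key)
        if value:
            return value
    # Neither table-level option is set, so fall back to the session config.
    return session_default
```

For example, `resolve_parquet_codec({"parquet.compression": "zstd"}, "snappy")` returns `"zstd"`, while an empty options mapping falls back to `"snappy"`.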
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #95930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95930/testReport)** for PR 22358 at commit [`dd86d3f`](https://github.com/apache/spark/commit/dd86d3fcd8781a2727dd1351ec8239edb7041405).
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCodec are n...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22358
We should probably either document that or improve the error message. Ideally, the error message should be fixed in Parquet itself. Don't you think?
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #96312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96312/testReport)** for PR 22358 at commit [`39eaf1d`](https://github.com/apache/spark/commit/39eaf1dca6e3885da0dcc1f59f3fb4633d7638fd).
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96314/
Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95785/
Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #96314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96314/testReport)** for PR 22358 at commit [`0e5d0bc`](https://github.com/apache/spark/commit/0e5d0bc84c53356a28dce27b7acbcbab3ea7e106).
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3029/
Test PASSed.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCode...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r216120854
--- Diff: docs/sql-programming-guide.md ---
@@ -964,7 +964,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Sets the compression codec used when writing Parquet files. If either `compression` or
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
- none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
--- End diff --
I think so given the download page.
![screen shot 2018-09-08 at 12 41 41 pm](https://user-images.githubusercontent.com/6477701/45250388-94d6b280-b364-11e8-85ee-1a67daa3a123.png)
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCodec are n...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22358
**[Test build #95785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95785/testReport)** for PR 22358 at commit [`1db036a`](https://github.com/apache/spark/commit/1db036ad725bc7a3c60dbb9aede0f91cf0d798d0).
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2919/
Test PASSed.
---
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22358#discussion_r216657064
--- Diff: docs/sql-programming-guide.md ---
@@ -965,6 +965,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
`parquet.compression` is specified in the table-specific options/properties, the precedence would be
`compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+ Note that `zstd` needs install `ZStandardCodec` before Hadoop 2.9.0, `brotli` needs install `brotliCodec`.
--- End diff --
`needs install` -> `needs to install`
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCodec are n...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22358
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22358
@srowen and @vanxin WDYT?
---