
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/08/31 08:42:00 UTC

[jira] [Commented] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

    [ https://issues.apache.org/jira/browse/SPARK-21786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598432#comment-16598432 ] 

Apache Spark commented on SPARK-21786:
--------------------------------------

User 'fjh100456' has created a pull request for this issue:
https://github.com/apache/spark/pull/22301

> The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21786
>                 URL: https://issues.apache.org/jira/browse/SPARK-21786
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jinhua Fu
>            Assignee: Jinhua Fu
>            Priority: Major
>             Fix For: 2.3.0
>
>
> Since Hive 1.1, Hive has allowed users to set the Parquet compression codec via the table-level property parquet.compression. See the JIRA: https://issues.apache.org/jira/browse/HIVE-7858 . We already support orc.compression for ORC. Thus, for external users, it is more straightforward to support both. See the Stack Overflow question: https://stackoverflow.com/questions/36941122/spark-sql-ignores-parquet-compression-propertie-specified-in-tblproperties
> On the Spark side, our own table-level compression option, compression, was added by #11464 in Spark 2.0.
> We need to support both table-level options. Users might also use the session-level conf spark.sql.parquet.compression.codec. The priority rule is as follows:
> If more than one compression codec configuration is found (through Hive or Parquet), the precedence is compression, then parquet.compression, then spark.sql.parquet.compression.codec. Acceptable values include: none, uncompressed, snappy, gzip, lzo.
> After the change, the rule for Parquet is consistent with the rule for ORC.
> Changes:
> 1. Added reading 'compressionCodecClassName' from parquet.compression. The precedence order is compression, parquet.compression, spark.sql.parquet.compression.codec, just as in OrcOptions.
> 2. Changed spark.sql.parquet.compression.codec to support "none". ParquetOptions already treats "none" as equivalent to "uncompressed", but the configuration could not previously be set to "none".
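
The precedence rule in the description above can be illustrated with a small standalone sketch. This is not Spark's actual ParquetOptions code: the function name resolve_parquet_codec and its parameters are hypothetical, and the case-insensitive comparison of codec names is an assumption made for the example. It only shows the intended lookup order (compression, then parquet.compression, then the session-level spark.sql.parquet.compression.codec) and the handling of "none" as "uncompressed".

```python
from typing import Mapping

# Short codec names listed as acceptable in the issue description.
ACCEPTED_CODECS = {"none", "uncompressed", "snappy", "gzip", "lzo"}


def resolve_parquet_codec(table_options: Mapping[str, str],
                          session_codec: str) -> str:
    """Hypothetical helper: pick the codec using the described precedence.

    table_options stands in for the table-level properties; session_codec
    stands in for the value of spark.sql.parquet.compression.codec.
    """
    # Table-level keys win over the session-level conf, in this order.
    for key in ("compression", "parquet.compression"):
        value = table_options.get(key)
        if value is not None:
            chosen = value
            break
    else:
        # Neither table-level key is set: fall back to the session conf.
        chosen = session_codec

    # Assumption for this sketch: names are compared case-insensitively.
    name = chosen.lower()
    if name not in ACCEPTED_CODECS:
        raise ValueError(f"Unsupported Parquet codec: {chosen!r}")

    # "none" is treated as equivalent to "uncompressed".
    return "uncompressed" if name == "none" else name
```

For example, with both table-level keys set, resolve_parquet_codec({"compression": "gzip", "parquet.compression": "snappy"}, "lzo") picks the value of compression; with no table-level keys, the session-level value is used.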



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
