Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/07/27 16:13:00 UTC

[jira] [Resolved] (SPARK-24881) New options - compression and compressionLevel

     [ https://issues.apache.org/jira/browse/SPARK-24881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-24881.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 21837
[https://github.com/apache/spark/pull/21837]

> New options - compression and compressionLevel
> ----------------------------------------------
>
>                 Key: SPARK-24881
>                 URL: https://issues.apache.org/jira/browse/SPARK-24881
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> Currently, the Avro datasource takes the compression codec name from the SQL config (the config key is hard-coded in AvroFileFormat): https://github.com/apache/spark/blob/106880edcd67bc20e8610a16f8ce6aa250268eeb/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala#L121-L125 . The obvious drawback is that modifying the global config can affect multiple writes.
> The purpose of this ticket is to add a new Avro option, "compression", the same as we already have for other datasources such as JSON and CSV. If the new option is not set by the user, we take the setting from the SQL config spark.sql.avro.compression.codec. If that config is not set either, the default compression codec is snappy (this is the current behavior in master).
> Besides the compression option, we need to add another option, compressionLevel, which should reflect another SQL config in Avro: https://github.com/apache/spark/blob/106880edcd67bc20e8610a16f8ce6aa250268eeb/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala#L122
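
The fallback order described in the ticket (per-write option, then SQL config, then snappy) can be sketched in plain Python. This is only an illustration of the precedence, not the code from the pull request: the helper name resolve_avro_codec and the use of plain dictionaries in place of Spark's option and config objects are assumptions made for the example.

```python
from typing import Mapping

# Config key and default are taken from the ticket text.
SQL_CONF_KEY = "spark.sql.avro.compression.codec"
DEFAULT_CODEC = "snappy"


def resolve_avro_codec(write_options: Mapping[str, str],
                       sql_conf: Mapping[str, str]) -> str:
    """Hypothetical helper: choose the codec for one write.

    Precedence, as described in the ticket:
      1. the per-write "compression" option,
      2. the session-wide SQL config,
      3. the default codec (snappy).
    """
    codec = write_options.get("compression")
    if codec is None:
        codec = sql_conf.get(SQL_CONF_KEY)
    if codec is None:
        codec = DEFAULT_CODEC
    return codec


if __name__ == "__main__":
    session_conf = {SQL_CONF_KEY: "deflate"}
    # The per-write option overrides the session config.
    print(resolve_avro_codec({"compression": "uncompressed"}, session_conf))
    # No option set: the session config applies.
    print(resolve_avro_codec({}, session_conf))
    # Neither is set: the default applies.
    print(resolve_avro_codec({}, {}))
```

Under this scheme, two writes in the same session can use different codecs without either one changing the global config, which is the drawback the ticket sets out to remove.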



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org