Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/03/18 05:16:33 UTC
[jira] [Created] (SPARK-13997) Use Hadoop 2.0 default value for compression in data sources
Hyukjin Kwon created SPARK-13997:
------------------------------------
Summary: Use Hadoop 2.0 default value for compression in data sources
Key: SPARK-13997
URL: https://issues.apache.org/jira/browse/SPARK-13997
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.0
Reporter: Hyukjin Kwon
Priority: Trivial
Currently, the JSON, TEXT and CSV data sources use the {{CompressionCodecs}} class to set compression configurations via {{option("compress", "codec")}}.
I made this use the Hadoop 1.x default value (block-level compression). However, the default value in Hadoop 2.x is record-level compression, as described in [mapred-site.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml].
Since Spark 2.0 drops Hadoop 1.x support, it makes sense to use the Hadoop 2.x default values.
According to [Hadoop Definitive Guide 3rd edition|https://www.safaribooksonline.com/library/view/hadoop-the-definitive/9781449328917/ch04.html], it looks like these configurations control the unit of compression (record or block).
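For reference, the Hadoop 2.x default in question appears in mapred-default.xml roughly as follows (property name and wording per the 2.7.1 docs linked above; shown here only as a sketch of the relevant setting):

```xml
<!-- Hadoop 2.x mapred-default.xml: SequenceFile output compression type
     defaults to RECORD (record-level), not BLOCK (block-level). -->
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>RECORD</value>
  <description>If the job outputs are to compressed as SequenceFiles, how
  should they be compressed? Should be one of NONE, RECORD or BLOCK.
  </description>
</property>
```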
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org