Posted to issues@spark.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2014/11/27 03:00:13 UTC
[jira] [Created] (SPARK-4633) Support gzip in spark.compression.io.codec
Takeshi Yamamuro created SPARK-4633:
---------------------------------------
Summary: Support gzip in spark.compression.io.codec
Key: SPARK-4633
URL: https://issues.apache.org/jira/browse/SPARK-4633
Project: Spark
Issue Type: Improvement
Components: Input/Output
Reporter: Takeshi Yamamuro
Priority: Trivial
gzip is widely used in other frameworks such as Hadoop MapReduce and Tez, and also
I think that gzip is more stable than other codecs in terms of both performance
and space overheads.
I have one open question: the current Spark configurations have a block size option
for each codec (spark.io.compression.[gzip|lz4|snappy].block.size).
As the number of codecs increases, the configuration gains more options, which
I think is somewhat complicated for non-expert users.
To mitigate this, my thought is as follows:
the three configurations are replaced with a single option for block size
(spark.io.compression.block.size). Then, the 'Meaning' entry in the configuration
docs would describe it as "This option affects gzip, lz4, and snappy.
Block size (in bytes) used in compression, in the case when these compression
codecs are used. Lowering...".
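To illustrate the proposal, a minimal sketch of how the unified option might be set alongside the codec choice. Note that spark.io.compression.block.size and the "gzip" codec value are the proposal here, not existing Spark settings; only spark.io.compression.codec with lz4/snappy-style values exists today.

```scala
import org.apache.spark.SparkConf

// Hypothetical configuration under this proposal:
val conf = new SparkConf()
  .setAppName("compression-example")
  // Select the codec; "gzip" is the new value this ticket proposes.
  .set("spark.io.compression.codec", "gzip")
  // Proposed single block-size option replacing the three per-codec
  // options spark.io.compression.[gzip|lz4|snappy].block.size.
  .set("spark.io.compression.block.size", "32768") // 32 KB, in bytes
```

With a single block-size key, non-expert users would not need to know which per-codec option matches their chosen codec.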
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org