Posted to issues@spark.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2014/11/27 03:00:13 UTC

[jira] [Created] (SPARK-4633) Support gzip in spark.compression.io.codec

Takeshi Yamamuro created SPARK-4633:
---------------------------------------

             Summary: Support gzip in spark.compression.io.codec
                 Key: SPARK-4633
                 URL: https://issues.apache.org/jira/browse/SPARK-4633
             Project: Spark
          Issue Type: Improvement
          Components: Input/Output
            Reporter: Takeshi Yamamuro
            Priority: Trivial


gzip is widely used in other frameworks such as Hadoop MapReduce and Tez, and
I think that gzip is more stable than other codecs in terms of both performance
and space overheads.
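For reference, a gzip codec could wrap output/input streams with java.util.zip, analogously to how the existing codecs wrap lz4/snappy streams. Below is a minimal sketch in Java; the class and method names are illustrative only, not Spark's actual CompressionCodec API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch of a stream-wrapping gzip codec. The class name and
// method signatures are hypothetical, chosen to mirror the shape of a
// compression codec interface; they are not Spark's real API.
public class GzipCodecSketch {
    public OutputStream compressedOutputStream(OutputStream out) throws IOException {
        return new GZIPOutputStream(out);
    }

    public InputStream compressedInputStream(InputStream in) throws IOException {
        return new GZIPInputStream(in);
    }

    // Round-trip demo: compress a payload, then decompress it back.
    public static void main(String[] args) throws IOException {
        GzipCodecSketch codec = new GzipCodecSketch();
        byte[] payload = "hello spark".getBytes("UTF-8");

        // Compress into an in-memory buffer.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (OutputStream gz = codec.compressedOutputStream(buf)) {
            gz.write(payload);
        }

        // Decompress and restore the original bytes.
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (InputStream gunzip =
                 codec.compressedInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            byte[] chunk = new byte[1024];
            int n;
            while ((n = gunzip.read(chunk)) != -1) {
                restored.write(chunk, 0, n);
            }
        }
        System.out.println(new String(restored.toByteArray(), "UTF-8"));
    }
}
```

Since gzip is a pure stream wrapper here, a real implementation would mainly need to plug these two methods into the codec interface and honor the configured block size via the stream buffer.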

I have one open question: the current Spark configurations have a block size
option for each codec (spark.io.compression.[gzip|lz4|snappy].block.size).
As the number of codecs increases, the configuration gains more options,
which I think is somewhat complicated for non-expert users.

To mitigate this, my thought is as follows:
replace the three configurations with a single block size option
(spark.io.compression.block.size). The 'Meaning' entry in the configuration
docs would then read: "This option affects gzip, lz4, and snappy.
Block size (in bytes) used in compression, in the case when these compression
codecs are used. Lowering...".
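Concretely, the proposal would collapse the per-codec block size entries into one shared entry. The sketch below uses spark-defaults.conf syntax; the proposed key and the values shown are illustrative, not existing Spark options:

```
# Current: one block size option per codec
spark.io.compression.lz4.block.size      32768
spark.io.compression.snappy.block.size   32768

# Proposed: a single option shared by gzip, lz4, and snappy
spark.io.compression.block.size          32768
```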



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
