Posted to dev@spark.apache.org by Nasser Ebrahim <en...@linux.vnet.ibm.com> on 2016/09/12 14:00:40 UTC

GZIP compression support for Spark internal data

Hi,

Can we use GZIP compression for internal data such as RDD partitions,
broadcast variables and shuffle outputs, so that users have more
choice than the currently available LZ4, LZF and Snappy? Is there any
specific reason we do not support the JDK's built-in compression? If
not, shall I create a JIRA to get this implemented?

Thank you,
Nasser Ebrahim

Re: GZIP compression support for Spark internal data

Posted by Nasser Ebrahim <en...@linux.vnet.ibm.com>.

Thank you, Takeshi, for sharing the info. I agree with Patrick and you
that there is no point in adding more codecs unless they show better
performance (at least for some workloads on some platforms).
GZIP performance depends on its implementation on each
platform. I will run some performance tests to see how it compares
with the existing codecs in Spark.
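
As a starting point for such a test, here is a minimal sketch that round-trips a buffer through the JDK's java.util.zip GZIP streams and reports size and elapsed time. The class name GzipSketch and the synthetic sampleData helper are made up for illustration and are not part of Spark. A single timed run like this is only a rough indication; a real comparison against LZ4, LZF and Snappy would need those libraries, representative data and a proper benchmark harness.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not a Spark class): exercises the JDK's built-in GZIP
// on an in-memory buffer.
final class GzipSketch {

    // Compress a byte array with the JDK's GZIPOutputStream.
    static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(input);
        } // closing the stream flushes the data and writes the GZIP trailer
        return sink.toByteArray();
    }

    // Decompress a GZIP byte array back to the original bytes.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    // Synthetic, repetitive records; an assumption standing in for real data.
    static byte[] sampleData(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("key-").append(i % 100)
              .append(",value-").append(i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] input = sampleData(200_000);

        long start = System.nanoTime();
        byte[] packed = compress(input);
        long compressMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        byte[] unpacked = decompress(packed);
        long decompressMs = (System.nanoTime() - start) / 1_000_000;

        // Fail loudly if the round trip did not reproduce the input.
        if (!Arrays.equals(input, unpacked)) {
            throw new IllegalStateException("round trip mismatch");
        }
        System.out.printf(
            "in=%d bytes, out=%d bytes, ratio=%.2f, compress=%d ms, decompress=%d ms%n",
            input.length, packed.length,
            (double) input.length / packed.length, compressMs, decompressMs);
    }
}
```

The numbers will vary by JVM and platform, which is the point of the test. Repeating the run several times after a warm-up would give more stable figures.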

On 9/12/16 9:19 PM, Takeshi Yamamuro wrote:
> Hi,
>
> Have you seen https://issues.apache.org/jira/browse/SPARK-4633 ?
>
> // maropu
>
> On Mon, Sep 12, 2016 at 11:00 PM, Nasser Ebrahim
> <enasser@linux.vnet.ibm.com> wrote:
>
>     Hi,
>
>     Can we use GZIP compression for internal data such as RDD
>     partitions, broadcast variables and shuffle outputs so that user
>     will have more choice compared to the available LZ4, LZF and
>     Snappy?  Is there any specific reason we are not supporting the
>     JDK inbuilt compression? If not, shall I create a JIRA to get this
>     implemented.
>
>     Thank you,
>     Nasser Ebrahim
>
>
>
>
> -- 
> ---
> Takeshi Yamamuro

Re: GZIP compression support for Spark internal data

Posted by Takeshi Yamamuro <li...@gmail.com>.

Hi,

Have you seen https://issues.apache.org/jira/browse/SPARK-4633 ?

// maropu

On Mon, Sep 12, 2016 at 11:00 PM, Nasser Ebrahim <enasser@linux.vnet.ibm.com>
wrote:

> Hi,
>
> Can we use GZIP compression for internal data such as RDD partitions,
> broadcast variables and shuffle outputs so that user will have more choice
> compared to the available LZ4, LZF and Snappy?  Is there any specific
> reason we are not supporting the JDK inbuilt compression? If not, shall I
> create a JIRA to get this implemented.
>
> Thank you,
> Nasser Ebrahim
>
>


-- 
---
Takeshi Yamamuro