Posted to users@kafka.apache.org by Maung Than <ma...@apple.com> on 2014/05/07 16:16:51 UTC

Compression in Kafka: GZIP or Snappy

Hi All,

I have read this post by a LinkedIn team member: http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/ Thanks.

I have a few questions and thoughts:

1) In asynchronous mode, you are compressing the batch, not the individual send. Is that correct?

2) There is no way to ask for the still-compressed messages from the consumer end, which is what we need. I am not sure that would work anyway, since we may not get the messages back in exactly the same sizes in which they were compressed.

3) Since the broker does another round of decompress/compress to maintain the message offsets, will larger individual messages help reduce the CPU load on the broker? That would effectively reduce the number of offsets and therefore the number of individual decompress/compress operations.

If so, we can batch messages ourselves before calling send, so that each message is larger.

4) Has anyone else done a Snappy vs. GZIP comparison and chosen one over the other? If so, what were your findings and reasons? Please share if you can, as we are going through this exercise.
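
To make points 1) and 3) concrete, here is a minimal sketch of why compressing a batch tends to pay off more than compressing each send on its own. It uses only Python's standard gzip module and is not Kafka's actual wire format: the 4-byte length prefix, the helper names, and the sample events are all assumptions made up for illustration.

```python
import gzip

def individual_size(messages):
    """Total bytes when each message is gzip-compressed on its own."""
    return sum(len(gzip.compress(m)) for m in messages)

def batch_size(messages):
    """Bytes when all messages are framed, concatenated, and compressed once.

    The 4-byte big-endian length prefix is a hypothetical framing so the
    batch could be split again after decompression; it is not Kafka's format.
    """
    framed = b"".join(len(m).to_bytes(4, "big") + m for m in messages)
    return len(gzip.compress(framed))

# Hypothetical sample data: 1,000 small, similar JSON-like events.
messages = [
    b'{"user_id": %d, "event": "page_view", "path": "/products/%d"}' % (i, i % 50)
    for i in range(1000)
]

raw = sum(len(m) for m in messages)
print("raw bytes:            ", raw)
print("compressed one by one:", individual_size(messages))
print("compressed as a batch:", batch_size(messages))
```

With small messages, the per-message gzip header and trailer plus the lack of shared context usually make one-by-one compression close to useless, while the batch compresses well because the compressor sees the repetition across messages. The same reasoning is why batching before send should also mean fewer compress/decompress calls for a given volume of data.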

Thanks for your thoughts, 
Maung

Re: Compression in Kafka: GZIP or Snappy

Posted by Joe Stein <jo...@stealth.ly>.
I created a ticket for the patch
https://issues.apache.org/jira/browse/KAFKA-1456

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/


On Thu, May 15, 2014 at 1:59 PM, Steven Schlansker <
sschlansker@opentable.com> wrote:

>
> On May 7, 2014, at 7:16 AM, Maung Than <ma...@apple.com> wrote:
>
> > Hi All,
> >
> > I have read this post by a LinkedIn team member:
> http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/ Thanks.
> >
> > I have a few questions and thoughts:
> >
> > 4) Has anyone else done a Snappy vs. GZIP comparison and chosen one over
> the other? If so, what were your findings and reasons? Please share if you
> can, as we are going through this exercise.
>
> I compared GZIP vs Snappy vs LZ4 for HTTP data a little while ago.
>
> We found that GZIP was so slow that we could not saturate our network
> connection with GZIP enabled - it just takes too much CPU!  We decided that
> GZIP might make sense over WAN links but there’s no way it makes sense over
> a LAN.
>
> Snappy was much faster.
>
> LZ4 was faster than GZIP and smaller than Snappy.  We switched to LZ4 and
> never looked back.  It would be cool if Kafka supported it :)
>
>

Re: Compression in Kafka: GZIP or Snappy

Posted by Steven Schlansker <ss...@opentable.com>.
On May 7, 2014, at 7:16 AM, Maung Than <ma...@apple.com> wrote:

> Hi All,
>
> I have read this post by a LinkedIn team member: http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/ Thanks.
>
> I have a few questions and thoughts:
>
> 4) Has anyone else done a Snappy vs. GZIP comparison and chosen one over the other? If so, what were your findings and reasons? Please share if you can, as we are going through this exercise.

I compared GZIP vs Snappy vs LZ4 for HTTP data a little while ago.

We found that GZIP was so slow that we could not saturate our network connection with GZIP enabled - it just takes too much CPU!  We decided that GZIP might make sense over WAN links but there’s no way it makes sense over a LAN.

Snappy was much faster.

LZ4 was faster than GZIP and smaller than Snappy.  We switched to LZ4 and never looked back.  It would be cool if Kafka supported it :)
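
For anyone who wants to repeat this kind of comparison on their own data, a rough sketch of a harness follows. It is not the benchmark described above: the function names, the sample payload, and the round count are assumptions for illustration, and only standard-library codecs are wired in. Snappy and LZ4 need third-party packages, so they appear only as commented-out lines that assume python-snappy and lz4 are installed.

```python
import gzip
import time
import zlib

def benchmark(name, compress, payload, rounds=20):
    """Compress payload `rounds` times; report throughput and size ratio."""
    start = time.perf_counter()
    for _ in range(rounds):
        out = compress(payload)
    elapsed = time.perf_counter() - start
    return {
        "codec": name,
        "mb_per_s": len(payload) * rounds / elapsed / 1e6,
        "ratio": len(out) / len(payload),  # smaller means better compression
    }

# Codecs to compare; each value is a function from bytes to bytes.
CODECS = {
    "gzip (level 6)": lambda data: gzip.compress(data, compresslevel=6),
    "zlib (level 1)": lambda data: zlib.compress(data, 1),
}
# Assumption: these third-party packages are installed.
# import snappy;    CODECS["snappy"] = snappy.compress
# import lz4.frame; CODECS["lz4"] = lz4.frame.compress

# Hypothetical payload: repetitive HTTP-like text, roughly 1.4 MB.
payload = (
    b"GET /api/v1/restaurants/12345/availability HTTP/1.1\r\n"
    b"Host: example.com\r\nAccept: application/json\r\n\r\n"
) * 15000

results = [benchmark(name, fn, payload) for name, fn in CODECS.items()]
for r in results:
    print(f"{r['codec']:<16} {r['mb_per_s']:8.1f} MB/s   ratio {r['ratio']:.3f}")
```

The number to compare against is the link speed: if a codec's MB/s on one core is below what the network can carry, compression becomes the bottleneck, which matches the GZIP-on-a-LAN experience described above. Real results depend heavily on the payload, so it is worth running this on a sample of actual messages rather than synthetic text.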